How to Detect Pedestrians in Images and Video Using OpenCV

In this tutorial, we will write a program to detect pedestrians in a photo and a video using a technique called the Histogram of Oriented Gradients (HOG). We will use the OpenCV computer vision library, which has a built-in pedestrian detection method based on the original HOG research paper.

I won’t go into the details and the math behind HOG (you don’t need to know the math in order to implement the algorithm), but if you’re interested in learning about what goes on under the hood, check out that research paper.
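
If you are curious what a HOG feature vector actually looks like, here is a minimal sketch that computes one for a single 64 x 128 patch (the blank patch below is just a placeholder; in practice you would crop a region out of a real image):

import cv2
import numpy as np

# Compute the raw HOG feature vector for one 64x128 detection window
hog = cv2.HOGDescriptor()  # default people-detector geometry: 64x128 window, 8x8 cells, 9 orientation bins
patch = np.zeros((128, 64), dtype=np.uint8)  # placeholder patch (height x width)
features = hog.compute(patch)
print(features.size)  # 3780 values with the default parameters

The built-in people detector slides this same window across the image at multiple scales and classifies each window with a linear SVM.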

[Image: pedestrian detection output (pedestrians_2_detect)]

Real-World Applications

  • Self-Driving Cars

Prerequisites

Install OpenCV

The first thing you need to do is to make sure you have OpenCV installed on your machine. If you’re using Anaconda, you can type this command into the terminal:

conda install -c conda-forge opencv

Alternatively, you can type:

pip install opencv-python
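
Either way, you can confirm that the installation worked by printing the library version from Python:

import cv2

print(cv2.__version__)  # prints the installed OpenCV version if the install succeeded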

Detect Pedestrians in an Image

Get an image that contains pedestrians and put it inside a directory somewhere on your computer. 

Here are a couple images that I will use as test cases:

[Image: test image 1 (pedestrians_1)]
[Image: test image 2 (pedestrians_2)]

Inside that same directory, write the following code. I will save my file as detect_pedestrians_hog.py:

import cv2 # Import the OpenCV library to enable computer vision

# Author: Addison Sears-Collins
# https://automaticaddison.com
# Description: Detect pedestrians in an image using the 
#   Histogram of Oriented Gradients (HOG) method

# Make sure the image file is in the same directory as your code
filename = 'pedestrians_1.jpg'

def main():

  # Create a HOGDescriptor object
  hog = cv2.HOGDescriptor()
	
  # Initialize the People Detector
  hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
	
  # Load an image
  image = cv2.imread(filename)
		
  # Detect people
  # image: Source image
  # winStride: step size in x and y direction of the sliding window
  # padding: no. of pixels in x and y direction for padding of sliding window
  # scale: Detection window size increase coefficient	
  # bounding_boxes: Location of detected people
  # weights: Weight scores of detected people
  (bounding_boxes, weights) = hog.detectMultiScale(image, 
                                                   winStride=(4, 4),
                                                   padding=(8, 8), 
                                                   scale=1.05)

  # Draw bounding boxes on the image
  for (x, y, w, h) in bounding_boxes: 
    cv2.rectangle(image, 
                  (x, y),  
                  (x + w, y + h),  
                  (0, 0, 255), 
                   4)
					
  # Create the output file name by removing the '.jpg' part
  size = len(filename)
  new_filename = filename[:size - 4]
  new_filename = new_filename + '_detect.jpg'
	
  # Save the new image in the working directory
  cv2.imwrite(new_filename, image)
	
  # Display the image 
  cv2.imshow("Image", image) 
	
  # Display the window until any key is pressed
  cv2.waitKey(0) 
	
  # Close all windows
  cv2.destroyAllWindows() 

main()

Run the code.

python detect_pedestrians_hog.py

Image Results

Here is the output for the first image:

[Image: detection results for the first image (pedestrians_1_detect)]

Here is the output for the second image:

[Image: detection results for the second image (pedestrians_2_detect)]

Detect Pedestrians in a Video

Now that we know how to detect pedestrians in an image, let’s detect pedestrians in a video.

Find a video file that has pedestrians in it. You can check free video sites like Pixabay if you don’t have any videos.

The video should have dimensions of 1920 x 1080 for this implementation. If you have a video of another size, you will have to tweak the parameters in the code.
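
If you are not sure what dimensions and frame rate your video has, here is a quick sketch that reads them with OpenCV (swap in your own file name):

import cv2

# Print the frame size and frame rate of a video file
cap = cv2.VideoCapture('pedestrians_on_street_1.mp4')  # replace with your video file
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
print(width, height, fps)  # e.g. 1920 1080 25.0
cap.release()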

Place the video inside a directory.

Now let’s download a library that will apply a fancy mathematical technique called non-maxima suppression to take multiple overlapping bounding boxes and compress them into just one bounding box.

pip install --upgrade imutils
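
To see what non-maxima suppression does before we apply it to video frames, here is a minimal sketch with made-up boxes in (x1, y1, x2, y2) format:

import numpy as np
from imutils.object_detection import non_max_suppression

# Two heavily overlapping boxes and one separate box
boxes = np.array([[10, 10, 110, 210],
                  [12, 14, 112, 212],
                  [300, 40, 380, 200]])

# Boxes that overlap by more than overlapThresh are collapsed so only one survives
picked = non_max_suppression(boxes, probs=None, overlapThresh=0.45)
print(picked)  # expect two boxes: one from the overlapping pair plus the separate box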

Also, make sure you have NumPy installed, a scientific computing library for Python.

If you’re using Anaconda, you can type:

conda install numpy

Alternatively, you can type:

pip install numpy

Inside that same directory, write the following code. I will save my file as detect_pedestrians_video_hog.py. We will save the output as an .mp4 video file:

# Author: Addison Sears-Collins
# https://automaticaddison.com
# Description: Detect pedestrians in a video using the 
#   Histogram of Oriented Gradients (HOG) method

import cv2 # Import the OpenCV library to enable computer vision
import numpy as np # Import the NumPy scientific computing library
from imutils.object_detection import non_max_suppression # Handle overlapping

# Make sure the video file is in the same directory as your code
filename = 'pedestrians_on_street_1.mp4'
file_size = (1920,1080) # Assumes 1920x1080 mp4
scale_ratio = 1 # Option to scale to a fraction of the original size (if you change this, update file_size to match)

# We want to save the output to a video file
output_filename = 'pedestrians_on_street.mp4'
output_frames_per_second = 20.0 

def main():

  # Create a HOGDescriptor object
  hog = cv2.HOGDescriptor()
	
  # Initialize the People Detector
  hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
	
  # Load a video
  cap = cv2.VideoCapture(filename)

  # Create a VideoWriter object so we can save the video output
  fourcc = cv2.VideoWriter_fourcc(*'mp4v')
  result = cv2.VideoWriter(output_filename,  
                           fourcc, 
                           output_frames_per_second, 
                           file_size) 
	
  # Process the video
  while cap.isOpened():
		
    # Capture one frame at a time
    success, frame = cap.read() 
		
    # Do we have a video frame? If true, proceed.
    if success:
		
      # Resize the frame
      width = int(frame.shape[1] * scale_ratio)
      height = int(frame.shape[0] * scale_ratio)
      frame = cv2.resize(frame, (width, height))
			
      # Store the original frame
      orig_frame = frame.copy()
			
      # Detect people
      # image: a single frame from the video
      # winStride: step size in x and y direction of the sliding window
      # padding: no. of pixels in x and y direction for padding of 
      #	sliding window
      # scale: Detection window size increase coefficient	
      # bounding_boxes: Location of detected people
      # weights: Weight scores of detected people
      # Tweak these parameters for better results
      (bounding_boxes, weights) = hog.detectMultiScale(frame, 
                                                       winStride=(16, 16),
                                                       padding=(4, 4), 
                                                       scale=1.05)

      # Draw the raw detections (in red) on a copy of the frame.
      # Only the boxes that survive non-maxima suppression below are
      # drawn on the frame that gets saved and displayed.
      for (x, y, w, h) in bounding_boxes: 
        cv2.rectangle(orig_frame, 
                      (x, y),  
                      (x + w, y + h),  
                      (0, 0, 255), 
                      2)
						
      # Get rid of overlapping bounding boxes
      # You can tweak the overlapThresh value for better results
      bounding_boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in bounding_boxes])
			
      selection = non_max_suppression(bounding_boxes, 
                                      probs=None, 
                                      overlapThresh=0.45)
		
      # Draw the final bounding boxes (in green)
      for (x1, y1, x2, y2) in selection:
        cv2.rectangle(frame, 
                      (x1, y1), 
                      (x2, y2), 
                      (0, 255, 0), 
                      4)
		
      # Write the frame to the output video file
      result.write(frame)
			
      # Display the frame 
      cv2.imshow("Frame", frame) 	

      # Display the frame for 25 milliseconds and check if the q key is pressed
      # q == quit
      if cv2.waitKey(25) & 0xFF == ord('q'):
        break
		
    # No more video frames left
    else:
      break
			
  # Release the video capture object
  cap.release()
	
  # Release the video recording
  result.release()
	
  # Close all windows
  cv2.destroyAllWindows() 

main()

Run the code.

python detect_pedestrians_video_hog.py

Video Results

Here is a video of the output:

How To Convert a Quaternion Into Euler Angles in Python

Given a quaternion of the form (x, y, z, w), where w is the scalar (real) part and x, y, and z are the vector parts, how do we convert this quaternion into the three Euler angles:

  • Rotation about the x-axis = roll angle = α
  • Rotation about the y-axis = pitch angle = β
  • Rotation about the z-axis = yaw angle = γ

[Image: roll (x-axis), pitch (y-axis), and yaw (z-axis) rotations]

Doing this operation is important because ROS2 (and ROS) uses quaternions as the default representation for the orientation of a robot in 3D space. Roll, pitch, and yaw angles are a lot easier to understand and visualize than quaternions.

Here is the Python code:

import math

def euler_from_quaternion(x, y, z, w):
  """
  Convert a quaternion into Euler angles (roll, pitch, yaw)
  roll is rotation around x in radians (counterclockwise)
  pitch is rotation around y in radians (counterclockwise)
  yaw is rotation around z in radians (counterclockwise)
  """
  t0 = +2.0 * (w * x + y * z)
  t1 = +1.0 - 2.0 * (x * x + y * y)
  roll_x = math.atan2(t0, t1)

  t2 = +2.0 * (w * y - z * x)
  t2 = +1.0 if t2 > +1.0 else t2
  t2 = -1.0 if t2 < -1.0 else t2
  pitch_y = math.asin(t2)

  t3 = +2.0 * (w * z + x * y)
  t4 = +1.0 - 2.0 * (y * y + z * z)
  yaw_z = math.atan2(t3, t4)

  return roll_x, pitch_y, yaw_z # in radians

Example

Suppose a robot is on a flat surface. It has the following quaternion:

Quaternion [x,y,z,w] = [0, 0, 0.7072, 0.7072]

What is the robot’s orientation in Euler Angle representation in radians?
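
One way to answer this (assuming the euler_from_quaternion function above is in scope) is to call it directly:

# Convert the example quaternion (x, y, z, w) = (0, 0, 0.7072, 0.7072)
roll, pitch, yaw = euler_from_quaternion(0.0, 0.0, 0.7072, 0.7072)
print(roll, pitch, yaw)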


The program shows that the roll, pitch, and yaw angles in radians are (0.0, 0.0, 1.5710599372799763).


Which is the same as:

Euler Angle (roll, pitch, yaw) = (0.0, 0.0, π/2)

And in Axis-Angle Representation, the angle is:

Axis-Angle {[x, y, z], angle} = { [ 0, 0, 1 ], 1.571 }

So we see that the robot is rotated π/2 radians (90 degrees) around the z axis (going counterclockwise). 
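
If you want to double-check the axis-angle numbers yourself, here is a short sketch that uses the standard formulas angle = 2*acos(w) and axis = (x, y, z)/sin(angle/2), assuming the quaternion is (approximately) normalized:

import math

x, y, z, w = 0.0, 0.0, 0.7072, 0.7072

angle = 2.0 * math.acos(w)          # rotation angle in radians
s = math.sqrt(1.0 - w * w)          # sin(angle / 2)
axis = [x / s, y / s, z / s] if s > 1e-9 else [0.0, 0.0, 1.0]

print(axis, angle)  # approximately [0, 0, 1] and 1.571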

And that’s all there is to it folks. That’s how you convert a quaternion into Euler angles.

You can use the code in this tutorial for your work in ROS2 since, as of this writing, the tf.transformations.euler_from_quaternion method isn’t available for ROS2 yet. 
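
For example, in a ROS 2 node you typically receive the orientation as a geometry_msgs/msg/Quaternion message. Here is a hedged sketch of how you might feed one into the function above (it assumes euler_from_quaternion is defined in the same file and that geometry_msgs is available in your ROS 2 environment):

from geometry_msgs.msg import Quaternion

# Example orientation message (same values as the example above)
q = Quaternion(x=0.0, y=0.0, z=0.7072, w=0.7072)

roll, pitch, yaw = euler_from_quaternion(q.x, q.y, q.z, q.w)
print(roll, pitch, yaw)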

Predict Vehicle Fuel Economy Using a Deep Neural Network

In this tutorial, we will use TensorFlow 2 with Keras to build a deep neural network that will enable us to predict a vehicle’s fuel economy (in miles per gallon) from the following vehicle attributes: 

  1. Cylinders
  2. Displacement
  3. Horsepower
  4. Weight
  5. Acceleration 
  6. Model year 
  7. Origin
  8. Car name

We will use the Auto MPG Data Set from the UCI Machine Learning Repository. (The car name column is read in as a comment during import and is not used as a model input; after the Origin attribute is one-hot encoded, the network sees nine input features.)

Prerequisites

  • You have TensorFlow 2 installed (you can verify your installation with the quick check after this list).
    • Windows 10 Users, see this post.
    • If you want to use GPU support for your TensorFlow installation, you will need to follow these steps. If you have trouble following those steps, you can follow these steps (note that the steps change quite frequently, but the overall process remains relatively the same).
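
A quick way to confirm that TensorFlow 2 is installed is to print its version from Python:

import tensorflow as tf

print(tf.__version__)  # should print a 2.x version number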

Directions

Open up a new Python program (in your favorite text editor or Python IDE) and write the following code. I’m going to name the program vehicle_fuel_economy.py. I’ll explain the code later in this tutorial.

# Project: Predict Vehicle Fuel Economy Using a Deep Neural Network
# Author: Addison Sears-Collins
# Date created: November 3, 2020

import pandas as pd # Used for data analysis
import pathlib # An object-oriented interface to the filesystem
import matplotlib.pyplot as plt # Handles the creation of plots
import seaborn as sns # Data visualization library
import tensorflow as tf # Machine learning library
from tensorflow import keras # Library for neural networks
from tensorflow.keras import layers # Handles the layers of the neural network

def main():

  # Set the data path for the Auto-Mpg data set from the UCI Machine Learning Repository
  datasetPath = keras.utils.get_file("auto-mpg.data", "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")

  # Set the column names for the data set
  columnNames = ['MPG', 'Cylinders','Displacement','Horsepower','Weight',
               'Acceleration','Model Year','Origin']

  # Import the data set
  originalData = pd.read_csv(datasetPath, names=columnNames, na_values = "?", 
                           comment='\t', sep=" ", skipinitialspace=True)
						 
  # Check the data set
  # print("Original Data Set Excerpt")
  # print(originalData.head())
  # print()

  # Generate a copy of the data set
  data = originalData.copy()

  # Count how many NAs each data attribute has
  # print("Number of NAs in the data set")
  # print(data.isna().sum())
  # print()

  # Now, let's remove the NAs from the data set
  data = data.dropna()

  # Perform one-hot encoding on the Origin attribute 
  # since it is a categorical variable
  origin = data.pop('Origin') # Return item and drop from frame
  data['USA'] = (origin == 1) * 1.0
  data['Europe'] = (origin == 2) * 1.0
  data['Japan'] = (origin == 3) * 1.0

  # Generate a training data set (80% of the data) and a testing set (20% of the data)
  trainingData = data.sample(frac = 0.8, random_state = 0)

  # Generate a testing data set
  testingData = data.drop(trainingData.index)

  # Separate the attributes from the label in both the testing
  # and training data. The label is the thing we are trying
  # to predict (i.e. miles per gallon 'MPG')
  trainingLabelData = trainingData.pop('MPG')
  testingLabelData = testingData.pop('MPG')
  
  # Normalize the data
  # (Strictly speaking, the testing data should be normalized using the
  # training set's mean and standard deviation; here each set uses its own.)
  normalizedTrainingData = normalize(trainingData)
  normalizedTestingData = normalize(testingData)
  #print(normalizedTrainingData.head()) 
  
  # Generate the neural network
  neuralNet = generateNeuralNetwork(trainingData)
  
  # See a summary of the neural network
  # The first layer has 640 parameters 
    #(9 input values * 64 neurons) + 64 bias values
  # The second layer has 4160 parameters 
    #(64 input values * 64 neurons) + 64 bias values
  # The output layer has 65 parameters 
    #(64 input values * 1 neuron) + 1 bias value
  #print(neuralNet.summary())
  
  EPOCHS = 1000
  
  # Train the model for a fixed number of epochs
  # history.history attribute is returned from the fit() function.
  # history.history is a record of training loss values and 
  # metrics values at successive epochs, as well as validation 
  # loss values and validation metrics values.
  history = neuralNet.fit(
    x = normalizedTrainingData, 
    y = trainingLabelData,
    epochs = EPOCHS, 
    validation_split = 0.2, 
    verbose = 0,
    callbacks = [PrintDot()]
  )   
  
  # Plot the neural network metrics (Training error and validation error)
  # Training error is the error when the trained neural network is 
  #   run on the training data.
  # Validation error is used to minimize overfitting. It indicates how
  #   well the data fits on data it hasn't been trained on.
  #plotNeuralNetMetrics(history)
  
  # Generate another neural network so that we can use early stopping
  neuralNet2 = generateNeuralNetwork(trainingData)
  
  # We want to stop training the model when the 
  # validation error stops improving.
  # monitor indicates the quantity we want to monitor.
  # patience indicates the number of epochs with no improvement after which
  # training will terminate.
  earlyStopping = keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 10)

  history2 = neuralNet2.fit(
    x = normalizedTrainingData, 
    y = trainingLabelData,
    epochs = EPOCHS, 
    validation_split = 0.2, 
    verbose = 0,
    callbacks = [earlyStopping, PrintDot()]
  )    

  # Plot metrics
  #plotNeuralNetMetrics(history2) 
  
  # Return the loss value and metrics values for the model in test mode
  # The mean absolute error for the predictions should 
  # stabilize around 2 miles per gallon  
  loss, meanAbsoluteError, meanSquaredError = neuralNet2.evaluate(
    x = normalizedTestingData,
    y = testingLabelData,
    verbose = 0
  )
  
  #print(f'\nMean Absolute Error on Test Data Set = {meanAbsoluteError} miles per gallon')
  
  # Make fuel economy predictions by deploying the trained neural network on the 
  # test data set (data that is brand new for the trained neural network).
  testingDataPredictions = neuralNet2.predict(normalizedTestingData).flatten()
  
  # Plot the predicted MPG vs. the true MPG
  # testingLabelData are the true MPG values
  # testingDataPredictions are the predicted MPG values
  #plotTestingDataPredictions(testingLabelData, testingDataPredictions)
  
  # Plot the prediction error distribution
  #plotPredictionError(testingLabelData, testingDataPredictions)
  
  # Save the neural network in Hierarchical Data Format version 5 (HDF5) format
  neuralNet2.save('fuel_economy_prediction_nnet.h5')
  
  # Import the saved model
  neuralNet3 = keras.models.load_model('fuel_economy_prediction_nnet.h5')
  print("\n\nNeural network has loaded successfully...\n")
  
  # Show neural network parameters
  print(neuralNet3.summary())
  
  # Make a prediction using the saved model we just imported
  print("\nMaking predictions...")
  testingDataPredictionsNN3 = neuralNet3.predict(normalizedTestingData).flatten()
  
  # Show Predicted MPG vs. Actual MPG
  plotTestingDataPredictions(testingLabelData, testingDataPredictionsNN3) 
  
# Generate the neural network
def generateNeuralNetwork(trainingData):
  # A Sequential model is a stack of layers where each layer is
  # single-input, single-output
  # This network below has 3 layers.
  neuralNet = keras.Sequential([
  
    # Each neuron in a layer receives input from all the 
    # neurons in the previous layer (densely connected).
    # Use the ReLU activation function. This function transforms each node's
    # summed weighted input into that node's output.
    # The first layer needs to know the number of attributes (keys) in the data set.
    # The first and second layers each have 64 nodes.
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(trainingData.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1) # This output layer is a single, continuous value (i.e. Miles per gallon)
  ])

  # RMSprop penalizes parameter updates that cause the cost function to have
  # large oscillations by keeping a moving average of the squared gradients
  # and dividing each gradient by the root of this average. This reduces the
  # step size for large gradients and increases the step size for small gradients.
  # The input into this function is the learning rate.
  optimizer = keras.optimizers.RMSprop(0.001)
 
  # Set the configurations for the model to get it ready for training
  neuralNet.compile(loss = 'mean_squared_error',
                optimizer = optimizer,
                metrics = ['mean_absolute_error', 'mean_squared_error'])
  return neuralNet
    
# Normalize the data set using the mean and standard deviation 
def normalize(data):
  statistics = data.describe()
  statistics = statistics.transpose()
  return (data - statistics['mean']) / statistics['std']

# Plot metrics for the neural network  
def plotNeuralNetMetrics(history):
  neuralNetMetrics = pd.DataFrame(history.history)
  neuralNetMetrics['epoch'] = history.epoch
  
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [MPG]')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['mean_absolute_error'],
           label='Train Error')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['val_mean_absolute_error'],
           label = 'Val Error')
  plt.ylim([0,5])
  plt.legend()
  
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Square Error [$MPG^2$]')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['mean_squared_error'],
           label='Train Error')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['val_mean_squared_error'],
           label = 'Val Error')
  plt.ylim([0,20])
  plt.legend()
  plt.show()
  
# Plot prediction error
def plotPredictionError(testingLabelData, testingDataPredictions):

  # Error = Predicted - Actual
  error = testingDataPredictions - testingLabelData
  
  plt.hist(error, bins = 50)
  plt.xlim([-10,10])
  plt.xlabel("Predicted MPG - Actual MPG")
  _ = plt.ylabel("Count")
  plt.show()

# Plot predictions vs. true values
def plotTestingDataPredictions(testingLabelData, testingDataPredictions):

  # Plot the data points (x, y)
  plt.scatter(testingLabelData, testingDataPredictions)
  
  # Label the axes
  plt.xlabel('True Values (Miles per gallon)')
  plt.ylabel('Predicted Values (Miles per gallon)')

  # Plot a line between (0,0) and (50,50) 
  point1 = [0, 0]
  point2 = [50, 50]
  xValues = [point1[0], point2[0]] 
  yValues = [point1[1], point2[1]]
  plt.plot(xValues, yValues) 
  
  # Set the x and y axes limits
  plt.xlim(0, 50)
  plt.ylim(0, 50)

  # x and y axes are equal in displayed dimensions
  plt.gca().set_aspect('equal', adjustable='box')
  
  # Show the plot
  plt.show()
  
  
# Show the training process by printing a period for each epoch that completes
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')
	
main()

Save the Python program.

If you run your Python programs using Anaconda, open the Anaconda prompt.

If you like to run your programs in a virtual environment, activate the virtual environment. I have a virtual environment named tf_2.

conda activate tf_2

Navigate to the folder where you saved the Python program.

cd [path to folder]

For example,

cd C:\MyFiles

Install any libraries that you need. I didn’t have some of the libraries in the “import” section of my code installed, so I’ll install them now.

pip install pandas
pip install seaborn

To run the code, type:

python vehicle_fuel_economy.py

If you’re using a GPU with TensorFlow and you’re getting error messages about missing libraries, check the folder C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin. You can search the Internet for the missing DLL files, download them, and then put them in that bin folder.
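
Once the missing DLLs are in place, a quick way to confirm that TensorFlow can actually see your GPU is:

import tensorflow as tf

# Lists the GPUs TensorFlow can use; an empty list means GPU support isn't working yet
print(tf.config.list_physical_devices('GPU'))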

Code Output

In this section, I will pull out snippets of the code and show you the resulting output when you uncomment those lines.

  # Check the data set
  print("Original Data Set Excerpt")
  print(originalData.head())
  print()

[Image: excerpt of the original data set]

  # Count how many NAs each data attribute has
  print("Number of NAs in the data set")
  print(data.isna().sum())
  print()

[Image: number of NA values for each attribute]

  # See a summary of the neural network
  # The first layer has 640 parameters 
    #(9 input values * 64 neurons) + 64 bias values
  # The second layer has 4160 parameters 
    #(64 input values * 64 neurons) + 64 bias values
  # The output layer has 65 parameters 
    #(64 input values * 1 neuron) + 1 bias value
  print(neuralNet.summary())

[Image: output of the neural network summary]

  # Plot the neural network metrics (Training error and validation error)
  # Training error is the error when the trained neural network is 
  #   run on the training data.
  # Validation error is used to minimize overfitting. It indicates how
  #   well the data fits on data it hasn't been trained on.
  plotNeuralNetMetrics(history)

[Image: mean absolute error vs. epoch]

[Image: mean squared error vs. epoch]

  # Plot metrics
  plotNeuralNetMetrics(history2) 

[Image: mean absolute error with early stopping]

  print(f'\nMean Absolute Error on Test Data Set = {meanAbsoluteError} miles per gallon')

[Image: mean absolute error on the test data set]

  # Plot the predicted MPG vs. the true MPG
  # testingLabelData are the true MPG values
  # testingDataPredictions are the predicted MPG values
  plotTestingDataPredictions(testingLabelData, testingDataPredictions)

[Image: predicted MPG vs. true MPG]

  # Plot the prediction error distribution
  plotPredictionError(testingLabelData, testingDataPredictions)

[Image: histogram of the prediction error]

  # Save the neural network in Hierarchical Data Format version 5 (HDF5) format
  neuralNet2.save('fuel_economy_prediction_nnet.h5')
  
  # Import the saved model
  neuralNet3 = keras.models.load_model('fuel_economy_prediction_nnet.h5')

[Image: output after saving and loading the neural network]

References

Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.