How to Install Ubuntu and VirtualBox on a Windows PC

In this project, we will get started on our Robot Operating System (ROS) programming journey by installing Ubuntu. Ubuntu is a popular distribution (i.e. flavor) of the Linux operating system and is fully supported by ROS, the most popular framework for writing robotics software. If you have a Windows PC (I have Windows 10), I recommend you install VirtualBox first and then install Ubuntu inside VirtualBox. I’ll show you how to do all that below.

The process for installing Ubuntu has a lot of steps, so hold on tight, don’t give up if something goes wrong, and go slowly so that your installation is set up properly. Let’s get started!


Here are the project requirements:

  • Install Ubuntu
  • Install VirtualBox
  • Install Ubuntu on VirtualBox
  • Learn Important Linux Terminal Commands

You Will Need

The following components are used in this project. You will need:


Download the Ubuntu Image

Check Ubuntu Releases to find the latest version of Ubuntu that has long term support (LTS). As of this writing, the latest release is Ubuntu 18.04.3 LTS (Bionic Beaver).


Click on the latest release of Ubuntu, and download the 64-bit PC (AMD64) desktop image (.iso file).

Before installing Ubuntu, you need to install VirtualBox. VirtualBox extends the capabilities of your host computer (i.e. your laptop or desktop PC) by enabling you to install and run an operating system in a new environment on top of your current operating system (Windows 10 in my case). The environment the new operating system runs in is known as a virtual machine (or guest).

Install VirtualBox

Let’s download VirtualBox. Go to VirtualBox downloads.

Select the platform package for Windows hosts to download the executable (.exe) file.


Detailed installation instructions for all operating systems (Windows, Mac OS, Linux, and Solaris) can be found in the instruction manual. Let’s go through the steps.

Double-click on the executable file.


Click Next to begin the installation process.

Click Next to install VirtualBox in the default location.


Click Next to choose the default features.


You will see a warning about network interfaces. Ignore it, and click Yes to proceed.


You are now ready to install VirtualBox. Click Install to proceed.


Click Yes to allow the software to make changes to your device.

Click Install again.


Click Finish to run VirtualBox.


You can optionally delete the original executable file for VirtualBox (the one with the .exe extension). You don’t need it anymore.

Create a Virtual Machine

Now that VirtualBox is installed on your computer, we need to create a new virtual machine.

Click the New button in the toolbar.


Type in a descriptive name for your operating system. You can stick with the default machine folder. The machine folder is where your virtual machines will be stored.

Also, select the operating system that you want to later install (Linux in this case).


Click Next to proceed.

The default memory size for me is 1024 MB. That is not enough. Raise it to 6470 MB, and then click Next to proceed.


Make sure “Create a virtual hard disk now” is selected, and click Create.


Select “VirtualBox Disk Image (VDI)”, and click Next.

Choose a fixed-size virtual hard disk for better performance, and click Next.


You can stick with the default hard disk size (10 GB as of this writing) or go with something larger, like 50 GB. I went with 50 GB. I also prefer to save my virtual hard disk on my D drive (which has more space than my C drive). Then click Create.


Double-click on the left panel where it says “Ubuntu 18.04.” A startup window will appear.


Click the folder icon next to Empty, and select the Ubuntu image (.iso file) you downloaded earlier in this tutorial. The .iso file can be anywhere on your C drive (it doesn’t have to be on your Desktop). Then click Start to proceed.


You might get an error that looks like this.


Click Close VM.

Enable Virtualization Technology on Your Computer

The error above arises because virtualization technology is disabled in your computer’s firmware (it is often disabled by default). We need to enable it. Let’s do that now.

Go to the search box in the bottom left of your screen, and search for “Advanced Startup”.


Click “Change Advanced Startup Options.”

Click Restart Now.

Click Troubleshoot.

Click Advanced Options.

Click UEFI Firmware Settings.

Click Restart to change the UEFI Firmware Settings.

Click F10 BIOS Setup.

Press the right arrow to go to System Configuration.

Scroll down to Virtualization Technology.

Press Enter to open the list of options.

Press the down arrow and then Enter to select Enabled.

Press F10 to save and exit.

Press Enter on Yes to save the changes.

Your computer will reboot.

Double-click on the VirtualBox icon to start it.

Double-click the Ubuntu virtual machine in the left panel of the window to start it, or select it and click Start in the toolbar.


You should see the Ubuntu window appear.

Install Ubuntu

Click on “Install Ubuntu” to install Ubuntu.


Click “Continue” to save the keyboard layout. The default English one is fine.

Keep clicking Continue through all the prompts. The options you want selected as you go through the prompts are the following:

  • Download updates while installing Ubuntu
  • Erase disk and install Ubuntu

At one point, you will need to set your time zone. You will see a large map of the world, which should detect your location automatically.

Type in a computer name, and pick a username and password. I selected the “Log in automatically” option.

When installation is complete, click “Restart Now.”

When you reboot, you will go to the Ubuntu desktop.


If at any time you want to exit Ubuntu, go to File → Close. You will be given the option to save your machine state so that you can pick up where you left off the next time you log in to Ubuntu.


Two additional notes: with the virtual machine powered off, go to Settings → Display and change Video Memory to 128 MB. This gives the virtual machine ample video memory.


Also go to Settings → System → Processor, and adjust the number of CPUs to 4.
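Before you set the CPU count, it is worth checking how many logical processors your host has, so you don’t give the virtual machine more than the host can spare. A minimal Python sketch, run on the Windows host (the suggest_vm_cpus helper and its “use at most half of the host’s processors” rule are my own assumptions, not a VirtualBox requirement):

```python
import os

def suggest_vm_cpus(requested=4):
    """Hypothetical helper: cap the requested CPU count at half of the
    host's logical processors (an assumed rule of thumb), with a minimum of 1."""
    host_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, host_cpus // 2))

print("Host logical processors:", os.cpu_count())
print("Suggested CPUs for the VM:", suggest_vm_cpus())
```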


If you at any point in the future run out of hard drive space for your virtual machine, you can move it to another drive (e.g. D drive) by following this article at Tech Republic.

Here are the complete settings on my Windows machine.


Getting Used to Ubuntu

Like Windows 10 (where files live under drives such as C:), Linux has a file system, but everything in it branches from a single root directory, written as /. You can browse it by clicking the file cabinet icon in the left-hand panel of the desktop. It is the third icon down in the image below.


Click “Other Locations.”


Click “Computer.”

You will see all the files and folders. For example, the path to the bin folder is /bin.


You can find popular software applications for download in the Ubuntu Software module. It is the sixth icon in the left-hand panel. 


Linux Terminal Commands With the Ubuntu Command-Line Interface

We could certainly navigate around Ubuntu using the graphical user interface, but we would miss out on being able to run advanced processes for ROS. This is where the Terminal application comes in handy. The Terminal application is the Ubuntu command-line interface and is similar to the Command Prompt on a Windows 10 system. It enables us to use Linux terminal commands.

To open the Terminal, click the bottom left where you see those nine white dots and search for “Terminal” at the top. 


Click on Terminal.


If you do a Google search for “common Linux terminal commands,” you should find some nice cheat sheets to use as a reference. Let’s try a few common commands below.

The following command lists the files and folders in the current directory:

ls

If you want to change to the desktop, type:

cd Desktop

If you want to print the path of the current directory, type:

pwd

To go up one directory, type:

cd ..

To go back to your home directory, type:

cd ~

To update the list of packages, type:

sudo apt-get update

The sudo command lets you run a command with administrator (root) privileges.

At this stage, it would be useful for you to install htop, an interactive system monitor, process viewer, and process manager. To install it, type the following command:

sudo apt-get install htop

If at any point you want to remove it, you can type the following command:

sudo apt-get remove htop

To run htop, type:

htop

Press q to quit htop.

You can reboot the system using the following command:

sudo reboot

Finally, to shut down the system, type the following command:

sudo poweroff
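The navigation commands above have rough counterparts in Python’s standard os module, which can be handy once you start writing ROS scripts in Python. A minimal sketch (the helper names list_directory and go_up_one_directory are my own, purely for illustration):

```python
import os

def list_directory(path="."):
    """Rough Python equivalent of `ls`: the names in a directory, sorted."""
    return sorted(os.listdir(path))

def go_up_one_directory():
    """Rough Python equivalent of `cd ..`, followed by `pwd`."""
    os.chdir("..")
    return os.getcwd()

if __name__ == "__main__":
    print("Current directory (pwd):", os.getcwd())
    print("Contents (ls):", list_directory())
    print("Home directory (where cd ~ goes):", os.path.expanduser("~"))
```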

How to Annotate Images Using OpenCV

In this project, we will learn how to annotate images using OpenCV, a popular and powerful open source library for image processing and computer vision. OpenCV is a cross-platform library designed for real-time image processing, with wrappers for Python, Ruby, C#, JavaScript, and other languages. OpenCV has methods for image I/O, filtering, motion tracking, segmentation, and 3D reconstruction, as well as machine learning techniques such as boosting, support vector machines, and deep learning.


  • Design a software application using Python and OpenCV that allows users to click in an image, annotate a number of points within an image, and export the annotated points into a CSV file.
    • Code must be implemented in Python and using OpenCV
    • Students should NOT use Jupyter Notebooks for this project
    • The input image and output CSV files will be provided as parameters.
      • Example: python <script_name>.py cat_dog.jpg cat_dog.csv

You Will Need 

  • Python 3.7

Input Images



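To run the program, open an Anaconda Prompt terminal.

Navigate to the directory that contains the script and the input image.

Type python <script_name>.py cat_dog.jpg cat_dog.csv to run the program, where <script_name>.py is whatever name you saved the code below under.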

Here is the code:

import cv2 # Import the OpenCV library
import numpy as np # Import Numpy library
import pandas as pd # Import Pandas library
import sys # Enables the passing of arguments

# Project: Annotate Images Using OpenCV
# Author: Addison Sears-Collins
# Date created: 9/11/2019
# Python version: 3.7
# Description: This program allows users to click in an image, annotate a 
#   number of points within an image, and export the annotated points into
#   a CSV file.

# Define the file name of the image
INPUT_IMAGE = sys.argv[1] # "cat_dog.jpg"
OUTPUT_IMAGE = INPUT_IMAGE.rsplit(".", 1)[0] + "_annotated.jpg" # e.g. "cat_dog_annotated.jpg"
output_csv_file = sys.argv[2]

# Load the image and store into a variable
# -1 means load unchanged
image = cv2.imread(INPUT_IMAGE, -1)

# Create lists to store all x, y, and annotation values
x_vals = []
y_vals = []
annotation_vals = []

# Dictionary containing some colors
colors = {'blue': (255, 0, 0), 'green': (0, 255, 0), 'red': (0, 0, 255), 
          'yellow': (0, 255, 255),'magenta': (255, 0, 255), 
          'cyan': (255, 255, 0), 'white': (255, 255, 255), 'black': (0, 0, 0), 
          'gray': (125, 125, 125), 
          'rand': np.random.randint(0, high=256, size=(3,)).tolist(), 
          'dark_gray': (50, 50, 50), 'light_gray': (220, 220, 220)}

def draw_circle(event, x, y, flags, param):
    """Draws dots on double clicking of the left mouse button"""
    # Store the height and width of the image
    height = image.shape[0]
    width = image.shape[1]

    if event == cv2.EVENT_LBUTTONDBLCLK:
        # Draw the dot
        cv2.circle(image, (x, y), 5, colors['magenta'], -1)

        # Annotate the image
        txt = input("Describe this pixel using one word (e.g. dog) and press ENTER: ")

        # Append values to the lists
        x_vals.append(x)
        y_vals.append(y)
        annotation_vals.append(txt)

        # Print the coordinates and the annotation to the console
        print("x = " + str(x) + "  y = " + str(y) + "  Annotation = " + txt + "\n")

        # Set the position of the text part of the annotation
        text_x_pos = None
        text_y_pos = y

        # Put the text to the right of points on the left half, and vice versa
        if x < (width/2):
            text_x_pos = int(x + (width * 0.075))
        else:
            text_x_pos = int(x - (width * 0.075))
        # Write text on the image
        cv2.putText(image, txt, (text_x_pos,text_y_pos), cv2.FONT_HERSHEY_SIMPLEX, 1, colors['magenta'], 2)

        cv2.imwrite(OUTPUT_IMAGE, image)

        # Prompt user for another annotation
        print("Double click another pixel or press 'q' to quit...\n")

print("Welcome to the Image Annotation Program!\n")
print("Double click anywhere inside the image to annotate that point...\n")

# We create a named window where the mouse callback will be established
cv2.namedWindow('Image mouse')

# We set the mouse callback function to 'draw_circle':
cv2.setMouseCallback('Image mouse', draw_circle)

while True:
    # Show image 'Image mouse':
    cv2.imshow('Image mouse', image)

    # Continue until 'q' is pressed:
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break

# Create a dictionary using lists
data = {'X':x_vals,'Y':y_vals,'Annotation':annotation_vals}

# Create the Pandas DataFrame
df = pd.DataFrame(data)

# Export the dataframe to a csv file
df.to_csv(path_or_buf = output_csv_file, index = None, header=True) 

# Destroy all generated windows:
cv2.destroyAllWindows()
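
If you would rather not depend on pandas just to write three columns, Python’s built-in csv module can produce a file with the same X, Y, Annotation header as the pandas to_csv call in the program above. A minimal sketch (export_annotations and load_annotations are hypothetical helper names); load_annotations is handy for reading the points back, for example to redraw them on another copy of the image:

```python
import csv

def export_annotations(path, x_vals, y_vals, annotation_vals):
    """Write one row per annotated point under an X, Y, Annotation header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["X", "Y", "Annotation"])  # header row
        writer.writerows(zip(x_vals, y_vals, annotation_vals))

def load_annotations(path):
    """Read the CSV back as a list of (x, y, annotation) tuples."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [(int(row["X"]), int(row["Y"]), row["Annotation"]) for row in reader]
```

Usage: export_annotations("cat_dog.csv", x_vals, y_vals, annotation_vals) after the main loop ends.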

Output Images


CSV Output

Here is the output for the csv file for the baby photo above:


How to Create an Image Histogram Using OpenCV

Given an image as input, how do we get the corresponding histogram using OpenCV? First, let us take a look at what a histogram is, then let us take a look at how to create one given an image. 

What is a Histogram?

A histogram is another way of looking at an image. It is a graph that shows pixel brightness values on the x-axis (e.g. 0 [black] to 255 [white] for grayscale images) and the corresponding number (i.e. frequency) of pixels (for each brightness value) on the y-axis. 
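To make the definition concrete, here is a minimal NumPy sketch that counts the pixels at each brightness value for a tiny, made-up 2 x 3 grayscale image. For an 8-bit grayscale image, cv2.calcHist([img],[0],None,[256],[0,256]) should produce the same counts, returned as a 256 x 1 float array.

```python
import numpy as np

def grayscale_histogram(img):
    """Count how many pixels have each brightness value from 0 to 255."""
    img = np.asarray(img, dtype=np.uint8)
    return np.bincount(img.ravel(), minlength=256)

# A tiny 2 x 3 "image": four black pixels (0) and two white pixels (255)
tiny = np.array([[0, 0, 255],
                 [0, 255, 0]], dtype=np.uint8)
hist = grayscale_histogram(tiny)
print(hist[0], hist[255], hist.sum())  # 4 2 6
```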

How to Create an Image Histogram Using OpenCV

There are two links I particularly like that show how to create the image histogram given an input image.

  1. Geeks for Geeks
  2. OpenCV Python Tutorials

I like these tutorials because they lead the reader through all the essentials of how to find and analyze image histograms, step-by-step. This process boils down to the following code:

# Import the required libraries
import cv2  # Open CV
from matplotlib import pyplot as plt  #Matplotlib for plotting  
# Read the input image
img = cv2.imread('example.jpg',0) 
# Calculate the frequency of pixels in the brightness range 0 - 255
histr = cv2.calcHist([img],[0],None,[256],[0,256]) 
# Plot the histogram and display
plt.plot(histr)
plt.show()

Why Use CMOS Instead of CCD Sensors in Mobile Phones

If I were designing a new state-of-the-art mobile phone, I would choose Complementary metal–oxide–semiconductor (CMOS). CMOS has several advantages over a Charge-coupled device (CCD), which I will explain below.

Processing Speed

In a CCD sensor, photosites are passive, whereas in a CMOS sensor they are active. This makes processing and information transfer slower in a CCD sensor.

A photosite is a single pixel in a CCD or CMOS sensor. In a CCD sensor, light is captured and converted into a charge. The charge accumulates in the photosites, is transferred to a voltage converter, and is then amplified. This whole process happens one row at a time. The video below demonstrates this process.

However, with a CMOS sensor, the charge-to-voltage conversion and the amplification of the voltage occur inside each photosite. Because the work on an image happens locally, processing and information transfer are faster than in a CCD sensor.

Space Requirements

CMOS enables integration of timers and analog-to-digital converters, which conserves space.

CCD is an older technology, and it is not possible to integrate peripheral components like analog-to-digital converters and timers on a single chip. CMOS does enable integration of these components onto a single chip, which conserves space. 

For a mobile phone that needs to be limited to a certain size, space must be conserved, which gives CMOS an advantage for use in mobile phones.

Power Consumption

CCD consumes more power than CMOS. A CCD sensor needs several power supplies for its timing clocks, and it requires a voltage of 7V to 10V.

A CMOS sensor requires just one power supply and requires a voltage of 3.3V to 5V, roughly 50% less than a CCD sensor. This lower power consumption means extended battery life.
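The “roughly 50%” figure can be checked with a little arithmetic. Comparing the midpoints of the two voltage ranges is my own simplification (supply voltage alone does not determine total power draw), but it shows where the number comes from:

```python
# Midpoints of the voltage ranges quoted above
ccd_mid = (7 + 10) / 2     # 8.5 V
cmos_mid = (3.3 + 5) / 2   # 4.15 V

reduction = (ccd_mid - cmos_mid) / ccd_mid
print(f"CMOS needs about {reduction:.0%} less supply voltage")  # about 51%
```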

CMOS Prevents Blooming

In a CCD sensor, when an image is overexposed, electrons pile up in the areas of the brightest part of the image and overflow to other photosites, which creates unwanted light streaks. The structure of CMOS sensors prevents this problem.
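A deliberately simplified 1-D model can make blooming easier to picture. The full-well capacity of 10 units and the rule that overflow splits equally between the two neighboring photosites are assumptions for illustration, and the CMOS function simply models each photosite saturating on its own, which is how this post describes CMOS behavior:

```python
import numpy as np

FULL_WELL = 10.0  # assumed maximum charge one photosite can hold

def ccd_column_with_blooming(light, full_well=FULL_WELL, max_iters=100):
    """Toy CCD column: charge above full_well spills into neighboring sites."""
    charge = np.asarray(light, dtype=float).copy()
    for _ in range(max_iters):
        excess = np.clip(charge - full_well, 0, None)
        if not excess.any():
            break  # no photosite is overflowing any more
        charge -= excess
        charge[1:] += excess[:-1] / 2   # half of each overflow goes to the next site
        charge[:-1] += excess[1:] / 2   # and half goes to the previous site
    return charge

def cmos_column(light, full_well=FULL_WELL):
    """Toy CMOS column: each photosite saturates independently, no spill-over."""
    return np.minimum(np.asarray(light, dtype=float), full_well)

bright_spot = [0, 0, 30, 0, 0]  # one heavily overexposed photosite
print(ccd_column_with_blooming(bright_spot).tolist())  # [0.0, 10.0, 10.0, 10.0, 0.0]
print(cmos_column(bright_spot).tolist())               # [0.0, 0.0, 10.0, 0.0, 0.0]
```

In the CCD output, the neighbors of the bright spot light up even though no light fell on them, which is the streak described above.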


Lower Production Cost

CMOS chips can be produced on virtually any standard silicon production line, whereas this is not the case for CCD chips. As a result, production cost is lower for CMOS chips. These cost savings result in better profit margins for mobile phone companies.

How to Draw the Letter ‘E’ on an Image Using Scikit-Image


Develop a program in Python to draw an E at the center of an input image.

  • Program must be developed using Python 3.x.
  • Program must use scikit-image library — a simple and popular open source library for image processing in Python.
  • The input image must be a color image.
  • The letter must be at the center of the image and must be created by updating pixels, not by using any of the drawing functions.
  • The final output must be a side-by-side image created using matplotlib.
  • Must test the same code with two different images or two different sizes.

You Will Need 


Find any two images/photos. 

Create a new Jupyter Notebook. 

Here are the critical reference points for the letter E. These points mark the corners of the four rectangles that make up the letter E.
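If you want to try the pixel-update idea outside a notebook, the same four rectangles can be painted with plain NumPy slicing. A minimal sketch that reuses the fractions from e_generator() in the code below; draw_e, BARS, and the X_* constants are my own, hypothetical names:

```python
import numpy as np

# Fractions of the image height for each horizontal bar, and fractions of the
# width for the E's edges, taken from e_generator() in the notebook code below
BARS = {"top": (0.407, 0.488), "middle": (0.572, 0.657), "bottom": (0.735, 0.819)}
X_LEFT, X_RIGHT, X_RIGHT_MIDDLE, X_SPINE = 0.423, 0.589, 0.581, 0.47

def draw_e(image, color=(255, 0, 0)):
    """Return a copy of an RGB image with a solid E painted by updating pixels."""
    out = np.copy(image)
    h, w = out.shape[:2]
    for name, (y0, y1) in BARS.items():
        x1 = X_RIGHT_MIDDLE if name == "middle" else X_RIGHT  # middle bar is shorter
        out[int(y0 * h):int(y1 * h), int(X_LEFT * w):int(x1 * w), :] = color
    # Vertical spine from the top of the top bar to the bottom of the bottom bar
    out[int(BARS["top"][0] * h):int(BARS["bottom"][1] * h),
        int(X_LEFT * w):int(X_SPINE * w), :] = color
    return out

blank = np.zeros((100, 100, 3), dtype=np.uint8)
e_image = draw_e(blank)
```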


Here is the pdf of my Jupyter notebook.

Here is the raw code for the program in Python:

#!/usr/bin/env python
# coding: utf-8

# # Project 1 – Introduction to Python scikit-image
# ## Author
# Addison Sears-Collins
# ## Date Created
# 9/4/2019
# ## Python Version
# 3.7
# ## Description
# This program draws an E at the center of an input image.
# ## Purpose
# The purpose of this assignment is to introduce the basic functions of the Python scikit-image
# library -- a simple and popular open source library for image processing in Python. scikit-image
# extends scipy.ndimage to provide a set of image processing routines including I/O, color
# and geometric transformations, segmentation, and other basic features.
# ## File Path

# In[1]:

# Move to the directory where the input images are located
get_ipython().run_line_magic('cd', 'D:\\Dropbox\\work')

# List the files in that directory
get_ipython().run_line_magic('ls', '')

# ## Code

# In[2]:

# Import scikit-image
import skimage

# Import module to read and write images in various formats
from skimage import io

# Import matplotlib functionality
import matplotlib.pyplot as plt

# Import numpy
import numpy as np

# Set the color of the E
# [red, green, blue]
COLOR_OF_E = [255, 0, 0]

# In[3]:

# Show the critical points of E
from IPython.display import Image
Image(filename = "e_critical_points.PNG", width = 200, height = 200)

# In[4]:

def e_generator(y_dim, x_dim):
    """Generates the coordinates of the E
    :param y_dim int: The y dimensions of the input image
    :param x_dim int: The x dimensions of the input image
    :return: The critical coordinates
    :rtype: tuple
    """
    # Set all the critical points
    A =  [int(0.407 * y_dim), int(0.423 *  x_dim)]
    B =  [int(0.407 * y_dim), int(0.589 *  x_dim)]
    C =  [int(0.488 * y_dim), int(0.423 *  x_dim)]
    D =  [int(0.488 * y_dim), int(0.589 *  x_dim)]
    E =  [int(0.572 * y_dim), int(0.423 *  x_dim)]
    F =  [int(0.572 * y_dim), int(0.581 *  x_dim)]
    G =  [int(0.657 * y_dim), int(0.423 *  x_dim)]
    H =  [int(0.657 * y_dim), int(0.581 *  x_dim)]
    I =  [int(0.735 * y_dim), int(0.423 *  x_dim)]
    J =  [int(0.735 * y_dim), int(0.589 *  x_dim)]
    K =  [int(0.819 * y_dim), int(0.423 *  x_dim)]
    L =  [int(0.819 * y_dim), int(0.589 *  x_dim)]
    M =  [int(0.407 * y_dim), int(0.47 *  x_dim)]
    N =  [int(0.819 * y_dim), int(0.47 *  x_dim)]
    return A,B,C,D,E,F,G,H,I,J,K,L,M,N

# In[5]:

def plot_image_with_e(image, A, B, C, D, E, F, G, H, I, J, K, L, M, N):
    """Plots an E on an input image
    :param image: The input image
    :param A, B, etc. list: The coordinates of the critical points
    :return: image_with_e
    :rtype: image
    """
    # Copy the image
    image_with_e = np.copy(image)

    # Top horizontal rectangle
    image_with_e[A[0]:C[0], A[1]:B[1], :] = COLOR_OF_E 

    # Middle horizontal rectangle
    image_with_e[E[0]:G[0], E[1]:F[1], :] = COLOR_OF_E

    # Bottom horizontal rectangle
    image_with_e[I[0]:K[0], I[1]:J[1], :] = COLOR_OF_E

    # Vertical connector rectangle
    image_with_e[A[0]:K[0], A[1]:M[1], :] = COLOR_OF_E

    # Display image
    plt.imshow(image_with_e)

    return image_with_e

# In[6]:

def print_image_details(image):
    """Prints the details of an input image
    :param image: The input image
    """
    print("Size: ", image.size)
    print("Shape: ", image.shape)
    print("Type: ", image.dtype)
    print("Max: ", image.max())
    print("Min: ", image.min())

# In[7]:

def compare(original_image, annotated_image):
    """Compare two images side-by-side
    :param original_image: The original input image
    :param annotated_image: The annotated-version of the original input image
    """
    # Compare the two images side-by-side
    f, (ax0, ax1) = plt.subplots(1, 2, figsize=(20,10))

    ax0.imshow(original_image)
    ax0.set_title('Original', fontsize = 18)

    ax1.imshow(annotated_image)
    ax1.set_title('Annotated', fontsize = 18)

# In[8]:

# Load the test image
image = io.imread("test_image.jpg")

# Store the y and x dimensions of the input image
y_dimensions = image.shape[0]
x_dimensions = image.shape[1]

# Print the image details
print_image_details(image)

# Display the image
plt.imshow(image)

# In[9]:

# Set all the critical points of the image
A,B,C,D,E,F,G,H,I,J,K,L,M,N = e_generator(y_dimensions, x_dimensions)

# Plot the image with E and store it
image_with_e = plot_image_with_e(image, A, B, C, D, E, F, G, H, I, J, K, L, M, N)

# Save the output image
plt.imsave('test_image_annotated.jpg', image_with_e)

# In[10]:

compare(image, image_with_e)

# In[11]:

# Load the first image
image = io.imread("architecture_roof_buildings_baked.jpg")

# Store the y and x dimensions of the input image
y_dimensions = image.shape[0]
x_dimensions = image.shape[1]

# Print the image details
print_image_details(image)

# Display the image
plt.imshow(image)

# In[12]:

# Set all the critical points of the image
A,B,C,D,E,F,G,H,I,J,K,L,M,N = e_generator(y_dimensions, x_dimensions)

# Plot the image with E and store it
image_with_e = plot_image_with_e(image, A, B, C, D, E, F, G, H, I, J, K, L, M, N)

# Save the output image
plt.imsave('architecture_roof_buildings_baked_annotated.jpg', image_with_e)

# In[13]:

compare(image, image_with_e)

# In[14]:

# Load the second image
image = io.imread("statue.jpg")

# Store the y and x dimensions of the input image
y_dimensions = image.shape[0]
x_dimensions = image.shape[1]

# Print the image details
print_image_details(image)

# Display the image
plt.imshow(image)

# In[15]:

# Set all the critical points of the image
A,B,C,D,E,F,G,H,I,J,K,L,M,N = e_generator(y_dimensions, x_dimensions)

# Plot the image with E and store it
image_with_e = plot_image_with_e(image, A, B, C, D, E, F, G, H, I, J, K, L, M, N)

# Save the output image
plt.imsave('statue_annotated.jpg', image_with_e)

# In[16]:

compare(image, image_with_e)


Biometric Fingerprint Scanner | Image Processing Applications

In this post, I will discuss an application that relies heavily on image processing.

Biometric Fingerprint Scanner


I currently live in an apartment complex that has no doorman. Instead, to enter the property, you have to scan your fingerprint on the fingerprint scanner next to the entry gate on the main walkway into the complex.

How It Works

In order to use the fingerprint scanner, I had to go to the administration for the apartment complex and have all of my fingerprints scanned. The receptionist took my hand and individually scanned each finger on both hands until clear digital images were produced of all my fingers. There were a number of times when the receptionist tried to scan one of my fingers, and the fingerprints failed to be read by the scanner. In these situations, I had to rotate my finger to the left and to the right until the scanner beeped, indicating that it had registered a clear digital image of my fingerprint. This whole process took about 20 minutes. 

After registering all of my fingerprints, I went to the front entry door to test if I was able to enter using my finger. I typically use my thumb to enter since it provides the biggest fingerprint image and is easier for the machine to read.

Once everything was set up, I was able to enter the building freely, using only my finger.


Strengths

The main strength of the fingerprint scanning system is that it provides totally keyless entry. I do not need to carry multiple keys in order to enter the building, the swimming pool, and the gym. Previously, I had to have separate keys for each of the doors leading into common areas of the community. Now, with the biometric fingerprint scanner, all I need to do is scan my fingerprint at any of the doors, and I can access anywhere in the complex.

Keyless entry also comes in handy because I often lose my keys, or I forget my keys inside the house. Your fingers, fortunately, go wherever you go.

Another strength of using a biometric fingerprint scanner is that it is more environmentally friendly. To make a key, metal has to be mined from the Earth somewhere.

One final strength of the biometric fingerprint scanner is that it is easy when I have guests come to town. I do not need to create a spare key or give them a copy of my key. All I need to do is take them down to the administration and have their fingerprints registered.


Weaknesses

One of the main weaknesses of this keyless entry system is that it is not sanitary. Not everybody has the cleanest hands, and when everybody in the entire complex touches the same fingerprint scanner, bacteria can build up. Facial recognition would be a good solution to this problem because I wouldn’t have to touch anything to enter.

I’m also not sure how secure the fingerprint scanner is. Imagine somebody who has been evicted from their apartment for failing to pay rent. There has to be an ironclad management process to make sure that, as soon as somebody is evicted, his or her fingerprints are automatically removed from the system.

Another weakness is that the fingerprint scanner is not flawless. Often I have to try five or even sometimes six times using different angles of my thumb and forefinger to register a successful reading to enter the door. The fingerprint scanner is highly sensitive to the way in which you put your finger on the scanner. A slight twist to the left or right might not register a successful reading. 

Also, when I return from a long vacation, the fingerprint scanner never reads my fingerprints accurately. This happens because the administration resets the fingerprint scanner every so often, and when that happens, I have to go to the administration to re-register my fingerprints.

While I like the biometric fingerprint scanner, other techniques are a lot more foolproof. For example, typing in a PIN code opens the door virtually 100% of the time, whereas putting my finger on the fingerprint scanner opens the door at most 80 to 90% of the time on the first try.

Hierarchical Actions and Reinforcement Learning

One of the challenges in reinforcement learning is handling hierarchical actions.

What are Hierarchical Actions?

To explain hierarchical actions, let us take a look at a real-world example. Consider the task of baking a sweet potato pie. The high-level action of making a sweet potato pie can be broken down into numerous low-level substeps: cut the sweet potatoes, cook the sweet potatoes, add sugar, add flour, etc.

You will also notice that each of the low-level substeps mentioned above can be broken down even further. For example, the task of cutting a sweet potato can be broken down into the following steps: move right arm to the right, orient right arm above the sweet potato, bring arm down, etc.

Each of those sub-substeps can then be broken down into even smaller steps. For example, “moving right arm to the right” might involve thousands of different muscle contractions. Can you see where we are going here?

Reinforcement learning involves training a software agent to learn by experience through trial and error. A basic reinforcement learning algorithm would need to do a search over thousands of low-level actions in order to execute the task of making a sweet potato pie. Thus, reinforcement learning methods would quickly get inefficient for tasks that require a large number of low-level actions.

How to Solve the Hierarchical Action Problem

One way to solve the hierarchical action problem is to represent a high-level behavior (e.g. making a sweet potato pie) as a small sequence of high-level actions. 

For example, where the solution of making a sweet potato pie might entail 1000 low-level actions, we might condense these actions into 10 high-level actions. We could then have a single master policy that switches between each of the 10 sub-policies (one for each action) every N timesteps. The algorithm explained here is known as meta-learning shared hierarchies and is explained in more detail at
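A minimal sketch of the switching structure described above: a master policy picks one of several sub-policies, and that sub-policy stays in control for N timesteps. The sub-policies here just emit a fixed low-level action and the master picks at random; both are hypothetical stand-ins, since in meta-learning shared hierarchies the master policy and sub-policies are learned.

```python
import random

N = 5  # assumed number of timesteps each sub-policy stays in control
LOW_LEVEL_ACTIONS = ["cut", "cook", "mix", "bake"]

def make_sub_policy(action):
    """Hypothetical sub-policy: here it always emits one low-level action."""
    return lambda state: action

SUB_POLICIES = [make_sub_policy(a) for a in LOW_LEVEL_ACTIONS]

def master_policy(state):
    """Hypothetical master policy: chooses which sub-policy runs next.
    It picks at random here; in MLSH the master policy is learned."""
    return random.randrange(len(SUB_POLICIES))

def run_episode(total_steps, state=None):
    actions = []
    active = None
    for t in range(total_steps):
        if t % N == 0:                     # every N timesteps...
            active = master_policy(state)  # ...the master may switch sub-policies
        actions.append(SUB_POLICIES[active](state))
    return actions

print(run_episode(20))
```

With 4 sub-policies and N = 5, a 20-step episode needs only 4 master decisions (4**4 = 256 possible sequences) instead of 20 low-level choices (4**20, about 1.1 trillion sequences).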

We could also integrate supervised learning techniques such as ID3 decision trees. Each sub-policy would be represented as a decision tree where the appropriate action taken is the output of the tree. The input would be a transformed version of the state and reward that was received from the environment. In essence, you would have decisions taken within decisions.
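A minimal sketch of a decision-tree sub-policy, assuming you already have example (state, action) pairs to learn from: it builds an ID3 tree (splitting on the feature with the highest information gain) over categorical state features, then walks the tree to pick an action. The feature names, actions, and training examples are hypothetical, and this shows only the supervised piece, not how it plugs into the master policy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of action labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def id3(rows, labels, features):
    """Build an ID3 tree from rows (dicts of categorical state features)
    and labels (the action to take in each row)."""
    if len(set(labels)) == 1:
        return labels[0]                # every example agrees: leaf
    if not features:
        return majority(labels)         # no features left: majority leaf

    def information_gain(feature):
        remainder = 0.0
        for value in set(r[feature] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=information_gain)
    tree = {"feature": best, "default": majority(labels), "branches": {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree["branches"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx],
                                      [f for f in features if f != best])
    return tree

def sub_policy_action(tree, state):
    """Use the tree as a sub-policy: walk it with the current state."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(state[tree["feature"]], tree["default"])
    return tree

# Hypothetical training examples for a "cut the sweet potato" sub-policy
rows = [
    {"knife_above_potato": "no",  "potato_cut": "no"},
    {"knife_above_potato": "yes", "potato_cut": "no"},
    {"knife_above_potato": "yes", "potato_cut": "yes"},
    {"knife_above_potato": "no",  "potato_cut": "yes"},
]
labels = ["move_arm", "bring_arm_down", "done", "done"]

tree = id3(rows, labels, ["knife_above_potato", "potato_cut"])
print(sub_policy_action(tree, {"knife_above_potato": "yes", "potato_cut": "no"}))  # bring_arm_down
```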

Partial Observability and Reinforcement Learning

In this post, I’m going to discuss how supervised learning can address the partial observability issue in reinforcement learning.

What is Partial Observability?

In a lot of the textbook examples of reinforcement learning, we assume that the agent, for example a robot, can perfectly observe the environment around it in order to extract relevant information about the current state. When this is the case, we say that the environment around the agent is fully observable.

However, in many cases, such as in the real world, the environment is not always fully observable. For example, there might be noisy sensors, missing information about the state, or outside interferences that prohibit an agent from being able to develop an accurate picture of the state of its surrounding environment. When this is the case, we say that the environment is partially observable.

Let us take a look at an example of partial observability using the classic cart-pole balancing task that is often found in discussions on reinforcement learning.

Below is a video demonstrating the cart-pole balancing task. The goal is to keep a pole from falling over by making small adjustments to the cart underneath the pole.

In the video above, the agent learns to keep the pole balanced for 30 minutes after 600 trials. The state of the world consists of two parts:

  1. The pole angle
  2. The angular velocity

However, what happens if one of those parts is missing? For example, the pole angle reading might disappear. 

Also, what happens if the readings are noisy, where the pole angle and angular velocity measurements deviate significantly from the true value? 

In these cases, a reinforcement learning policy that depends only on the current observation x_t (the pole angle or angular velocity reading at time t) will suffer in performance. This, in a nutshell, is the partial observability problem that is inherent in reinforcement learning techniques.
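One reason a policy that sees only x_t struggles: a single angle reading says nothing about how fast the pole is moving. A minimal sketch, assuming a fixed time step DT (a hypothetical value), shows that two consecutive angle readings are enough for a finite-difference estimate of the missing angular velocity, which is exactly the kind of information a memory of past observations can supply:

```python
DT = 0.02  # assumed time between sensor readings, in seconds

def estimate_angular_velocity(prev_angle, curr_angle, dt=DT):
    """Finite-difference estimate of angular velocity from two angle readings."""
    return (curr_angle - prev_angle) / dt

# Two consecutive pole-angle readings, in radians (made-up values)
angles = [0.010, 0.014]
omega = estimate_angular_velocity(angles[0], angles[1])
print(f"Estimated angular velocity: {omega:.2f} rad/s")  # 0.20 rad/s
```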

Addressing Partial Observability Using Long Short-Term Memory (LSTM) Networks

One strategy for addressing the partial observability problem (where information about the actual state of the environment is missing or noisy) is to use long short-term memory (LSTM) neural networks. In contrast to feedforward neural networks, in which information flows one way from the input layer to the output layer, LSTMs have feedback connections. Information from past inputs persists from one time step to the next, giving the system a “memory.” This memory can then be used to make predictions about the current state of the environment.
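To show where that memory lives, here is a minimal NumPy sketch of a single LSTM cell’s forward pass in the common formulation (forget, input, and output gates). It is not the setup used in the paper discussed next: the weights are random and untrained, the sizes are arbitrary, and representing a missing angle reading as 0.0 is an assumption, so it illustrates only the mechanics of carrying state (h, c) from one observation to the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Minimal LSTM cell (forward pass only) with random, untrained weights."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, acting on [input, previous hidden state]
        shape = (hidden_size, input_size + hidden_size)
        self.W = {g: rng.normal(0, 0.5, shape) for g in ("f", "i", "o", "c")}
        self.b = {g: np.zeros(hidden_size) for g in ("f", "i", "o", "c")}

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        f = sigmoid(self.W["f"] @ z + self.b["f"])        # forget gate
        i = sigmoid(self.W["i"] @ z + self.b["i"])        # input gate
        o = sigmoid(self.W["o"] @ z + self.b["o"])        # output gate
        c_tilde = np.tanh(self.W["c"] @ z + self.b["c"])  # candidate memory
        c = f * c + i * c_tilde   # keep part of the old memory, add new information
        h = o * np.tanh(c)        # new hidden state (the cell's output)
        return h, c

cell = LSTMCell(input_size=2, hidden_size=4)
h, c = np.zeros(4), np.zeros(4)
# (pole angle, angular velocity) observations; the second angle reading is missing
for obs in ([0.01, 0.2], [0.0, 0.25], [0.02, 0.3]):
    h, c = cell.step(np.array(obs), h, c)
print(h)
```

Feeding the last observation to a fresh cell state gives a different output than feeding it after the earlier observations, which is the “memory” at work.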

The details of how this memory is created are described in a paper by Bram Bakker of the Unit of Experimental and Theoretical Psychology at Leyden University. Bakker showed that LSTM neural networks can help improve reinforcement learning policies by creating a “belief state.” This “belief state” is based on probabilities of reward, state transitions, and observations, given prior states and actions.

Thus, when the actual state (as measured by a robot’s sensor for example) is unavailable or super noisy, an agent can use belief state information generated by an LSTM to determine the appropriate action to take.
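In Bakker's approach the LSTM learns its belief state implicitly. For comparison, when the transition and observation probabilities are known, a belief state can be computed explicitly with the standard POMDP belief update (a discrete Bayes filter): b'(s') ∝ P(o | s') Σ_s P(s' | s, a) b(s). The sketch below is that textbook update, not the method from the paper; it ignores rewards, and the two-state “pole leaning left/right” model and its probabilities are hypothetical numbers chosen for illustration.

```python
import numpy as np


def belief_update(belief, action, observation, T, O):
    """One step of the standard POMDP belief update (a discrete Bayes filter).

    belief: shape (S,), current probability of each hidden state
    T:      shape (A, S, S), T[a, s, s_next] = P(s_next | s, a)
    O:      shape (S, Z),    O[s_next, z]   = P(z | s_next)
    """
    predicted = belief @ T[action]                # sum_s b(s) P(s_next | s, a)
    unnormalized = O[:, observation] * predicted  # weight by observation likelihood
    return unnormalized / unnormalized.sum()


# Hypothetical model: hidden state 0 = pole leaning left, 1 = leaning right.
T = np.array([[[0.9, 0.1],     # single action 0: the lean usually persists
               [0.1, 0.9]]])
O = np.array([[0.8, 0.2],      # noisy sensor reports the correct side 80% of the time
              [0.2, 0.8]])

belief = np.array([0.5, 0.5])  # start with no idea which way the pole leans
for reading in (1, 1):         # two noisy "leaning right" readings in a row
    belief = belief_update(belief, action=0, observation=reading, T=T, O=O)
    print(belief.round(3))
```

With these made-up numbers, the probability of “leaning right” rises from 0.5 to 0.8 after the first reading and to about 0.92 after the second: the belief accumulates evidence across noisy readings instead of trusting any single one.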

Combining Deep Neural Networks With Reinforcement Learning for Improved Performance

The performance of reinforcement learning can be improved by incorporating supervised learning techniques. Let us take a look at a concrete example.

You all might be familiar with the Roomba robot created by iRobot. The Roomba robot is perhaps the most popular robot vacuum sold in the United States. 


The Roomba is completely autonomous, moving around the room with ease, cleaning up dust, pet hair, and dirt along the way. In order to do its job, the Roomba contains a number of sensors that enable it to perceive the current state of the environment (i.e. your house). 

Let us suppose that the Roomba is governed by a reinforcement learning policy. This policy could be improved if the Roomba had more accurate readings of the current state of the environment. One way to improve these readings is to incorporate computer vision.

Since reinforcement learning depends heavily on accurate readings of the current state of the environment, we could use deep neural networks trained with supervised learning to pre-train the robot so that it can perform common computer vision tasks, such as recognizing, localizing, and classifying objects, before we even start running the reinforcement learning algorithm. These “readings” would improve the state portion of the reinforcement learning loop.

Deep neural networks have already displayed remarkable accuracy for computer vision problems. We can use these techniques to enable the robot to get a more accurate reading of the current state of the environment so that it can then take the most appropriate actions towards maximizing cumulative reward.
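The division of labor described above (a perception model trained ahead of time that turns raw sensor data into a cleaner state, followed by reinforcement learning on that state) can be sketched in Python as follows. Everything here is hypothetical: `perceive` is a nearest-centroid stand-in for a pretrained vision network, the state labels, actions, and reward are made up, and the learning step is a standard tabular Q-learning update rather than anything iRobot is known to use.

```python
from collections import defaultdict

# Hypothetical stand-in for a pretrained vision model: a nearest-centroid
# classifier over made-up 2-D image features. A real system would use a trained CNN.
CENTROIDS = {
    "clear_floor": (0.0, 0.0),
    "pet_hair": (1.0, 0.0),
    "obstacle": (0.0, 1.0),
}

ACTIONS = ["forward", "turn", "vacuum_hard"]  # hypothetical action set


def perceive(features):
    """Frozen perception step: map raw features to a discrete state label."""
    def dist2(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(CENTROIDS, key=lambda label: dist2(CENTROIDS[label]))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update on the *perceived* states."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q = defaultdict(float)
s = perceive((0.9, 0.1))        # perception says: pet hair ahead
s_next = perceive((0.1, 0.05))  # after vacuuming: clear floor
q_update(q, s, "vacuum_hard", reward=1.0, next_state=s_next)
print(s, s_next, q[(s, "vacuum_hard")])  # pet_hair clear_floor 0.1
```

The reinforcement learning part never sees raw pixels; it only sees the labels produced by the perception step, so better perception directly means a more accurate state.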

Boltzmann Distribution and Epsilon Greedy Search

How Does the Boltzmann Distribution Fit Into the Discussion of Epsilon Greedy Search?

In order to answer your question, let us take a closer look at the definition of epsilon greedy search. With our knowledge of how that works, we can then see how the Boltzmann distribution fits into the discussion of epsilon greedy search.

What is Epsilon Greedy Search?

When you are training an agent (e.g. race car, robot, etc.) with an algorithm like Q-learning, you can either have the agent take a random action with probability ϵ or have the agent be greedy and take the action that corresponds to its policy with probability 1-ϵ (i.e. the action for a given state that has the highest Q-value). The former is known as exploration while the latter is called exploitation. In reinforcement learning, we have this constant dichotomy of:

  • exploration vs. exploitation
  • learn vs. earn
  • not greedy vs. greedy

Here are some everyday examples of this trade-off:

  • Exploration: Try a new bar in your city.
  • Exploitation: Go to the same watering hole you have been going to for decades.
  • Exploration: Start a business.
  • Exploitation: Get a job.
  • Exploration: Try to make new friends.
  • Exploitation: Keep inviting over your college buddies.
  • Exploration: Download the Tinder dating app.
  • Exploitation: Call the ex.

In reinforcement learning terms:

  • Exploration (with probability ϵ): Gather more information about the environment.
  • Exploitation (with probability 1-ϵ): Make a decision based on the best information (i.e. policy) that is currently available.

An epsilon greedy algorithm with ϵ = 0.20 means that most of the time (with probability 0.80) the agent will select the trusted action a, the one prescribed by its policy π(s) -> a. The other 20% of the time, the agent will choose a random action instead of following its policy.

We often want an epsilon greedy strategy in place for a reinforcement learning problem because what is best for the agent in the long term (e.g. trying something totally random that pays off in a big way down the road) is often not what is best in the short term (e.g. sticking with the best option it already knows).
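Here is a minimal Python sketch of epsilon greedy action selection, assuming the agent keeps a list of Q-values (one per action) for the current state; the function name and the example Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index for the current state.

    With probability epsilon, explore: pick an action uniformly at random.
    Otherwise (probability 1 - epsilon), exploit: pick the highest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [1.0, 2.5, 0.3, 1.7]  # hypothetical Q[s, a] for four actions
rng = random.Random(42)
trials = 100_000
greedy = sum(epsilon_greedy(q_values, 0.20, rng) == 1 for _ in range(trials))
print(greedy / trials)
```

Because the random branch picks uniformly among all four actions, it sometimes lands on the greedy action too, so the printed fraction comes out near 0.8 + 0.2/4 = 0.85 rather than exactly 0.80.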

What Does the Boltzmann Distribution Have to Do With Epsilon Greedy Search?

Notice that in the epsilon greedy search section above, I said that 20% of the time the agent will choose a random action instead of following its policy. The problem with this is that it treats all actions equally when deciding which random action to take. What if some actions look more promising than others? Plain old epsilon greedy search cannot take that into account. In the real world, not all actions are created equal.

A common method is to use the Boltzmann distribution (also known as the Gibbs distribution). Rather than blindly accepting any random action when it comes time for the agent to explore the environment from a given state s, the agent selects an action a (from a set of actions A) with probability:

$$P(a \mid s) = \frac{e^{Q[s,a]/\tau}}{\sum_{b \in A} e^{Q[s,b]/\tau}}$$

where τ (tau) > 0 is the temperature parameter.

What this system does is rank and weight every action in the set of possible actions based on its Q-value: actions with higher Q-values are more likely to be chosen, but every action keeps some chance of being selected. This approach is often referred to as a softmax selection rule.

Take a closer look at the equation above to see what we are doing here. A really high value of tau makes all actions nearly equally likely to be selected, because dividing by tau dilutes the differences between the Q-values of the actions. As tau gets lower and lower, the differences in selection probabilities between actions grow, so the action with the highest Q[s,a] value becomes much more likely to be selected. When tau gets close to zero, the Boltzmann selection criterion outlined above becomes indistinguishable from greedy search: the agent will select the action with the highest Q-value and therefore never explore the environment via a random action.
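Below is a minimal Python sketch of Boltzmann (softmax) action selection, computing P(a | s) = e^(Q[s,a]/τ) / Σ_b e^(Q[s,b]/τ) for each action; the function names and example Q-values are hypothetical. Subtracting the largest Q-value before exponentiating is a common numerical-stability trick: it cancels out in the ratio, so the probabilities are unchanged, but it prevents overflow when tau is small.

```python
import math
import random


def boltzmann_probs(q_values, tau):
    """Selection probability of each action: exp(Q/tau) / sum of exp(Q/tau)."""
    q_max = max(q_values)
    # Shifting by q_max cancels out in the ratio but avoids overflow for small tau.
    weights = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, tau, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q_values = [1.0, 2.5, 0.3, 1.7]  # hypothetical Q[s, a] for four actions
for tau in (100.0, 1.0, 0.01):
    print(tau, [round(p, 3) for p in boltzmann_probs(q_values, tau)])
```

At tau = 100 the four probabilities are nearly equal; at tau = 1 the higher-valued actions are favored while every action keeps some chance; at tau = 0.01 virtually all of the probability sits on action 1, which is greedy selection.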