The Complete Guide to Linux Fundamentals for Robotics

In this tutorial, you will learn the most common commands and tools you will use again and again as you work with ROS on Linux for your robotics projects. 

While there are hundreds of Linux commands, there are really only a handful that you will use repeatedly. I’m a firm believer in the Pareto Principle (also known as the 80/20 rule). Don’t waste your time memorizing a bunch of commands you may never use; instead, focus on learning the fundamentals: the handful of commands and tools that you will use most frequently. You can look up everything else when you need it.

This tutorial has a lot of steps, but be patient and take your time as you work all the way through it. By the end of this tutorial, you will have the rock-solid confidence to move around in Linux with ease.

Without further ado, let’s get started!

Prerequisites

Explore the Folder Hierarchy

Open up a fresh Linux terminal window.

Let’s check out the directory (i.e. folder) structure of the catkin_ws folder, the workspace folder for ROS. 

Install the tree program.

sudo apt-get install tree

Type:

tree catkin_ws

You should see a hierarchy of all the folders underneath the catkin_ws folder.

1-tree-catkin-wsJPG

If at any time you want to see the hierarchy of all folders underneath a specific folder, just type:

tree <path to the folder>

Navigate Between Folders

Ok, now we need to move to our catkin_ws/src folder.

cd catkin_ws/src 

If you don’t have Git installed, install it now: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

You might have nothing in your catkin_ws/src directory at this stage. In order to learn Linux, it is best if we work with an actual ROS project with real folders and files so that we get comfortable with how Linux works. 

Fortunately, the Robot Ignite Academy (they provide excellent ROS courses by the way) has some projects and simulations that they made publicly available here: https://bitbucket.org/theconstructcore/. Let’s download their Linux files.

Type this command (all on one line):

git clone https://bitbucket.org/theconstructcore/linux_course_files.git

Type the following command to see what files you have in there:

dir

You should have a folder named linux_course_files

3-linux-course-filesJPG

A Linux file system is made up of two main parts:

  1. Files: Used to store data
  2. Folders: Used to store files and other folders

Ok, let’s move to the folder that contains the Python file bb8_keyboard.py. We have to change directories (i.e. move folders) to get to where the program is located. Let’s do that now.

cd linux_course_files/move_bb8_pkg/src

Type the following command to see what is inside that folder:

dir

Let’s check the path to this directory.

pwd

Now, we move up one folder, to the move_bb8_pkg folder.

cd ..

How do we move back to the src folder?

cd src

How do we get back to the home folder?

cd ~

Note that we could also have typed:

cd /home/ros

~ is shorthand for the current user’s home directory, which is /home/ros in my case.

Now that we are at the home folder, let’s see what the official path to this directory is:

pwd

List all the folders in the home directory.

ls

That ls command doesn’t list the hidden files. We need to type this command to list all the files.

ls -la

The . in front of the files means that they are hidden. You won’t see them using the regular ls command.

10-list-hidden-filesJPG

We could have also done:

ls --all

To see all the options you can use with ls, just type:

ls --help

To get the manual, we could have alternatively typed:

man ls

To get out of the manual, type q.

Create New Folders and Files

Ok, let’s learn how to make a new folder. Type the following command:

mkdir test_folder

If you type dir, you should see a new folder in there named test_folder.

Move to that folder.

cd test_folder

Create a new text file named new_file.txt.

touch new_file.txt

Open the new file in a text editor.

gedit new_file.txt

Close the text editor.

Now, go to the linux_course_files folder.

cd  ~/catkin_ws/src/linux_course_files/

Create a new folder called scripts.

mkdir scripts

Move into the scripts folder.

cd scripts

Create a new Python program called hello_world.py

touch hello_world.py

Open the program using the text editor.

gedit hello_world.py

Write a simple program that prints “Hello World” every two seconds.

13-hello-worldJPG
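The screenshot above shows the author's version of the program. If you can't read it, here is a minimal sketch of what such a program could look like. The function name say_hello and its parameters are my own assumptions, not taken from the screenshot, and the first line assumes a python executable is on your PATH.

```python
#!/usr/bin/env python
import time


def say_hello(repetitions=None, interval_seconds=2.0):
    """Print "Hello World" repeatedly and return how many times it was printed.

    With repetitions=None the loop runs until you press Ctrl + C.
    """
    printed = 0
    while repetitions is None or printed < repetitions:
        if printed > 0:
            # Wait between messages (two seconds by default)
            time.sleep(interval_seconds)
        print("Hello World")
        printed += 1
    return printed


if __name__ == "__main__":
    # Assumption: capped at two messages so this sketch terminates on its own.
    # Call say_hello() with no arguments to keep printing until Ctrl + C.
    say_hello(repetitions=2)
```

For the later sections on processes, you will want the version that never stops, so replace the last line with say_hello().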

Save the program, and run it by typing:

python hello_world.py

Now go to the catkin_ws/src/linux_course_files/ folder.

cd  ~/catkin_ws/src/linux_course_files/

Move the scripts folder to the move_bb8_pkg folder.

14-scripts-folder-thereJPG

The syntax is:

mv <file/folder we want to move> <destination>

So we type:

mv scripts move_bb8_pkg 

Note that in this case, we were in the linux_course_files folder, which contains both the scripts and move_bb8_pkg folders. For this reason, we did not have to type out the entire path of both folders (e.g. the entire path of the scripts folder is ~/catkin_ws/src/linux_course_files/scripts).

Now let’s move to the move_bb8_pkg folder to see if the scripts folder is in there.

cd move_bb8_pkg

Note that in the command above, if we had typed cd m (i.e. the first letter of the folder of interest) and then pressed the Tab key on our keyboard, the system would have automatically filled in the rest. This saves us from having to type out cd move_bb8_pkg.

The scripts folder is in there, containing the hello_world.py program.

cd scripts
dir

Let’s copy the hello_world.py file into a new Python file:

cp  hello_world.py hello_world_copy.py

You have a new file called hello_world_copy.py, which has all the contents of hello_world.py.

The syntax for the copy operation in Linux is:

cp <file we want to copy> <name of the new file>

To copy a folder, use the following command:

cp -r <folder we want to copy> <name of the new folder>

Remember, any time you want to see what other options you have with the cp command, just type:

cp --help

If you ever want to remove a file, here is the command:

rm <file to remove>

To remove a folder, you type:

rm -r <folder you want to remove>

Explore Permissions in Linux

Let’s take a look at how permissions work in Linux. Let’s open a new Linux terminal and change directories to our scripts folder.

cd  ~/catkin_ws/src/linux_course_files/move_bb8_pkg/scripts

Now type:

ls -la

Here is the output:

17-permissions-ls-laJPG

Notice the hello_world.py file. At the beginning of the line, you see the following:

-rw-r--r--

Linux has three permission types for files and directories:

  • Read: Denoted by the letter r. This means that the user can read the file or directory.
  • Write: Denoted by the letter w. This means that the user can write or modify the file or directory.
  • Execute: Denoted by the letter x. This means that the user can execute the file.

There are also three different user permission groups:

  • Owner: The user who owns the file or directory. In this case, that’s me!
  • Group: Whatever group the file or directory was assigned to.
  • All Users: This permission group includes the rest of the users.

Given this information above, let’s translate the line of hello_world.py containing -rw-r--r--. Everything is in order, reading from left to right:

  • Owner has read and write privileges (i.e. rw-)
  • The group has only read privileges (i.e. r--)
  • All other users have only read privileges (i.e. r--)

Notice how no users, including the owner, have execute privileges on the hello_world.py file.
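If you want to double-check your reading of a permission string, here is a small Python sketch that splits it into the three permission groups. The helper name describe_permissions is hypothetical; stat.filemode comes from Python's standard library, and 0o100644 is the numeric mode of a regular file with rw-r--r-- permissions.

```python
import stat


def describe_permissions(mode_string):
    """Split a string such as '-rw-r--r--' into owner, group, and others."""
    names = {"r": "read", "w": "write", "x": "execute"}
    classes = ("owner", "group", "others")
    # Character 0 is the file type; the next nine are three rwx triplets
    triplets = [mode_string[1 + 3 * i: 4 + 3 * i] for i in range(3)]
    return {
        user_class: [names[letter] for letter in triplet if letter in names]
        for user_class, triplet in zip(classes, triplets)
    }


# Build the string from the numeric mode of a regular file (0o100000) with 644
mode_string = stat.filemode(0o100644)
print(mode_string)
print(describe_permissions(mode_string))
```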

Try executing the hello_world.py file now, and see what you get. The syntax is:

rosrun <package name> <program you want to run>

So we type:

rosrun move_bb8_pkg hello_world.py

You get this ugly message about not being able to find the executable. How do we change that?

file_didnt_run

You must use the chmod command.

chmod +x hello_world.py

Now type:

ls -la

Can you see that little x, which means that execution permissions have been added to the file? The following permissions are now in place:

  • Owner: Read, Write and Execute (rwx)
  • Group: Read and Execute (r-x)
  • All Other Users: Read and Execute (r-x)
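To see the same change from inside a program, here is a rough Python sketch of what chmod +x does to a file's permission bits. It works on a throwaway temporary file rather than your real hello_world.py, and the helper name add_execute_permission is my own. It assumes the common case where execute is added for owner, group, and all other users.

```python
import os
import stat
import tempfile


def add_execute_permission(path):
    """Add execute permission for owner, group, and others; return the new mode string."""
    current_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.filemode(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as folder:
    # Throwaway stand-in for the real script
    script_path = os.path.join(folder, "hello_world.py")
    with open(script_path, "w") as script:
        script.write("print('Hello World')\n")
    os.chmod(script_path, 0o644)  # start from rw-r--r--
    before = stat.filemode(os.stat(script_path).st_mode)
    after = add_execute_permission(script_path)
    print(before, "->", after)
```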

Now, run hello_world.py again using the ROS command rosrun.

rosrun move_bb8_pkg hello_world.py

Voila! You should see your program working now, with “Hello World” messages scrolling down the screen.

20-voilaJPG

Create a Bash Script

Up until now, when we wanted to run commands in Linux, we opened up a terminal window and typed the command manually and then ran it. Examples of the commands we have used so far include cd (to change directories), ls (to list the folders and files inside a directory), and mv (to move a folder or file from one place to another), etc. 

Fortunately, Linux has something called bash scripts. Bash scripts are text files that can contain commands that we would normally type out manually in the Linux terminal. Then, when you want to run a set of commands, all you have to do is run the bash script (which contains the commands that you want to execute). Pretty cool right? Software engineers are always looking for ways to automate tedious tasks!

Let’s take a look now at bash scripts. Open a new terminal window. Move to your scripts folder.

cd ~/catkin_ws/src/linux_course_files/move_bb8_pkg/scripts

Let’s create a new file named bash_script.sh.

touch bash_script.sh

Type the following command, and you should see the bash_script.sh file inside:

dir

Now open bash_script.sh.

gedit bash_script.sh

Type these two lines of code inside that file and click Save.

21-type-this-bash-scriptJPG

The first line of code is:

#!/bin/bash

This line of code, called a shebang, tells Linux to run this file with the bash interpreter located at /bin/bash.

By convention, bash scripts use the .sh file extension. The echo command just tells the Linux terminal to print the message that follows to the screen.

Now, let’s close out the text editor and go back to the terminal window.

Type the following command:

./bash_script.sh

Uh oh! We have an error. Something about permissions. Let’s find out why.

Type the following command:

ls -la

Notice how the bash_script.sh file only has read and write permissions because there is no x. Let’s fix that by giving ourselves execute permissions on this file.

chmod +x bash_script.sh

Now type:

ls -la

You should see that we now have execute permissions.

Let’s try to run the program again.

./bash_script.sh

We can also pass arguments to bash scripts.

Create a new bash script file named demo.sh.

touch demo.sh

Open the file:

gedit demo.sh

Add this code:

25-new-bash-scriptJPG

The $1 means the first argument. We can have $1, $2, $3, and so on, depending on how many arguments you want to pass to the script. In our example above, we are only passing one argument. This argument is stored in the ARG1 variable.

The fi at the end of the if statement is necessary to close out the if statement.

Save the file.

Change the file’s permissions.

chmod +x demo.sh

Now run the program, passing in the argument ‘AutomaticAddison’ after the command, separated by one space.

./demo.sh AutomaticAddison

Explore the .bashrc File

Type the following command to go back home.

cd

Type the following command to list all files.

ls -la

You see that .bashrc file? Open it up in a text editor.

gedit .bashrc

Hmmm. What is this file with all this crazy code in it?

The .bashrc is a special bash script which is always located in your home directory. It contains an assortment of commands, aliases, variables, configuration, settings, and useful functions. 

The .bashrc script runs automatically any time you open up a new terminal window, tab, or pane in Linux. However, if you already have a terminal window open and want to rerun the .bashrc script, you have to use the following command (the ~/ prefix makes it work from any folder):

source ~/.bashrc

Explore Environment Variables

Open a fresh, new terminal window and type:

export

You will see a list of all the environment variables in your system with their corresponding values. Environment variables are variables that describe the environment in which programs run. The programs that run on your computer use environment variables to answer questions such as: What is the username of the current user? What version of ROS is installed? Where is ROS installed?

There are lots of environment variables. How do we filter this list to get only the variables that contain ROS in them? Type the following command:

export | grep ROS

The ROS_PACKAGE_PATH variable tells the system where a program would find ROS packages.

If at any point, you want to change a variable’s value, you use the following syntax:

export ROS_PACKAGE_PATH="<some new path here>"
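Programs read these same variables when they run. As a rough illustration, here is a Python sketch that does roughly what export | grep ROS does: it keeps every variable whose name or value contains the given text. The function name variables_containing is hypothetical; os.environ is Python's standard view of the environment.

```python
import os


def variables_containing(text, environment=None):
    """Return the variables whose NAME=value line contains the given text."""
    environment = os.environ if environment is None else environment
    return {
        name: value
        for name, value in environment.items()
        if text in f"{name}={value}"
    }


# Print the matching variables for the current terminal session
for name, value in sorted(variables_containing("ROS").items()):
    print(f"{name}={value}")
```

Like grep, the match is case-sensitive, so a value such as /home/ros will not match ROS.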

The grep command is pretty cool. You can use it in conjunction with other commands too. Go to your catkin_ws/src folder.

cd ~/catkin_ws/src

Type:

ls

You should see a list of all files. Now type this:

ls | grep hello

You will see that all files containing the string ‘hello’ in the name are shown on the screen.

Understand Processes in Linux

In this section, we’ll take a look at Linux processes. A process is a computer program in action. A computer program consists of data and a set of instructions for your computer to execute. 

At any given time multiple processes are running on your Linux system. There are foreground processes and background processes. 

  • Foreground processes (also known as interactive processes) require user input and will only initialize if requested by a user in a terminal window. 
  • Background processes (also known as non-interactive or automatic processes) launch automatically, without requiring prior user input.

Launch a Foreground Process

Let’s see the processes that are currently running on our system. Open a new terminal window and type:

ps faux

Now, let’s start a new foreground process. In a new terminal tab, type the following command:

rosrun move_bb8_pkg hello_world.py

Open a new tab, and type:

ps faux | grep hello_world

The first result is the hello_world.py program we are running. 

33-hello-world-processJPG

Go back to the hello_world.py terminal and kill the process by pressing:

Ctrl + C

Now return to the other terminal window and type this command:

ps faux | grep hello_world

The process is now gone because we killed it. (The grep --color=auto hello_world result is the grep command we just ran, not the hello_world.py program, so you can ignore it.)

What happens if we have a process and all we want to do is to suspend it rather than kill it? What do we do?

Let’s take a look at that now.

In a new terminal window, launch the hello_world.py program again.

rosrun move_bb8_pkg hello_world.py

In a new terminal tab, type:

ps faux | grep hello_world

The number that is in the second column is the process ID (PID). The PID is 17208 (yours will be different). 

Now return to the window where “Hello World” keeps printing out and suspend the process (i.e. pause it and get your terminal prompt back) by typing:

Ctrl + Z

36-suspend-processJPG

In a new terminal tab, type:

ps faux | grep hello_world

You will notice that the hello_world.py process is still listed, but it is now stopped (suspended) rather than running. The letter T in the STAT column indicates a stopped process.

Now let’s kill the process. The syntax is kill <PID>. In my case, I type:

kill 17208

If you type ps faux | grep hello_world, you will notice the process is still not killed. The reason is that the process is suspended, and a suspended process does not act on the kill signal until it resumes. We need to go back to the window where we were running the hello_world.py script and resume the suspended process in the background by typing:

bg

In a new terminal tab, type:

ps faux | grep hello_world

You will see that the process has been killed.
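If you are curious about what is going on under the hood, here is a rough Python sketch of the same suspend, kill, and resume sequence using signals. It starts a sleeping child process as a stand-in for hello_world.py. Note one assumption: Ctrl + Z actually sends the SIGTSTP signal, but this sketch uses SIGSTOP, which also stops the process and cannot be ignored.

```python
import os
import signal
import subprocess
import sys
import time

# Start a child that just sleeps, standing in for hello_world.py
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Like Ctrl + Z: stop (suspend) the process
os.kill(child.pid, signal.SIGSTOP)
os.waitpid(child.pid, os.WUNTRACED)  # block until the child is actually stopped

# Like kill <PID>: the SIGTERM signal stays pending while the process is stopped
os.kill(child.pid, signal.SIGTERM)
time.sleep(0.5)
still_alive_while_stopped = child.poll() is None

# Like bg: resume the process, which lets the pending SIGTERM take effect
os.kill(child.pid, signal.SIGCONT)
exit_code = child.wait(timeout=10)

print("Alive while stopped:", still_alive_while_stopped)
print("Exit code after resuming:", exit_code)
```

A negative exit code means the child was terminated by a signal; the number is the signal that ended it.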

Launch a Background Process

Now, let’s launch our hello_world.py program as a background process rather than a foreground process. Type the following in a new terminal window.

rosrun move_bb8_pkg hello_world.py &

The PID (process ID) is 17428.

Now try to kill the process.

Ctrl + C

Didn’t work did it? Now try:

Ctrl + Z

Still didn’t work.

The reason the Ctrl + C or Ctrl + Z commands did not work is because they only work on foreground processes. We launched the process in the background. To kill it, we need to open up a new terminal tab and type kill <PID>:

kill 17428

To verify that the process is killed, type:

ps faux | grep hello_world

Understand Secure Shell (SSH Protocol) in Linux

In this section, we will explore the SSH protocol. The SSH protocol provides a secure way to access a computer over an unsecured network. Your computer (the “client”) can access another computer (the “server”) to transfer files to it, run programs, carry out commands, etc.

If you have been following me for a while, you might remember when I connected my Windows-based laptop computer (“client”) to a Raspberry Pi (“server”) using the SSH protocol. I was then able to control the Raspberry Pi directly from my laptop.

A common use case for the SSH protocol in robotics is when you want to control a robot remotely. For example, imagine a robot that has an onboard computer (e.g. a Raspberry Pi). You can use SSH to run programs on the Raspberry Pi via your personal laptop computer.

Let’s see how SSH works. We will connect to a Raspberry Pi that is already set up with SSH from our Ubuntu Linux VirtualBox machine. The Raspberry Pi is the “remote” server, and Ubuntu Linux, which is running in VirtualBox on my Windows 10 machine, is the “local” client.

Note: You don’t have to go out and buy a Raspberry Pi right now. I just want to give you an idea of what the steps would be when you encounter a situation when you need to set up SSH communication between your local machine (e.g. personal laptop running Ubuntu Linux in a Virtual Box) and a remote server (e.g. Raspberry Pi) mounted on a robot.

The first thing we need to do is to install the openssh-server package on the Ubuntu Linux machine. 

Type:

sudo apt update
sudo apt install openssh-server

Enter your password and press Y.

Type the following:

sudo systemctl status ssh

You should see a message that says active (running).

39-ssh-runningJPG

Press q to exit.

Open the SSH port:

sudo ufw allow ssh

Check if ssh is installed.

ssh

On the Raspberry Pi, in a terminal window I type:

hostname -I

I see my Raspberry Pi’s IP address is:

192.168.0.17

You can also type the command:

ip a

You will use the IP address that appears next to inet.

45-raspberry-pi-ip-addressJPG

I want to connect to it from my Ubuntu Linux virtual box machine. The syntax for connecting to a remote computer via SSH is:

ssh <user>@<host>

where user is the account I want to log in to on the remote machine, and host is the IP address of the remote machine (server) I want to log in to. In my case, from an Ubuntu Linux terminal window, I type:

ssh pi@192.168.0.17

You will need to type the password for that username on the Raspberry Pi.

You can see that I am now connected to my Raspberry Pi. Type the following command to get a list of all the folders in the home directory.

ls

When I want to disconnect from the Raspberry Pi, I type:

exit

Typing the exit command gets me back to my normal user (i.e. ros).

Explore the “sudo” and “apt” Commands

Linux provides a tool called the Advanced Package Tool (APT). The Advanced Package Tool automates the retrieval, configuration and installation of software packages.

Let’s update the list of available software packages using this command in a new terminal window.

apt-get update 

Hmm. That didn’t work. We have a problem. The problem is that we don’t have permissions to access the package database. Now, run this command:

sudo apt-get update

Type your password. Here is the output:

47-it-workedJPG

It worked!

What does “sudo” mean?

sudo is an abbreviated form of “super user do.” Putting “sudo” in front of a command tells Linux to run the command as a super user (i.e. root user) or another user.

Keep Building!

Congratulations on reaching the end of this tutorial. I recommend you bookmark this page and come back to it regularly as you work with ROS on Linux. We have covered the most essential commands and tools you will use again and again as you work with ROS on Linux  for your robotics projects. Keep building!

How Do Neural Networks Make Predictions?

Neural networks are the workhorses of the rapidly growing field known as deep learning. Neural networks are used for all sorts of applications where a prediction of some sort is desired. Here are some examples:

  • Predicting the type of objects in an image or video
  • Sales forecasting
  • Speech recognition
  • Medical diagnosis
  • Risk management
  • and countless other applications… 

In this post, I will explain how neural networks make those predictions by boiling these structures down to their fundamental parts and then building up from there.

You Will Need 

Create Your First Neural Network

Imagine you run a business that provides short online courses for working professionals. Some of your courses are free, but your best courses require the students to pay for a subscription. 

You want to create a neural network to predict if a free student is likely to upgrade to a paid subscription. Let’s create the most basic neural network you can make.

1-first-neural-networkJPG

OK, so there is our neural network. To implement this neural network on a computer, we need to translate this diagram into a software program. Let’s do that now using Python, the most popular language for machine learning.

# Declare a variable named weight and 
# initialize it with a value
weight = 0.075

# Create a method called neural_network that 
# takes as inputs the input data (number of 
# free courses a student has taken during the 
# last 30 days) and the weight of the connection. 
# The method returns the prediction.

def neural_network(input, weight):

  # The input data multiplied by the weight 
  # equals the prediction
  prediction = input * weight

  # This is the output
  return prediction

So we currently have five students, all of whom are free students. The number of free courses these users have taken during the last 30 days is 12, 3, 5, 6, and 8. Let’s code this in Python as a list.

number_of_free_courses_taken = [12, 3, 5, 6, 8]

Let’s make a prediction for the first student, the one who has taken 12 free courses over the last 30 days.

2-first-neural-networkJPG

Now let’s put the diagram into code.

# Extract the first value of the list (12)
# and store it in a variable named first_student_input
first_student_input = number_of_free_courses_taken[0]

# Call the neural_network method and store the 
# prediction result into a variable
first_student_prediction = neural_network(
                         first_student_input, weight)

# Print the prediction
print(first_student_prediction)

OK. We have finished the code. Let’s see how it looks all together.

weight = 0.075

def neural_network(input, weight):

  prediction = input * weight

  return prediction

number_of_free_courses_taken = [
                        12, 3, 5, 6, 8]

first_student_input = number_of_free_courses_taken[0]

first_student_prediction = neural_network(
                       first_student_input, weight)

print(first_student_prediction)

Open a Jupyter Notebook and run the code above, or run the code inside your favorite Python IDE.

Here is what I got:

run-code

What did you get? Did you get 0.9? If so, congratulations!

Let’s see what is happening when we run our code. We called the neural_network method. The first operation performed inside that method is to multiply the input by the weight and return the result. In this case, the input is 12, and the weight is 0.075. The result is 0.9.

3-making-predictionsJPG

0.9 is stored in the first_student_prediction variable.

4-making-predictionJPG

And this, my friend, is the most basic building block of a neural network. A neural network in its simplest form consists of one or more weights that you multiply by input data to make a prediction.

Let’s take a look at some questions you might have at this stage.

What kind of input data can go into a neural network?

Real numbers that can be measured or calculated somewhere in the real world. Yesterday’s high temperature, a medical patient’s blood pressure reading, previous year’s rainfall, or average annual rainfall are all valid inputs into a neural network. Negative numbers are totally acceptable as well.

A good rule of thumb is, if you can quantify it, you can use it as an input into a neural network. It is best to use input data into a neural network that you think will be relevant for making the prediction you desire.

For example, if you are trying to create a neural network to predict if a patient has breast cancer or not, how many fingers a person has is probably not going to be all that relevant. However, how many days per month a patient exercises is likely to be a relevant piece of input data that you would want to feed into your neural network.

What does a neural network predict?

A neural network outputs some real number. In some neural network implementations, we can do some fancy mathematics to limit the output to some real number between 0 and 1. Why would we want to do that? Well in some applications we might want to output probabilities. Let me explain.

Suppose you want to predict the probability that tomorrow will be sunny. The input into a neural network to make such a prediction could be today’s high temperature. 

5-high-temperatureJPG

If the output is some number like 0.30, we can interpret this as a 30% chance of the weather being sunny tomorrow given today’s high temperature. Pretty cool huh!
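One common way to squeeze an output into the range between 0 and 1 is the sigmoid function. This post doesn't say which function the "fancy mathematics" refers to, so treat the sketch below as one possible choice. The function name sunny_probability and the example weight are made up for illustration.

```python
import math


def sigmoid(value):
    """Map any moderate real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


def sunny_probability(high_temperature, weight):
    """Multiply the input by the weight, then squeeze the result into (0, 1)."""
    return sigmoid(high_temperature * weight)


# Hypothetical numbers: a high of 75 degrees and a made-up weight
print(sunny_probability(75, -0.01))
```

An input of 0 maps to exactly 0.5, large positive inputs approach 1, and large negative inputs approach 0.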

We don’t have to limit the output to between 0 and 1. For example, let’s say we have a neural network designed to predict the price of a house given the house’s area in square feet. Such a network might tell us, “given the house’s area in square feet, the predicted price of the house is $432,000.”

What happens if the neural network’s predictions are incorrect?

The neural network will adjust its weights so that it makes a more accurate prediction the next time. Recall that the weights are multiplied by the input values to make a prediction.

What is a neural network really learning?

A neural network is learning the best possible set of weights. “Best” in the context of neural networks means the weights that minimize the prediction error.

Remember, the core math operation in a neural network is multiplication, where the simplest neural network is:

Input Value * Weight = Prediction

How does the neural network find the best set of weights?

Short answer: Trial and error

Long answer: A neural network starts out with random numbers for weights. It then takes in a single input data point, makes a prediction, and then sees if its prediction was either too high or too low. The neural network then adjusts its weight(s) accordingly so that the next time it sees the same input data point, it makes a more accurate prediction.

Once the weights are adjusted, the neural network is fed the next data point, and so on. A neural network gets better and better each time it makes a prediction. It “learns” from its mistakes one data point at a time.
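Here is a minimal sketch of that adjust-and-repeat loop for our single-weight network. It uses one specific update rule (gradient descent on the squared error), which is my choice for illustration; the description above does not commit to a particular rule. The target of 1.0 (meaning the student did upgrade), the learning rate, and the number of steps are all assumed values.

```python
def train_single_weight(input_value, target, weight, learning_rate=0.001, steps=50):
    """Nudge the weight so that input_value * weight moves toward the target."""
    for _ in range(steps):
        prediction = input_value * weight
        # Positive error: prediction too high. Negative error: too low.
        error = prediction - target
        # Move the weight against the error, scaled by the input
        weight -= learning_rate * error * input_value
    return weight


# Start from the weight used earlier in this post and the first student's input
trained_weight = train_single_weight(input_value=12, target=1.0, weight=0.075)
print("Weight:", trained_weight)
print("Prediction:", 12 * trained_weight)
```

Each pass shrinks the gap between the prediction and the target. If the learning rate is too large for the size of the input, the updates overshoot and the error grows instead.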

Do you notice something here?

Standard neural networks have no memory. They are fed an input data point, make a prediction, see how close the prediction was to reality, adjust the weights accordingly, and then move on to the next data point. At each step of the learning process of a neural network, it has no memory of the most recent prediction it made.

Standard neural networks focus on one input data point at a time. For example, in our subscriber prediction neural network we built earlier in this tutorial, if we feed our neural network number_of_free_courses_taken[1], it will have no clue what it predicted when number_of_free_courses_taken[0] was the input value.
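To make this concrete, here is a short sketch that runs all five students from earlier through the network. Each call only sees its own input value and the weight; nothing is carried over from one call to the next. (The parameter is named input_value here to avoid shadowing Python's built-in input function.)

```python
weight = 0.075
number_of_free_courses_taken = [12, 3, 5, 6, 8]


def neural_network(input_value, weight):
    """Multiply one input value by the weight to get one prediction."""
    return input_value * weight


# Each prediction depends only on that student's input and the weight
predictions = [
    neural_network(courses, weight) for courses in number_of_free_courses_taken
]

for courses, prediction in zip(number_of_free_courses_taken, predictions):
    print(courses, "free courses ->", prediction)
```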

Some networks do retain information from earlier inputs. A well-known example is the long short-term memory (LSTM) network.

How the Canny Edge Detector Works

In this post, I will explain how the Canny Edge Detector works. The Canny Edge Detector is a popular edge detection algorithm developed by John F. Canny in 1986. The goal of the Canny Edge Detector is to:

  • Minimize Error: Edges that are detected by the algorithm as edges should be real edges and not noise.
  • Good Localization: Minimize the distance between detected edge pixels and real edge pixels.
  • Minimal Responses to Single Edges: Each real edge should be marked only once, and noise should not produce extra, false edges.

How the Canny Edge Detector Works

The Canny Edge Detector Process is as follows:

  1. Gaussian Filter: Smooth the input image with a Gaussian filter to remove noise (using a discrete Gaussian kernel).
  2. Calculate Intensity Gradients: Identify the areas in the image with the strongest intensity gradients (using a Sobel, Prewitt, or Roberts kernel).
  3. Non-maximum Suppression: Apply non-maximum suppression to thin out the edges. We want to remove unwanted pixels that might not be part of an edge.
  4. Thresholding with Hysteresis:  Hysteresis or double thresholding involves:
    • Accepting pixels as edges if the intensity gradient value exceeds an upper threshold.
    • Rejecting pixels as edges if the intensity gradient value is below a lower threshold.
    • If a pixel is between the two thresholds, accept it only if it is adjacent to a pixel that is above the upper threshold.
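The thresholding rules in step 4 can be sketched in plain Python as follows. This is a simplified illustration that checks only the eight immediate neighbors of each in-between pixel, exactly as the rule above is worded; full implementations usually follow chains of connected pixels. The function name and the sample numbers are my own.

```python
def hysteresis_threshold(magnitudes, lower, upper):
    """Return a grid of booleans marking which pixels are kept as edges."""
    rows, cols = len(magnitudes), len(magnitudes[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = magnitudes[r][c]
            if value > upper:
                # Above the upper threshold: accept
                edges[r][c] = True
            elif value >= lower:
                # Between the thresholds: accept only next to a strong pixel
                edges[r][c] = any(
                    magnitudes[nr][nc] > upper
                    for nr in range(max(r - 1, 0), min(r + 2, rows))
                    for nc in range(max(c - 1, 0), min(c + 2, cols))
                    if (nr, nc) != (r, c)
                )
            # Below the lower threshold: stays False (rejected)
    return edges


# Hypothetical gradient magnitudes
sample = [
    [10, 60, 120],
    [10, 10, 10],
    [60, 10, 10],
]
edge_map = hysteresis_threshold(sample, lower=50, upper=100)
print(edge_map)
```

In the sample, the 60 in the top row survives because it touches the 120, while the 60 in the bottom-left corner is rejected because none of its neighbors is above the upper threshold.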

Mathematical Formulation of the Canny Edge Detector

More formally, in step 1 of the Canny Edge Detector, we smooth an image by convolving the image with a Gaussian kernel. An example calculation showing the convolving mathematical operation is shown in the Sobel Operator discussion. Below is an example 5×5 Gaussian kernel that can be used.

1-gaussian-kernel

We must go through each 5×5 region in the image and apply the convolving operation between a 5×5 portion of the input image (with the pixel of interest as the center cell, or anchor) and the 5×5 kernel above. The result is then summed to give us the new intensity value for that pixel.
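Here is a plain-Python sketch of that calculation for a single pixel. The kernel values are a commonly used 5×5 Gaussian approximation whose entries sum to 159; I am assuming them for illustration, and they may differ from the kernel in the figure above. The function name smooth_pixel is hypothetical.

```python
# Assumed 5x5 Gaussian approximation; the entries sum to 159
GAUSSIAN_KERNEL_5X5 = [
    [2, 4, 5, 4, 2],
    [4, 9, 12, 9, 4],
    [5, 12, 15, 12, 5],
    [4, 9, 12, 9, 4],
    [2, 4, 5, 4, 2],
]
KERNEL_SUM = sum(sum(row) for row in GAUSSIAN_KERNEL_5X5)


def smooth_pixel(image, row, col):
    """Return the smoothed intensity of the pixel at (row, col).

    The pixel must be at least two pixels away from every image border.
    """
    total = 0
    for kr in range(5):
        for kc in range(5):
            # Multiply each kernel cell by the pixel underneath it, then sum
            total += GAUSSIAN_KERNEL_5X5[kr][kc] * image[row + kr - 2][col + kc - 2]
    # Divide by the kernel sum so overall brightness is preserved
    return total / KERNEL_SUM


# A flat gray 5x5 image should be unchanged by smoothing
flat_image = [[100] * 5 for _ in range(5)]
print(smooth_pixel(flat_image, 2, 2))
```

Because this kernel is symmetric, flipping it (as a strict convolution would) makes no difference to the result.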

After smoothing the image using the Gaussian kernel, we then calculate the intensity gradients. A common method is to use the Sobel Operator.

Here are the two kernels used in the Sobel algorithm:

2-direction-kernel

The gradient approximations at pixel (x,y), given a 3×3 portion of the source image A, are calculated as follows:

Gx = x-direction kernel * (3x3 portion of image A with (x,y) as the center cell)
Gy = y-direction kernel * (3x3 portion of image A with (x,y) as the center cell)

Note that * above is not normal matrix multiplication. It denotes the convolution operation.

We then combine the values above to calculate the magnitude of the gradient:

magnitude(G) = square_root(Gx^2 + Gy^2)

The direction of the gradient Ɵ is:

Ɵ = atan(Gy / Gx)

where atan is the arctangent operator.
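The formulas above can be sketched in plain Python for a single 3×3 patch. The kernel values below are the commonly cited Sobel kernels, but sign conventions vary, so they may be mirrored relative to the figure above. The sketch applies each kernel cell by cell without flipping it, and it uses math.atan2 instead of atan(Gy / Gx) so that Gx = 0 does not cause a division by zero. All names are my own.

```python
import math

# Commonly cited Sobel kernels (assumed; sign conventions vary)
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(kernel, patch):
    """Multiply matching cells of the kernel and the patch, then sum them."""
    return sum(kernel[r][c] * patch[r][c] for r in range(3) for c in range(3))


def gradient(patch):
    """Return (Gx, Gy, magnitude, direction in radians) for a 3x3 patch."""
    gx = apply_kernel(SOBEL_X, patch)
    gy = apply_kernel(SOBEL_Y, patch)
    magnitude = math.sqrt(gx ** 2 + gy ** 2)
    direction = math.atan2(gy, gx)
    return gx, gy, magnitude, direction


# A patch with a vertical edge: dark on the left, bright on the right
vertical_edge = [
    [0, 0, 255],
    [0, 0, 255],
    [0, 0, 255],
]
gx, gy, magnitude, direction = gradient(vertical_edge)
print(gx, gy, magnitude, direction)
```

For this patch, all of the change is in the x direction, so Gy is 0 and the gradient points along the x axis, perpendicular to the vertical edge.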

Once we have the gradient magnitude and direction, we perform non-maximum suppression by scanning the entire image to get rid of pixels that might not be part of an edge. Non-maximum suppression works by finding pixels that are local maxima in the direction of the gradient (gradient direction is perpendicular to edges).

For example, suppose we have three pixels that are next to each other along the gradient direction: a, b, and c. If the gradient magnitude at pixel b is larger than at both a and c, pixel b is a local maximum and is marked as an edge. Otherwise, pixel b is set to 0 (i.e. black), meaning it is not an edge pixel.

a ——> b <edge> ——> c
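The three-pixel check can be written as a tiny Python function. This is only a sketch of the comparison for one pixel; a full implementation also has to pick which two neighbors to compare against based on the gradient direction. The function name and the sample values are hypothetical.

```python
def suppress_non_maximum(before, center, after):
    """Keep the center value only if it is a local maximum; otherwise return 0."""
    if center >= before and center >= after:
        return center
    return 0


# Pixel b has the largest gradient magnitude, so it is kept
print(suppress_non_maximum(40, 90, 30))

# Pixel b is not the largest, so it is set to 0 (black)
print(suppress_non_maximum(40, 30, 90))
```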

Non-maximum suppression is not perfect because some edges might actually be noise and not real edges. To solve this, Canny Edge Detector goes one step further and applies thresholding to remove the weakest edges and keep the strongest ones. Edge pixels that are borderline weak or strong are only considered strong if they are connected to strong edge pixels.

Canny Edge Detector Code

This tutorial has the Python code for the Canny Edge Detector.

Conclusion

In this discussion, we covered the Canny Edge Detector. The Canny Edge Detector is just one of many edge detection algorithms.

The most common edge detection algorithms fall into the following categories:

  • Gradient Operators
    • Roberts Cross Operator
    • Sobel Operator
    • Prewitt Operator
  • Canny Edge Detector
  • Laplacian of Gaussian
  • Haralick Operator

Which edge detection algorithm you choose depends on what you are trying to achieve with your application.

Keep building!