How to Install TensorFlow 2 on Windows 10

In this post, I will show you how to install TensorFlow 2 on Windows 10. TensorFlow 2 is a free, open-source software library used for machine learning applications. It comes integrated with Keras, a neural-network library written in Python. If you want to work with neural networks and deep learning, TensorFlow 2 should be your software of choice because of its popularity both in academia and in industry. Let’s get started!


Install TensorFlow 2

You can find the official instructions for installing TensorFlow 2 on the TensorFlow website, but I will walk you through the process step-by-step.

Open an Anaconda command prompt terminal.


Type the command below to create a virtual environment named tf_2 with the latest version of Python installed. A virtual environment is like an independent Python workspace with its own set of installed libraries and its own Python version. For example, you might have one project that needs to run on an older version of Python, like Python 2.7, and another project that requires Python 3.7. You can create a separate virtual environment for each project.

conda create -n tf_2 python

Press y and then ENTER.


Wait for the software to download.


Once the download is finished, activate the virtual environment using this command:

conda activate tf_2

Check which version of Python you have installed on your system. I have Python 3.8.0.

python --version

Choose a TensorFlow package. I’ll install TensorFlow CPU. Let’s type the following command:

pip install --upgrade tensorflow

You might see this error:

ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)

ERROR: No matching distribution found for tensorflow

If you do, you need to downgrade your version of Python; TensorFlow is not yet compatible with the newest Python release. Downgrade to a version TensorFlow supports (Python 3.6 at the time this post was written):

conda install python=3.6

Press y and then ENTER.

Check which version of Python you have installed on your system. I have Python 3.6.9 now.

python --version

Now install TensorFlow 2.

pip install --upgrade tensorflow

Wait for TensorFlow CPU to finish installing. Once it is done, verify the installation by typing:

python -c "import tensorflow as tf; x = [[2.]]; print('tensorflow version', tf.__version__); print('hello, {}'.format(tf.matmul(x, x)))"

You should see your TensorFlow version in the output, along with the result of the matrix multiplication.

You might see this message:

“I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2”

Don’t worry, TensorFlow is working just fine. To get rid of the message, you can set an environment variable inside the virtual environment: setting TF_CPP_MIN_LOG_LEVEL to 2 filters out these informational log messages. Type the following command:

set TF_CPP_MIN_LOG_LEVEL=2

Now run this command:

python -c "import tensorflow as tf; x = [[2.]]; print('tensorflow version', tf.__version__); print('hello, {}'.format(tf.matmul(x, x)))"

Voila! Message gone. 



Create a Basic Neural Network Using TensorFlow 2

To really see what TensorFlow 2 can do, let’s do the following:

  • Build a neural network that classifies images of clothing.
  • Train this neural network.
  • And, finally, evaluate the accuracy of the model.

We are going to roughly follow the TensorFlow beginner tutorial.

First, install the Matplotlib library.

pip install matplotlib

I’m now going to open up a text editor and type a Python program. I will save it to my D drive as fashion_mnist.py. Here is the code:

from __future__ import absolute_import, division, print_function, unicode_literals

# Import the key libraries
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Create a shorthand for tf.keras.layers
layers = tf.keras.layers

# Print the TensorFlow version
print(tf.__version__)

# Load the Fashion MNIST dataset and scale the pixel values
# from integers in [0, 255] to floating-point numbers in [0, 1]:
mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Plot the first 25 training images so we can see the data
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal',
               'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
plt.figure(figsize=(10, 10))

for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()

Within your virtual environment in the Anaconda terminal, navigate to where you saved your code. I will type:

D:

Then:

cd D:\<YOUR_PATH>\install_tensorflow2

Type dir to see if the Python (.py) file is in that directory.

Now run the code:

python fashion_mnist.py

You should see this graphic pop up.

(Plot: a 5×5 grid of the first 25 Fashion MNIST training images, each labeled with its clothing class.)

In the terminal window, press CTRL+C on your keyboard to stop the code from running.

Let’s add to our code. Open up the Python file again in the text editor and type the following code. If you are new to neural networks, don’t worry about what everything means at this stage.

from __future__ import absolute_import, division, print_function, unicode_literals

# Import the key libraries
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Create a shorthand for tf.keras.layers
layers = tf.keras.layers

# Print the TensorFlow version
print(tf.__version__)

# Load the Fashion MNIST dataset and scale the pixel values
# from integers in [0, 255] to floating-point numbers in [0, 1]:
mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Plot the first 25 training images so we can see the data
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal',
               'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
plt.figure(figsize=(10, 10))

for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()

# Build the neural network layer-by-layer
model = tf.keras.Sequential()
model.add(layers.Flatten())  # Flatten each 28x28 image into a 784-element vector
model.add(layers.Dense(64, activation='relu'))  # Hidden layer with 64 nodes; uses ReLU
model.add(layers.Dense(64, activation='relu'))  # Hidden layer with 64 nodes; uses ReLU
model.add(layers.Dense(10, activation='softmax'))  # Output layer with 10 nodes, one per class; uses Softmax

# Choose an optimizer and loss function for training:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model, then evaluate its accuracy on the test set
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Run the code:

python fashion_mnist.py

When you see the plot of the clothes appear, just close that window so that the neural network can build and run.


In my run, the accuracy of classifying the clothing items was 87.5%. Pretty cool, huh? Congratulations! You’ve built and run your first neural network on TensorFlow 2.

To deactivate the virtual environment, type:

conda deactivate

Then to exit the terminal, type:

exit

At this stage, I encourage you to go through the TensorFlow tutorials to get more practice using this really powerful tool.


Difference Between Supervised and Unsupervised Learning

In this post, I will explain the difference between supervised and unsupervised learning.


What is Supervised Learning?


Imagine you have a computer. The computer is really good at doing math and making complex calculations. You want to “train” your computer to predict the price of any home in the United States. So you search around the Internet and find a dataset. The dataset contains the following information for 100,000 houses that sold during the last 30 days in various cities across the United States:

  1. Square footage
  2. Number of bathrooms
  3. Number of bedrooms
  4. Number of garages
  5. Year the house was constructed
  6. Average size of the house’s windows
  7. Sale price

Variables 1 through 6 above are the dataset’s features (also known as input variables, attributes, independent variables, etc.). Variable 7, the house sale price, is the output variable (or target variable) that we want our computer to get good at predicting. 
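To make the distinction concrete, here is what a single example from such a dataset might look like in Python. The field names and values are hypothetical, invented for illustration:

# One hypothetical row from the housing dataset: six features plus the target.
house = {
    'square_footage': 2000,      # feature 1
    'num_bathrooms': 2,          # feature 2
    'num_bedrooms': 3,           # feature 3
    'num_garages': 1,            # feature 4
    'year_built': 1995,          # feature 5
    'avg_window_size_sqft': 12,  # feature 6
    'sale_price': 500000,        # output (target) variable we want to predict
}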

So the question is…how do we train our computer to be a good house price predictor?  We need to write a software program. The program needs to take as input the 100,000-house dataset that I mentioned earlier. This program needs to then find a mathematical relationship between variables 1-6 (features) and variable 7 (output variable). Once the program has found a relationship between the features (input) and the output, it can predict the sale price of a house it has never seen before.


Let’s take a look at an analogy. Supervised learning is like baking a cake. 

Suppose you had a cake machine that was able to cook many different types of cake from the same set of ingredients. All you have to do as the cake chef is to just throw the ingredients into the machine, and the cake machine will automatically make the cake.

The “features,” the inputs to the cake machine, are the following ingredients:

  1. Butter
  2. Sugar
  3. Vanilla Extract
  4. Flour
  5. Chocolate
  6. Eggs
  7. Salt

The output is the type of cake:

  1. Vanilla Cake
  2. Pound Cake
  3. Chocolate Cake
  4. Dark Chocolate Cake

Different amounts of each ingredient will produce different types of cake. How does the cake machine know what type of cake to produce given a set of ingredients? 

Fortunately, a machine learning engineer has written a software program (containing a supervised learning algorithm) that is running inside the cake machine. The program was pre-trained on a dataset containing 1 million cakes. Each entry (i.e. example or training instance) in this gigantic dataset contained two key pieces of data: 

  1. How much of each ingredient was used in the making of that given cake
  2. The type of cake that was produced

During the training phase of the program, the program found a fairly accurate mathematical relationship between the amount of each ingredient and the cake type. And now, when the chef provides the cake machine with a new set of ingredients, it automatically “knows” what type of cake to produce. Pretty cool, huh?


What I described above is called supervised learning. It is called supervised learning because the input data used to train the algorithm is already labeled with the “correct” answers (e.g. the type of cake in the example above, or the sale prices of those 100,000 homes from earlier in this post). We supervised the learning algorithm by “telling” it what the output (cake type) should be for 1 million different sets of input values (ingredients).

The fundamental idea of a supervised learning algorithm is to learn a mathematical relationship between inputs and outputs so that it can predict the output value given an entirely new set of input values. 

Let’s take a look at a common supervised learning algorithm: linear regression. The goal of linear regression is to find a line that best fits the relationship between input and output. For example, the learning algorithm for linear regression could be trained on square footage and sale price data for 100,000 homes. It would learn the mathematical relationship (e.g. a straight line in the form y = mx + b) between square footage and the sale price of a home. 


With this relationship (i.e. line of best fit) in hand, the algorithm can now easily predict the sale price of any home just by being provided with the home’s square footage value. 

For example, let’s say we wanted to find the price of a house that is 2,000 square feet. We feed 2,000 sq ft into the algorithm, and the algorithm predicts a sale price of $500,000.
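Here is a minimal sketch of that idea in Python, using NumPy’s polyfit on a handful of made-up training examples. The numbers are invented so that the learned line reproduces the $500,000 prediction above:

import numpy as np

# Hypothetical training data: square footage (feature) and sale price (label).
# A real model would be trained on all 100,000 houses.
square_footage = np.array([1000, 1500, 2500, 3000])
sale_price = np.array([250000, 375000, 625000, 750000])

# Fit a straight line y = mx + b using ordinary least squares.
m, b = np.polyfit(square_footage, sale_price, deg=1)
print(f"Learned line: price = {m:.0f} * sqft + {b:.0f}")

# Predict the sale price of a 2,000 sq ft home the model has never seen.
print(f"Predicted price for 2,000 sq ft: ${m * 2000 + b:,.0f}")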


As you can imagine, before we make any predictions using a supervised learning algorithm, it is imperative to train it on a lot of data. Lots and lots of data. The more the merrier.

In the example above, to get that best-fit line, we want to feed the algorithm as many examples of houses as possible during training. The more data it has, the better its predictions will be when given new data. This, my friends, is supervised learning, and it is incredibly powerful. In fact, supervised learning is the bread and butter of most state-of-the-art machine learning techniques today, such as deep learning.

Now, let’s look at unsupervised learning.


What is Unsupervised Learning?

Let’s suppose you have the following dataset for a group of 13 people. For each person, we have the following features:

  • Height (in inches)
  • Hair Length (in inches)

Let’s plot the data to see what it looks like:

(Plot: hair length vs. height for the 13 people.)

In contrast to supervised learning, in this case there is no output value that we are trying to predict (e.g. house sale price, cake type, etc.). All we have are features (inputs) with no corresponding output variables. In machine learning jargon, we say that the data points are unlabeled.

So instead of trying to force the dataset to fit a straight line or some sort of predetermined mathematical model, we let an unsupervised learning algorithm find a pattern all on its own. And here is what we get:

(Plot: hair length vs. height, with the data points now falling into two distinct clusters, one blue and one red.)

Aha! It looks like the algorithm found some sort of pattern in the data. The data is clustered into two distinct groups. What these clusters mean, we do not know, because the data points are unlabeled. However, given our prior knowledge of this dataset, we suspect that the blue dots are males and the red dots are females, since the attributes are height and hair length.

What I have described above is known as unsupervised learning. It is called unsupervised because the input dataset is unlabeled. There is no output variable we are trying to predict. There is no prior mathematical model we are trying to fit the data to. All we want to do is let the algorithm find some sort of structure or pattern in the data. We let the data speak for itself. 

Any time you are given a dataset and want to group similar data points into clusters, you’re going to want to use an unsupervised learning algorithm.
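As a minimal sketch of that idea in Python, here is scikit-learn’s k-means algorithm run on made-up height and hair-length data. The numbers, and the assumption that there are exactly two clusters, are hypothetical, mirroring the example above:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: [height (in), hair length (in)] for 13 people.
people = np.array([
    [70, 2], [72, 1], [69, 3], [71, 2], [68, 2], [73, 1], [70, 1],
    [64, 14], [63, 16], [65, 12], [62, 15], [66, 13], [64, 17],
])

# Ask k-means to group the data into two clusters; no labels are provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(people)
print(kmeans.labels_)           # which cluster each person was assigned to
print(kmeans.cluster_centers_)  # the (height, hair length) center of each cluster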


Hierarchical Actions and Reinforcement Learning

One of the open issues in reinforcement learning is how to handle hierarchical actions.

What are Hierarchical Actions?

In order to explain hierarchical actions, let us take a look at a real-world example. Consider the task of baking a sweet potato pie. The high-level action of making a sweet potato pie can be broken down into numerous low-level sub-steps: cut the sweet potatoes, cook the sweet potatoes, add sugar, add flour, etc.

You will also notice that each of the low-level sub-steps mentioned above can be broken down even further. For example, the task of cutting a sweet potato can be broken down into steps like these: move the right arm to the right, orient the right arm above the sweet potato, bring the arm down, etc.

Each of those sub-steps can then be broken down into even smaller steps. For example, “moving the right arm to the right” might involve thousands of different muscle contractions. Can you see where we are going here?
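To make that explosion of steps concrete, here is a hypothetical, heavily abbreviated sketch of the task hierarchy in Python. Every name and step here is invented for illustration:

# A tiny slice of the "make a sweet potato pie" hierarchy.
make_sweet_potato_pie = {
    'cut the sweet potatoes': {
        'move right arm to the right': ['contract muscle 1', 'contract muscle 2', '...'],
        'orient right arm above the sweet potato': ['...'],
        'bring arm down': ['...'],
    },
    'cook the sweet potatoes': {'...': ['...']},
    'add sugar': {'...': ['...']},
    'add flour': {'...': ['...']},
}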

Reinforcement learning involves training a software agent to learn from experience through trial and error. A basic reinforcement learning algorithm would need to search over thousands of low-level actions in order to execute the task of making a sweet potato pie. Thus, reinforcement learning methods quickly become inefficient for tasks that require a large number of low-level actions.

How to Solve the Hierarchical Action Problem

One way to solve the hierarchical action problem is to represent a high-level behavior (e.g. making a sweet potato pie) as a small sequence of high-level actions. 

For example, where the solution to making a sweet potato pie might entail 1,000 low-level actions, we might condense these actions into 10 high-level actions. We could then have a single master policy that switches between the 10 sub-policies (one for each high-level action) every N timesteps. The algorithm sketched here is known as meta-learning shared hierarchies, and it is explained in more detail by OpenAI.
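Here is a minimal sketch of that master/sub-policy loop in Python. This is schematic, not the meta-learning shared hierarchies implementation: the policies are stand-in functions, and a real master policy would be learned from the state rather than chosen at random:

import random

N = 10                   # timesteps between master-policy decisions (hypothetical)
NUM_SUB_POLICIES = 10    # one sub-policy per high-level action

def sub_policy(index, state):
    # Stand-in for a learned low-level policy: maps a state to a low-level action.
    return f'low-level action from sub-policy {index}'

def master_policy(state):
    # Stand-in for the learned master policy: chooses which sub-policy runs next.
    return random.randrange(NUM_SUB_POLICIES)

state = 'initial state'
for t in range(100):
    if t % N == 0:                      # every N timesteps, the master re-chooses
        active = master_policy(state)
    action = sub_policy(active, state)  # the active sub-policy picks the action
    # ...apply the action to the environment and observe the next state/reward...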

We could also integrate supervised learning techniques such as ID3 decision trees. Each sub-policy would be represented as a decision tree whose output is the appropriate action to take; the input would be a transformed version of the state and reward received from the environment. In essence, you would have decisions within decisions.
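As a rough sketch of that idea, here is one sub-policy represented as a decision tree. This uses scikit-learn, whose trees are built with CART rather than ID3, and the states, rewards, and action names are invented for illustration:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data for one sub-policy. Each row is a transformed
# (state, reward) pair; each label is the low-level action the tree should output.
X = np.array([[0.1, 1.0],
              [0.9, -1.0],
              [0.5, 0.0],
              [0.8, 1.0]])
y = np.array(['stir', 'stop', 'stir', 'pour'])

# A tree of decisions that ends in an action at each leaf:
# decisions taken within decisions.
tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[0.7, 0.5]]))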