For the k-nearest neighbors algorithm (k-NN), we store a set of training instances and then classify each new instance based on the most common class among that instance’s k nearest stored instances, where k is some positive integer.
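As a concrete illustration, here is a minimal sketch of that classification step in Python. The function name `knn_classify` and the toy dataset are made up for this example; distances are Euclidean.

```python
from collections import Counter
import math

def knn_classify(stored, new_point, k=3):
    """Classify new_point by majority vote among its k nearest stored instances.

    `stored` is a list of (features, label) pairs.
    """
    by_distance = sorted(stored, key=lambda pair: math.dist(pair[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two clusters labeled "a" and "b".
data = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
        ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
knn_classify(data, (2, 2), k=3)   # lands in the "a" cluster
```

Note that all the work happens at prediction time: the "model" is just the stored data plus a distance function.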
Here are some reasons why we would consider using k-nearest neighbors instead of another model type:
k-NN is straightforward and easy to understand. If you need an algorithm that you can easily explain to a non-technical boss or project/product manager, k-NN is a good starting point.
Instead of making a bunch of assumptions about the data (e.g. linearity, conditional independence, etc.), you let the data speak for itself rather than fitting a fixed distribution to it. The only parameter you need to set is k, the number of neighbors to consider when classifying a new, unseen instance, and the only assumption you make is that neighboring instances tend to share the same class.
With a sufficiently large k, the k-NN algorithm is robust to noisy training data, because each prediction is a vote over several neighbors rather than a bet on a single, possibly mislabeled, instance.
No Training Needed
k-NN is a lazy algorithm in that there is no explicit training step; it simply stores the training instances. The lack of a training step means that you can classify new instances right from the start.
If you need an algorithm that can do both regression and classification, k-NN can do that for you.
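For regression, the usual approach is to average the targets of the k nearest neighbors instead of taking a majority vote. A minimal sketch (the function name `knn_regress` and the data are invented for illustration):

```python
import math

def knn_regress(stored, new_point, k=3):
    """Predict a numeric target as the mean of the k nearest neighbors' targets.

    `stored` is a list of (features, target) pairs.
    """
    by_distance = sorted(stored, key=lambda pair: math.dist(pair[0], new_point))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)

points = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
knn_regress(points, (2,), k=3)   # averages the targets of (1,), (2,), (3,)
```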
If you need to classify data in which there are multiple classes instead of just two classes, k-NN can handle that.
In order to update k-NN, all you need to do is add the new instance to the stored set. Since there is no training step, k-NN does not need to rebuild a model from scratch every time an instance is added. Generally, the more instances you have, the better k-NN can classify unseen instances.
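The incremental update described above can be sketched as a tiny class, where "training" is nothing more than appending to a list (the class name `KNN` and its methods are made up for this sketch):

```python
import math
from collections import Counter

class KNN:
    """Minimal instance store: updating the model is just appending examples."""

    def __init__(self, k=3):
        self.k = k
        self.instances = []          # list of (features, label) pairs

    def add(self, features, label):
        # Constant-time update: no retraining, no model rebuild.
        self.instances.append((features, label))

    def classify(self, point):
        nearest = sorted(self.instances,
                         key=lambda pair: math.dist(pair[0], point))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

model = KNN(k=1)
model.add((0, 0), "left")
model.add((10, 10), "right")
model.classify((1, 1))   # usable immediately after just two additions
```

The trade-off is that every prediction scans the stored instances, so prediction cost grows with the dataset unless you add an index structure.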