Specifying K = the number of nearest neighbors for a k-Nearest Neighbors algorithm
- There is no structured method to find the best value of k. In practice, we try several candidate values on the training data and measure the resulting accuracy on held-out test data.
- A value of k that is too small (e.g. k = 1 for 100 samples) produces noisy results: each prediction is decided by very few neighbors, so a single outlier or mislabeled point has an outsized influence on the result.
- A value of k that is too large pulls distant points into the neighborhood that tell us little about the query, which can lead to a poor prediction for a given sample. Larger values of k produce smoother decision boundaries, which means lower variance but increased bias, and they are also more computationally expensive. (The curse of dimensionality is a separate issue: it refers to too many features, not too large a k, making distances less informative and degrading k-NN predictions.)
- Another way to choose k is cross-validation. Hold out a small portion of the training dataset and call it a validation set, then use it to evaluate different candidate values of k: predict the label of every instance in the validation set with k = 1, k = 2, k = 3, and so on. The value of k that gives the best performance on the validation set, i.e. the one that minimizes the validation error, is the one we use in the final algorithm (see the first sketch after this list).
- In general practice, a common rule of thumb is to choose k = √N, where N stands for the number of samples in your training dataset.
- The Elbow method can also be used to select the optimal k: plot the error rate over a range of k values and pick the k at the "elbow" of the curve, where the error stops improving appreciably (see the second sketch after this list).
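
To make the cross-validation approach concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier and cross_val_score. The Iris dataset, the candidate range 1 to 20, and the 5-fold split are illustrative assumptions, not choices made in the text above.

```python
# Minimal sketch: pick k by cross-validation (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = 1, 0.0
for k in range(1, 21):  # candidate values; √N is a common starting point
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds serves as the validation
    # performance for this candidate k.
    score = cross_val_score(knn, X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, validation accuracy = {best_score:.3f}")
```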
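
And a minimal sketch of the Elbow method, assuming scikit-learn and matplotlib are available; the train/test split, the k range, and the dataset are again illustrative. The plotted curve typically drops steeply for small k and then flattens; the k at the bend is the one to pick.

```python
# Minimal sketch: Elbow method, plotting error rate against k.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

ks = range(1, 21)
errors = []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Error rate = 1 - accuracy on the held-out test set.
    errors.append(1 - knn.score(X_test, y_test))

plt.plot(ks, errors, marker="o")
plt.xlabel("k")
plt.ylabel("error rate")
plt.title("Elbow method: pick k where the error stops improving")
plt.show()
```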
Tags
Data Science
Related
Specifying a distance metric for k-Nearest Neighbors algorithm
Specifying the optional weighting function on the neighbor points for a k-Nearest Neighbors algorithm
Specifying a method for aggregating the classes of neighbor points for a k-Nearest Neighbors algorithm