Concept

Distance Metric Inductive Bias

In nonparametric methods such as the k-nearest neighbor algorithm, a distance function d (or, equivalently, a vector-valued basis function φ(x)) must be specified to measure similarity between data points. This choice of distance metric is critical because it encodes the model's inductive bias. Although a model like 1-nearest neighbor achieves zero training error under any metric, different distance functions represent different underlying assumptions about the data patterns. Consequently, with finite data, these varying inductive biases will yield different predictors, and their generalization performance will depend on how compatible the chosen metric is with the true data distribution.
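A minimal sketch of this idea: the same 1-nearest-neighbor rule, applied to the same (hypothetical, hand-picked) training set, can predict different labels for a test point depending solely on the chosen distance function. The weighted metric below is an illustrative assumption, encoding a bias that the second feature is mostly noise; it is not taken from the text.

```python
import numpy as np

def one_nn_predict(x, X_train, y_train, dist):
    """Label x with the label of its nearest training point under `dist`."""
    dists = [dist(x, xi) for xi in X_train]
    return y_train[int(np.argmin(dists))]

# Two metrics encoding different inductive biases
euclidean = lambda a, b: np.sqrt(np.sum((a - b) ** 2))
# Down-weights the second coordinate (assumes it carries little signal)
weighted = lambda a, b: np.sqrt((a[0] - b[0]) ** 2 + 0.01 * (a[1] - b[1]) ** 2)

# Tiny synthetic training set (hypothetical, for illustration only)
X_train = np.array([[0.0, 0.0], [1.0, 3.0]])
y_train = np.array([0, 1])

x = np.array([0.9, 0.5])
print(one_nn_predict(x, X_train, y_train, euclidean))  # → 0
print(one_nn_predict(x, X_train, y_train, weighted))   # → 1
```

Both metrics fit the training set perfectly (each training point is its own nearest neighbor), yet they disagree on the test point: which predictor generalizes better depends on whether the metric's assumptions match the true data distribution.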


Updated 2026-05-07

Tags

D2L

Dive into Deep Learning @ D2L