To find the nearest neighbors, we need a metric to measure distance. The choice of distance metric affects how similarity is evaluated in KNN. Below are four commonly used distance measures:
Also called Taxicab or City Block distance, Manhattan distance is calculated as:

$$d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

where p and q are two points in an n-dimensional space.
It measures distance by summing the absolute differences of the points' coordinates. It is well suited to grid-like data structures.
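As a minimal sketch of the sum of absolute coordinate differences described above, assuming points are plain Python sequences of equal length (the function name `manhattan_distance` is illustrative, not taken from any library):

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two equal-length points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


# |1 - 4| + |2 - 6| = 3 + 4 = 7
print(manhattan_distance((1, 2), (4, 6)))
```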
The most common distance metric, Euclidean distance, is given by:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
It represents the shortest (straight-line) distance between two points in space. This metric works well when the features are continuous.
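The straight-line distance can be sketched in a few lines of standard-library Python. The function name `euclidean_distance` is an illustrative choice. On Python 3.8 and later, the built-in `math.dist` computes the same quantity.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5.0
print(euclidean_distance((1, 2), (4, 6)))
```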
Minkowski distance is a generalized metric that includes both Manhattan and Euclidean distances as special cases:

$$d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^r \right)^{1/r}$$

For different values of r:
- r = 1 → Manhattan Distance
- r = 2 → Euclidean Distance
- r > 2 → Gives more weight to larger coordinate differences, which can change which points count as nearest neighbors.
Choosing r appropriately can fine-tune the model's sensitivity to large differences in individual features.
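To see how r changes the result, here is a small sketch of the Minkowski distance, assuming r >= 1 and equal-length points (the function name `minkowski_distance` is illustrative). The same pair of points is measured with several values of r.

```python
def minkowski_distance(p, q, r):
    """Minkowski distance of order r between two equal-length points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    if r < 1:
        raise ValueError("r must be at least 1")
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)


p, q = (0, 0), (3, 4)
for r in (1, 2, 10):
    # r = 1 reproduces Manhattan, r = 2 reproduces Euclidean; for larger r
    # the result moves toward the largest single coordinate difference (4).
    print(r, minkowski_distance(p, q, r))
```

With these points the distance shrinks from about 7 (r = 1) to about 5 (r = 2) and gets close to 4 at r = 10, which shows the largest coordinate difference increasingly dominating the total.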
Hamming distance is used when dealing with categorical variables (e.g., binary data, DNA sequences, text analysis). It counts the number of positions at which two equal-length binary strings differ:

$$d(p, q) = \sum_{i=1}^{n} \mathbb{1}[p_i \neq q_i]$$
It is particularly useful in text processing and bioinformatics applications.
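Counting differing positions takes only a short function. The sketch below assumes both inputs are sequences of the same length. The name `hamming_distance` is illustrative, and the same code works for bit strings and for other symbol sequences such as DNA.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


# The strings differ at the 2nd and 4th positions.
print(hamming_distance("10110", "11100"))      # 2
# The sequences differ at the 3rd and 6th positions.
print(hamming_distance("GATTACA", "GACTATA"))  # 2
```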