The Essential k-NN Algorithm
Learn about the k-NN algorithm and its implementation using different modules.
We'll cover the following...
Overview
We can summarize the -NN algorithm as having the following steps:
- Create a list of all (distance, training sample) pairs.
- Sort these in ascending order.
- Pick to the first , which will be the k nearest neighbors.
- Chose the mode (the highest frequency) label for the nearest neighbors.
The implementation would look like this:
Press + to interact
class Measured(NamedTuple):distance: floatsample: TrainingKnownSampledef k_nn_1(k: int, dist: DistanceFunc, training_data: TrainingList,unknown: AnySample) -> str:distances = sorted(map(lambda t: Measured(dist(t, unknown), t), training_data) )k_nearest = distances[:k]k_frequencies: Counter[str] = collections.Counter(s.sample.sample.species for s in k_nearest)mode, fq = k_frequencies.most_common(1)[0]return mode
While clear, this does accumulate a large number of distance values in the distances list object, when only are actually needed. The sorted()
function consumes the source generator and creates a (potentially large) list of intermediate values.
One of the high-cost parts of this specific -NN algorithm is sorting the entire set of training data after the distances have been computed. We summarize the complexity with the description as an operation. A way to avoid cost is to avoid sorting the entire set of distance computations.
Steps ...