
This page presents a new k-nearest neighbor (kNN) implementation for Weka. Our implementation extends the current Weka implementation by adding an example weighting function, wi = 1/distance(ei,eq)^2, where distance(ei,eq) is the distance between the example being weighted (ei) and the query example (eq). Our implementation also provides a distance function known as the Heterogeneous Euclidean-VDM Metric (HVDM), which aims to make better use of the information provided by nominal attributes.
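The voting rule behind this weighting fits in a few lines of Java. This is a hypothetical, stand-alone sketch and not the downloadable implementation or Weka's API. The class `InverseSquareVote` and its `predict` method are names made up for illustration. The sketch assumes the k nearest neighbors and their distances to the query have already been found. Each neighbor adds wi = 1/distance(ei,eq)^2 to the score of its class. Letting neighbors at distance 0 vote alone is also an assumption of this sketch, chosen to avoid dividing by zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of inverse-square distance-weighted kNN voting (not the authors' code). */
final class InverseSquareVote {

    private InverseSquareVote() {}

    /**
     * @param labels    class labels of the k nearest neighbors
     * @param distances distance(ei, eq) of each neighbor, in the same order as labels
     * @return the label with the largest total weight
     */
    static String predict(String[] labels, double[] distances) {
        if (labels.length == 0 || labels.length != distances.length) {
            throw new IllegalArgumentException("need one distance per neighbor label");
        }
        Map<String, Double> votes = new HashMap<>();
        // Assumption: neighbors identical to the query (distance 0) decide the vote on their own.
        for (int i = 0; i < labels.length; i++) {
            if (distances[i] == 0.0) {
                votes.merge(labels[i], 1.0, Double::sum);
            }
        }
        if (votes.isEmpty()) {
            for (int i = 0; i < labels.length; i++) {
                double w = 1.0 / (distances[i] * distances[i]); // wi = 1/distance(ei,eq)^2
                votes.merge(labels[i], w, Double::sum);
            }
        }
        // Pick the class with the largest accumulated weight (ties broken arbitrarily).
        String best = null;
        double bestWeight = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : votes.entrySet()) {
            if (e.getValue() > bestWeight) {
                bestWeight = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Consider three neighbors labelled a, b and b at distances 0.5, 1.0 and 1.0. A plain majority vote returns b. The inverse-square weights give a a score of 4 and b a score of 1 + 1 = 2, so `predict` returns a.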

This page also includes detailed experimental results on how the k-nearest neighbor parameters influence its performance. These results are part of a paper published in ASAI-2009.

k-Nearest Neighbor Implementation

Our implementation can be downloaded either as a single Java source file or as part of a complete Weka jar file. The second option may be easier to use, since it does not require recompiling the whole Weka package. The Weka package is version 3.14.3.

Detailed Experimental Results

The detailed experimental results are the accuracies obtained on each data set. Each accuracy is the mean over 100 test sets, because we used 10 x 10-fold cross-validation (10 repetitions of 10-fold cross-validation) to partition each data set into training and test sets. The results are reported separately for each distance function: HEOM (Heterogeneous Euclidean-Overlap Metric), HVDM and HMOM (Heterogeneous Manhattan-Overlap Metric). They are also reported separately for each weighting function: none (plain kNN), inverse (1/distance(ei,eq)^2) and similarity (1-distance(ei,eq)). Tabs at the bottom of the page let you select the value of the k parameter (1, 3, 5, 7, 9, 11, 15, 21 or 27). Mean results over all k values are also provided.
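The three distance functions can be sketched in Java under a few assumptions. HEOM and HVDM follow the standard definitions of Wilson and Martinez (1997). HMOM is treated here as the Manhattan counterpart of HEOM: it uses the same per-attribute terms but sums them instead of combining them in a Euclidean way. This is a hypothetical illustration, not the authors' Weka code. The following details are all assumptions made for the sketch:

* the class `MixedDistances`;
* the encoding of nominal values as integer codes;
* the use of `Double.NaN` for missing values;
* the precomputed training statistics `range`, `stdDev` and `counts`.

```java
/**
 * Hypothetical sketch of HEOM, HMOM and HVDM (not the authors' code).
 * Nominal values are integer codes stored as doubles; Double.NaN marks a missing value.
 */
final class MixedDistances {
    final boolean[] nominal; // nominal[a] is true when attribute a is nominal
    final double[] range;    // max - min of each numeric attribute on the training data
    final double[] stdDev;   // standard deviation of each numeric attribute
    final int[][][] counts;  // counts[a][v][c]: training examples with value v on a and class c

    MixedDistances(boolean[] nominal, double[] range, double[] stdDev, int[][][] counts) {
        this.nominal = nominal;
        this.range = range;
        this.stdDev = stdDev;
        this.counts = counts;
    }

    /** Overlap for nominal attributes, range-normalized difference for numeric ones. */
    private double overlapDiff(int a, double x, double y) {
        if (Double.isNaN(x) || Double.isNaN(y)) return 1.0; // missing value: maximal distance
        if (nominal[a]) return x == y ? 0.0 : 1.0;
        return range[a] > 0 ? Math.abs(x - y) / range[a] : 0.0;
    }

    /** HEOM: square root of the summed squared per-attribute distances. */
    double heom(double[] p, double[] q) {
        double sum = 0;
        for (int a = 0; a < p.length; a++) {
            double d = overlapDiff(a, p[a], q[a]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** HMOM (assumed Manhattan analogue of HEOM): plain sum of the same terms. */
    double hmom(double[] p, double[] q) {
        double sum = 0;
        for (int a = 0; a < p.length; a++) {
            sum += overlapDiff(a, p[a], q[a]);
        }
        return sum;
    }

    /** HVDM: VDM for nominal attributes, |x - y| / (4 * stdDev) for numeric ones. */
    double hvdm(double[] p, double[] q) {
        double sum = 0;
        for (int a = 0; a < p.length; a++) {
            double d;
            if (Double.isNaN(p[a]) || Double.isNaN(q[a])) {
                d = 1.0;
            } else if (nominal[a]) {
                d = vdm(a, (int) p[a], (int) q[a]);
            } else {
                d = stdDev[a] > 0 ? Math.abs(p[a] - q[a]) / (4 * stdDev[a]) : 0.0;
            }
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Normalized VDM: compares the class distributions P(c | a = x) and P(c | a = y). */
    private double vdm(int a, int x, int y) {
        int[] cx = counts[a][x];
        int[] cy = counts[a][y];
        double nx = 0, ny = 0;
        for (int c = 0; c < cx.length; c++) {
            nx += cx[c];
            ny += cy[c];
        }
        double sum = 0;
        for (int c = 0; c < cx.length; c++) {
            // Assumption: a value never seen in training has probability 0 for every class.
            double px = nx > 0 ? cx[c] / nx : 0.0;
            double py = ny > 0 ? cy[c] / ny : 0.0;
            sum += (px - py) * (px - py);
        }
        return Math.sqrt(sum);
    }
}
```

Take two attributes. The first is nominal, and its two values have class counts (3, 1) and (1, 3). The second is numeric, with range 10 and standard deviation 2.5. For the examples (0, 2) and (1, 7), this sketch gives:

* HEOM: sqrt(1 + 0.25) ≈ 1.118;
* HMOM: 1.5;
* HVDM: sqrt(0.5 + 0.25) ≈ 0.866.

Unlike the overlap metric, HVDM treats the two nominal values as only partially different, because their class distributions overlap.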

Citation

If you use this implementation or any of the results published in our paper, please use the following citation:

BATISTA, G. E. A. P. A.; SILVA, D. F. How k-Nearest Neighbor Parameters Affect its Performance. In: Proceedings of the Argentine Symposium on Artificial Intelligence, 2009. p. 1-12.

@inproceedings{Batista:ASAI2009,
  author    = {Gustavo E. A. P. A. Batista and Diego Furtado Silva},
  title     = {How k-Nearest Neighbor Parameters Affect its Performance},
  booktitle = {Argentine Symposium on Artificial Intelligence},
  year      = {2009},
  pages     = {1--12},
}