Closed
Description
Describe the bug
For the same data, BallTree.query can return different nearest-neighbour indices depending on the scikit-learn version when several neighbours are equidistant from a point. Specifically, this can be observed in 0.24.1 vs 1.1.1.
Steps/Code to Reproduce
import numpy as np
from sklearn.neighbors import BallTree

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))  # generated dataset

tree = BallTree(X, leaf_size=4, metric='hamming')
distances, indices = tree.query(X, k=3, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)

# Loop variables are named differently from the result arrays to avoid shadowing them.
for i, (point, dists, idxs) in enumerate(zip(X, distances, indices)):
    print(f'index {i}: datapoint {point} distances: {[round(dist, 2) for dist in dists]} indices: {idxs}')
Expected Results
scikit-learn versions 0.24.1 and 1.1.1 should return the same nearest-neighbour indices whenever multiple neighbours are equidistant from a data point.
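To make the expectation concrete, here is a minimal sketch of what a deterministic tie-break could look like. It is a brute-force reference, not how BallTree works internally. The helper names `hamming` and `knn_stable` are hypothetical, and breaking ties by lowest index is an assumption about the desired order. The distance is the fraction of differing coordinates, which matches the 'hamming' metric.

```python
def hamming(p, q):
    """Fraction of coordinates in which p and q differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def knn_stable(points, k):
    """Brute-force k nearest neighbours; ties go to the lowest index (assumed rule)."""
    result = []
    for p in points:
        # Sorting (distance, index) pairs fixes the order among
        # equidistant neighbours regardless of any tree traversal order.
        ranked = sorted((hamming(p, q), j) for j, q in enumerate(points))
        result.append([j for _, j in ranked[:k]])
    return result


# Small hand-made dataset with several ties (not the dataset from the report).
points = [(0, 1), (0, 2), (1, 1), (0, 1)]
print(knn_stable(points, 3))
```

The same function accepts the NumPy array from the reproduction script (for example `knn_stable(X, 3)`), so its output could serve as a version-independent baseline against which both sets of BallTree results are compared.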
Actual Results
scikit-learn v. 0.24.1:
scikit-learn v. 1.1.1:
Versions
System:
python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-0.24/bin/python3
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
pip: 22.0.4
setuptools: 62.1.0
sklearn: 0.24.1
numpy: 1.22.4
scipy: 1.8.1
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
System:
python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-1.1.1/bin/python3
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
sklearn: 1.1.1
pip: 22.0.4
setuptools: 62.1.0
numpy: 1.22.4
scipy: 1.8.1
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
kkozmic