BallTree.query returns inconsistent indices between scikit-learn versions 0.24.1 and 1.1.1 #23667

@eeghor

Describe the bug

For the same data, BallTree.query can return different nearest-neighbour indices depending on the scikit-learn version when several neighbours are equidistant from the query point. Specifically, this can be observed when comparing 0.24.1 with 1.1.1.
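
To illustrate why ties are so frequent with this kind of data, here is a minimal NumPy-only sketch (not part of the original reproduction). It assumes the Hamming metric is the fraction of coordinates that differ, so with two features the only possible distances are 0, 0.5 and 1. The helper name `pairwise_hamming` is hypothetical.

```python
import numpy as np

def pairwise_hamming(X):
    """Fraction of coordinates that differ, for every pair of rows in X."""
    # Broadcasting: (n, 1, d) != (1, n, d) gives an (n, n, d) boolean array
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))  # same shape and value range as the report

D = pairwise_hamming(X)
unique_values = np.unique(D)
print(unique_values)  # only a handful of distinct distances, so many neighbours tie
```

With so few distinct distance values among 10 points, the k-th nearest neighbour is rarely unique, which is the situation in which the returned indices can legitimately differ.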

Steps/Code to Reproduce

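import numpy as np
from sklearn.neighbors import BallTree

np.random.seed(61)

# Small integer dataset: 10 points, 2 features, values in {0, 1, 2}
X = np.random.randint(0, 3, size=(10, 2))

# Hamming distance on 2 features only takes the values 0, 0.5 and 1, so ties are frequent
tree = BallTree(X, leaf_size=4, metric='hamming')
distances, indices = tree.query(X, k=3, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)

# Print each point with its 3 nearest neighbours (loop names chosen to avoid shadowing the arrays)
for i, (point, dists, idxs) in enumerate(zip(X, distances, indices)):
    print(f'index {i}: datapoint {point} distances: {[round(dist, 2) for dist in dists]} indices: {idxs}')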

Expected Results

scikit-learn 0.24.1 and 1.1.1 should return the same nearest-neighbour indices whenever multiple neighbours are equidistant from a data point.
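
As a point of comparison, the following is a rough sketch of one way to get version-independent indices: a brute-force search with an explicit tie-breaking rule (lowest row index first). This is an assumption about what "consistent" could mean here, not a description of how BallTree orders ties in either version; the function name `knn_lowest_index_first` is hypothetical.

```python
import numpy as np

def knn_lowest_index_first(X, k):
    """Brute-force k nearest neighbours under Hamming distance.

    Ties are broken by ascending row index, because a stable sort keeps
    equal-distance entries in their original (index) order.
    """
    # Pairwise Hamming distance: fraction of differing coordinates
    D = (X[:, None, :] != X[None, :, :]).mean(axis=2)
    order = np.argsort(D, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(D, order, axis=1), order

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))

dist, idx = knn_lowest_index_first(X, k=3)
for i, (point, d, j) in enumerate(zip(X, dist, idx)):
    print(f"index {i}: datapoint {point} distances: {d.tolist()} indices: {j}")
```

Brute force is quadratic in the number of points, so this only suits small datasets or tests that need a fixed reference ordering.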

Actual Results

scikit-learn v. 0.24.1:

[Screenshot of the script output under scikit-learn 0.24.1; image not preserved]

scikit-learn v. 1.1.1:

[Screenshot of the script output under scikit-learn 1.1.1; image not preserved]

Versions

System:
    python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-0.24/bin/python3
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
          pip: 22.0.4
   setuptools: 62.1.0
      sklearn: 0.24.1
        numpy: 1.22.4
        scipy: 1.8.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

System:
    python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-1.1.1/bin/python3
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.1.1
          pip: 22.0.4
   setuptools: 62.1.0
        numpy: 1.22.4
        scipy: 1.8.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True
