
Incorrect warning when clustering boolean data #18996

@peastman


Describe the bug

When I cluster data with a metric that requires boolean input, the console fills up with a huge number of DataConversionWarning messages saying the data was converted to boolean, even though it is already boolean.
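The warning text points to `sklearn/metrics/pairwise.py`. The sketch below calls `pairwise_distances` directly and counts the DataConversionWarnings emitted for the same 0/1 values stored once as `bool` and once as `float64`. It assumes a scikit-learn version that exposes `sklearn.metrics.pairwise_distances` and `sklearn.exceptions.DataConversionWarning`. `count_conversion_warnings` is a hypothetical helper written for this illustration. If only the float copy warns, the repeated messages would suggest that the clustering code passes non-boolean data to the distance function. Whether OPTICS casts its input internally is an assumption here, not something this report confirms.

```python
import warnings

import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.metrics import pairwise_distances


def count_conversion_warnings(X, metric="rogerstanimoto"):
    """Hypothetical helper: count DataConversionWarnings from one call."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records every warning instead of showing each one only once.
        warnings.simplefilter("always")
        pairwise_distances(X, metric=metric)
    return sum(issubclass(w.category, DataConversionWarning) for w in caught)


rng = np.random.default_rng(0)  # seeded only for reproducibility
x_bool = rng.integers(0, 2, size=(10, 5)).astype(bool)
x_float = x_bool.astype(np.float64)  # same 0/1 values, float dtype

print("bool input: ", count_conversion_warnings(x_bool))
print("float input:", count_conversion_warnings(x_float))
```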

Steps/Code to Reproduce

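import numpy as np
from sklearn.cluster import OPTICS

# Random 10x5 boolean matrix. Plain `bool` replaces the `np.bool` alias,
# which was deprecated in NumPy 1.20 and removed in NumPy 1.24.
x = np.random.randint(2, size=(10, 5), dtype=bool)
labels = OPTICS(metric='rogerstanimoto').fit_predict(x)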

x has dtype bool, so this ought to be fine, but it prints many repetitions of this message:

/Users/peastman/miniconda3/envs/tf2/lib/python3.7/site-packages/sklearn/metrics/pairwise.py:1765: DataConversionWarning: Data was converted to boolean for metric rogerstanimoto
warnings.warn(msg, DataConversionWarning)

When clustering larger datasets, this message can be repeated hundreds of thousands of times.
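Until this is fixed, one possible workaround is to silence DataConversionWarning only around the clustering call, using the standard-library `warnings.catch_warnings` context manager. This is a sketch that assumes the warning is purely cosmetic when the input is already boolean; the seeded `default_rng` is only there to make the example reproducible and is not part of the original reproduction.

```python
import warnings

import numpy as np
from sklearn.cluster import OPTICS
from sklearn.exceptions import DataConversionWarning

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(10, 5)).astype(bool)

# The "ignore" filter is installed only inside this block, and only for
# DataConversionWarning; other warning categories behave as usual, and the
# previous filter state is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DataConversionWarning)
    labels = OPTICS(metric="rogerstanimoto").fit_predict(x)

print(labels)
```

Keeping the filter scoped matters because the same warning is useful elsewhere: for float input with values other than 0 and 1, the conversion to boolean really does change the data, and a global filter would hide that.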

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 22:45:16)  [Clang 9.0.1 ]
executable: /Users/peastman/miniconda3/envs/tf2/bin/python
   machine: Darwin-17.7.0-x86_64-i386-64bit

Python dependencies:
          pip: 20.2.4
   setuptools: 49.6.0.post20201009
      sklearn: 0.23.2
        numpy: 1.19.1
        scipy: 1.5.2
       Cython: 0.29.21
       pandas: 1.0.3
   matplotlib: 3.3.2
       joblib: 0.14.1
threadpoolctl: 2.1.0

Built with OpenMP: True

Labels: Bug, Moderate, module:cluster