It is common to build an sklearn pipeline that includes the necessary data preprocessing (and feature encoding) steps and ends with an estimator (for example, see Column Transformer with Mixed Types). The resulting pipeline can then be used like a normal classifier: its fit(X) method also fits the corresponding data transformers and transforms the data before it reaches the estimator.
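For readers unfamiliar with this pattern, here is a minimal sketch of such a pipeline. The column names, the toy data, and the choice of LogisticRegression are assumptions made purely for illustration; they are not taken from the project.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical column.
X = pd.DataFrame({
    "age": [22.0, 35.0, 58.0, 41.0, 19.0, 63.0],
    "city": ["Oslo", "Rome", "Oslo", "Paris", "Rome", "Paris"],
})
y = [0, 1, 1, 0, 0, 1]

# Scale the numeric column and one-hot encode the categorical one.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# The pipeline ends with an estimator, so it behaves like a classifier.
clf = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])

# fit() fits the transformers, transforms X, then fits the classifier.
clf.fit(X, y)
predictions = clf.predict(X)
```

The raw DataFrame is passed straight to fit() and predict(); the encoding happens inside the pipeline.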
However, in batch.py::select_instance(), the (dis)similarity between the training data and the instance pool is computed directly on the raw data, without applying any of the pipeline's transformations.
This is not optimal, as any feature engineering and transformations are ignored. Furthermore, it fails completely if a pandas DataFrame is used to hold the data set.
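One possible direction, sketched below under stated assumptions, is to apply every pipeline step except the final estimator to both the training data and the pool before computing distances. This is not the library's current behaviour or a proposed patch to batch.py; the data, the column names, and the Euclidean metric are illustrative assumptions. The sketch relies on sklearn's Pipeline slicing (clf[:-1]), which returns a sub-pipeline that shares the already fitted transformers.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical labelled training data and unlabelled pool.
X_train = pd.DataFrame({
    "age": [22.0, 35.0, 58.0, 41.0],
    "city": ["Oslo", "Rome", "Oslo", "Paris"],
})
y_train = [0, 1, 1, 0]
X_pool = pd.DataFrame({
    "age": [30.0, 47.0, 65.0],
    "city": ["Rome", "Paris", "Oslo"],
})

clf = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ])),
    ("classifier", LogisticRegression()),
])
clf.fit(X_train, y_train)

# All steps except the final estimator, reusing the fitted transformers.
transformer = clf[:-1]
train_t = transformer.transform(X_train)
pool_t = transformer.transform(X_pool)

# Distances are computed in the transformed, fully numeric feature space.
# Rows index pool instances, columns index training instances.
distances = pairwise_distances(pool_t, train_t, metric="euclidean")
```

Because the distances are computed after encoding, string-valued DataFrame columns such as "city" no longer reach the distance function in raw form.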