Restructure Keras Scikit-Learn wrappers to better implement Scikit-Learn API #37201

adriangb · 2020-03-01T07:32:57Z

This is a modification of #32533. I am opening a new PR because that one seems stalled and I made a lot of changes/improvements (but keeping the same idea).

A quick summary:
The existing scikit-learn wrappers for Keras models are not compatible with many scikit-learn functions. Additionally, they require that dataset dimensions be determined before calling the fit method, which is unlike the scikit-learn estimators and makes it hard to build dynamically adaptable pipelines.

What this PR does:
This PR does not change any API.
By moving the storage of parameters from self.sk_params to self.__dict__, compatibility with a lot of scikit-learn functionality is improved. Additionally, I gave the model building function the ability to request the data that will be fitted (to determine dimensions) as well as any other attributes of the wrapper instance. Finally, I enabled copying/pickling of wrapped models as well as the ability to wrap instances of Model, which should allow for greater flexibility in incorporating the wrappers into an existing Keras workflow.
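
To make the idea concrete, here is a minimal stdlib-only sketch, not the code from this PR. `WrapperSketch`, `clone_sketch` and `build_fn` are hypothetical names. It assumes the scikit-learn convention that parameter names are read from the `__init__` signature and values from same-named attributes, and it assumes the build function is handed only the arguments its signature asks for.

```python
import inspect


class WrapperSketch:
    """Hypothetical, simplified stand-in for a Keras wrapper (not the PR's class)."""

    def __init__(self, build_fn=None, epochs=1, batch_size=32):
        # Each constructor argument is stored as a plain attribute, so it
        # lives in self.__dict__ rather than in a separate sk_params dict.
        self.build_fn = build_fn
        self.epochs = epochs
        self.batch_size = batch_size

    def get_params(self, deep=True):
        # Parameter names come from the __init__ signature; values are read
        # back from the instance attributes of the same name.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter: %r" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Offer the training data and the wrapper's own parameters to
        # build_fn, but pass only the arguments its signature asks for.
        available = dict(self.get_params(), X=X, y=y)
        wanted = inspect.signature(self.build_fn).parameters
        kwargs = {k: v for k, v in available.items() if k in wanted}
        self.model_ = self.build_fn(**kwargs)
        return self


def clone_sketch(estimator):
    # Rebuild an unfitted copy purely from the constructor parameters.
    return type(estimator)(**estimator.get_params())


def build_fn(X, epochs):
    # Stand-in for a model builder: it only records what it was given.
    return {"n_features": len(X[0]), "epochs": epochs}


est = WrapperSketch(build_fn=build_fn, epochs=3).set_params(batch_size=8)
copy_of_est = clone_sketch(est)
est.fit([[0.1, 0.2, 0.3]], [1])
```

Because every parameter is an ordinary attribute, generic tooling that relies on get_params/set_params (cloning, grid search, pipelines) can discover and modify it without knowing anything about Keras.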

How is it tested:
All existing tests are working and were unchanged. This confirms that there were no API changes. New tests were added for all of the new functionality as well as some of the most common scikit-learn operations that were previously broken.
I would like to credit @daviddiazvico, the author of the original PR: I borrowed a lot of the tests he had written as well as the original idea for fixing these issues. I will make him a co-author of the final commit if this gets approved.

daviddiazvico · 2020-03-01T08:15:18Z

Thanks for your effort @adriangb ! I'm sorry I couldn't write back before.

Of course, I don't have any problem with you updating the other PR, but I'm not sure if I have to do something to allow you to edit it. I've added you as a collaborator to my tf fork.

In any case, this is a nice improvement for tf.keras in my opinion, but I don't know if the development team is interested. My PR has been open for a while and the same changes were proposed in the keras repo about a year ago now.

adriangb · 2020-03-01T14:41:38Z

I'm hoping that maybe the reviewers were just a bit busy?

I feel like the wrappers are used quite often, especially in beginner tutorials. I think it's important that the initial experience be seamless.

In addition to your PR, this resolves several issues: #33204, #36074, #34689 and #36137. There are/will be more, like the comment I posted on your PR.

I'm hoping we can at least get some feedback from @fchollet or @pavithrasv regarding interest in these changes.

tensorflow/python/keras/wrappers/scikit_learn.py

adriangb · 2020-03-05T19:38:43Z

I was able to add built-in support for all of the multi-output modes that Scikit-Learn supports, as well as a framework to easily support multi-input models. This means this would close #34689 as well.

Because of the great modularity of the Functional API, only a limited number of multi-output cases can be automatically supported (those that scikit-learn itself supports with a 1-1 mapping of model outputs to y columns) and no multi-input models can be supported out of the box. If a user wants to use a more complex model with the wrappers, they would have to subclass and manually define the desired behavior. As I show in the tests, all that is needed to implement multi-input is to define a behavior for _pre_process_X (for example, columns 0-3 are one input, 4 is another input). All of this was done with no changes to the API.
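
As a rough illustration of that column split, here is a stdlib-only sketch. `BaseWrapperSketch` and `TwoInputWrapper` are hypothetical stand-ins; only the hook name `_pre_process_X` comes from this PR, and its real signature may differ. Plain nested lists are used to keep the example dependency-free; real code would more likely slice NumPy arrays.

```python
from typing import List, Sequence

Row = Sequence[float]


class BaseWrapperSketch:
    """Hypothetical stand-in for the wrapper base class."""

    def _pre_process_X(self, X):
        # Default behavior: a single-input model receives X unchanged.
        return X


class TwoInputWrapper(BaseWrapperSketch):
    """Routes columns 0-3 to one model input and column 4 to another."""

    def _pre_process_X(self, X: Sequence[Row]) -> List[List[List[float]]]:
        first_input = [list(row[0:4]) for row in X]   # columns 0, 1, 2, 3
        second_input = [list(row[4:5]) for row in X]  # column 4 only
        # One entry per model input, in the order the model expects them.
        return [first_input, second_input]


X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0]]
inputs = TwoInputWrapper()._pre_process_X(X)
```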

Although the original problem statement ('fix compatibility') was quite large, I do feel that this PR has grown very large. I would personally prefer to split it into smaller PRs (even if that means more work for me), but I will leave that up to the reviewers.

@gbaned, is there a timeline for this review process? It would be nice to at least get tests running so that I can see if there are issues.

adriangb · 2020-03-07T06:10:33Z

I realized that we actually need to re-implement not only an R^2 score, but also a classifier accuracy score: Keras does not use the sample_weight parameter for metrics, but Scikit-Learn does use it for scoring. On top of that, Keras metrics don't support the Scikit-Learn style multi-output concept, so all of those scores are already being handled manually.
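
For reference, sample-weighted versions of the two scores can be sketched in plain Python as follows. This is an illustration of the definitions for the single-output case, not the implementation in this PR; the function names are hypothetical.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of total sample weight that falls on correct predictions."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hit = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hit / sum(sample_weight)


def weighted_r2(y_true, y_pred, sample_weight=None):
    """R^2 = 1 - SS_res / SS_tot, with every squared term weighted."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total_w = sum(sample_weight)
    # The reference mean is itself a weighted mean of y_true.
    mean = sum(w * t for t, w in zip(y_true, sample_weight)) / total_w
    ss_res = sum(w * (t - p) ** 2
                 for t, p, w in zip(y_true, y_pred, sample_weight))
    ss_tot = sum(w * (t - mean) ** 2 for t, w in zip(y_true, sample_weight))
    # A constant y_true gives ss_tot == 0; this sketch does not handle that.
    return 1.0 - ss_res / ss_tot


acc = weighted_accuracy([1, 0, 1], [1, 1, 1], sample_weight=[1.0, 2.0, 1.0])
r2_perfect = weighted_r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
r2_mean_only = weighted_r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In the accuracy example the single wrong prediction carries half of the total weight, so the weighted score is lower than the unweighted 2/3.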

This prompted me to think: does it make sense to make scikit-learn an optional dependency that is only imported within this module? I can see pros and cons, just playing devil's advocate here.

k-w-w

Approving to run the presubmits.

Can you add tests to make sure that saving and loading works with the wrapper?

adriangb · 2020-03-09T21:06:44Z

> Approving to run the presubmits.
>
> Can you add tests to make sure that saving and loading works with the wrapper?

Thank you for kicking off those tests!

There are several tests for pickling/unpickling of Functional API models with and without Callbacks, etc. I think the only thing that is missing is a test for subclassed models. I'll add that in the next few days.

adriangb · 2020-03-10T01:20:32Z

Quite a few errors:

Pylint: my fault, I was using 4 spaces. Fixing.
Scipy import error: for some CI builds, the scikit-learn version was pinned to a 2016 version that now has some broken functionality that these new changes were relying on. I bumped the version.
Python 2 errors: I'm ignoring these.
API changes: it's complaining about changes to internal parameters that should never have been public in the first place. I now appended an _ to them to fix this for the future. It also doesn't like that I capitalized X to match scikit-learn.
Windows builds: I have no idea why these are failing. Any advice would be much appreciated.

Also, I added the test suggested by @k-w-w (it's called SerializeCustomLayers fyi).

I guess another approval is needed for tests to run again, it'd be nice to fix the windows build errors before that though.

k-w-w · 2020-03-10T03:09:46Z

Thanks for fixing the bugs! I'm pretty sure the windows tests are unrelated. Running the tests again

tensorflow/python/keras/wrappers/scikit_learn.py

fchollet

Hello Adrian,

Thank you for the PR. In general, the scikit-learn API wrapper for Keras is not well-maintained at this time. Because we don't have the resources to maintain it (or even to review a very involved PR like yours), we are considering deprecating it.

Since you are a user of this feature and you've already spent a lot of time developing improvements, I would recommend starting a new repository & pip package hosting an up-to-date version of the API wrapper (effectively, do a fork). We could then redirect users of tf.keras.wrappers.scikit_learn to your repository & pip package, while we deprecate this functionality in tf.keras.

What do you think?

fchollet · 2020-04-10T20:29:39Z

tensorflow/python/keras/wrappers/scikit_learn.py

+      Sequential.evaluate,
+      Sequential.fit,
+      Sequential.predict,
+      Sequential.predict_classes,


predict_classes is now deprecated.

adriangb · 2020-04-10T20:59:15Z

@fchollet, thank you for looking at this!

I think that could work really well, but I'd like to list some pros and cons to consider.

Pros:

More flexibility. scikit-learn can become required for the wrapper, which eliminates a lot of code duplication for this wrapper and removes the need for scikit-learn from TF CI.
Relieves maintenance burden from Keras/TF team.

Cons:

Improvements in Keras might take a long time to trickle over.
When issues do crop up in the future, wrapper maintainers (i.e. me) might have to open an issue with Keras/TF to get support in fixing things.

Overall, I think the pros outweigh the cons.

I will start working on getting a separate repo with CI/publishing working. In the meantime, if you could do a brief review of this PR as it currently stands, that would be super useful to make sure the initial release is as good as possible.

adriangb · 2020-04-12T05:16:57Z

Are there any preferences as far as:

Naming of the repo/package or any project this should live under. I came up with sklearn-keras-wrap
Documentation style or content. I planned on copying the existing docs but just keeping everything in the README.md for simplicity.
Python versions: I think it would be easier to make this package >=Python3.5
Linting: I would switch to flake8/black
Testing: I would switch to pytest.

I played around with packaging a bit, and everything seems to work as far as CI/testing/releasing. It would have scikit-learn>=0.21.0 and tensorflow>=2.1.0 as dependencies.

adriangb · 2020-05-19T09:30:27Z

A quick update: the package is now fully operational. I settled on the name SciKeras.
PyPi: https://pypi.org/project/scikeras/
GitHub: https://github.com/adriangb/scikeras

Some important updates since this PR:

Inherited BaseWrapper from sklearn.base.BaseEstimator.
Implemented tags interface.
Use OneHotEncoder/LabelEncoder instead of manual numpy work.
Fix model serialization bugs.
A lot of cleanup.

With all of this, estimators created with these wrappers now pass all of scikit-learn's estimator checks, except those that require setting a random state. As far as I understand, it is not possible to easily set a random seed in tf.

adriangb · 2020-07-06T21:37:38Z

Hi @gbaned, just checking if there are any updates on this proposal/PR? Thanks!

gbaned · 2020-07-09T13:42:13Z

> Hi @gbaned, just checking if there are any updates on this proposal/PR? Thanks!

Hi @adriangb, It is waiting for approval. Thanks!

fchollet · 2020-07-15T20:42:28Z

Thanks for the update. It's great to see that you've already released the new package. We can recommend that people start using it instead of keras.wrappers.scikit_learn.

> With all of this, estimators created with these wrappers now pass all of scikit-learn's estimator checks, except those that require setting a random state. As far as I understand, it is not possible to easily set a random seed in tf.

This should be fixable: https://www.tensorflow.org/api_docs/python/tf/random/set_seed

What do you want us to do with the current PR? Should we close it?

adriangb · 2020-07-15T21:56:11Z

> We can recommend that people start using it instead of keras.wrappers.scikit_learn.

That sounds good.

> This should be fixable: https://www.tensorflow.org/api_docs/python/tf/random/set_seed

Will take a look, thank you.

> What do you want us to do with the current PR? Should we close it?

I think let's keep it open for a bit longer. Dask is looking to adopt SciKeras as a wrapper (here), so as they do their testing I expect there to be a couple of issues that crop up in the next couple of weeks that I may need input from the TF team on. Unless the TF team is willing to check the SciKeras repo if they are tagged.

fchollet · 2020-07-16T05:07:08Z

> I think let's keep it open for a bit longer. Dask is looking to adopt SciKeras as a wrapper (here), so as they do their testing I expect there to be a couple of issues that crop up in the next couple of weeks that I may need input from the TF team on. Unless the TF team is willing to check the SciKeras repo if they are tagged.

Ok, sounds good! Please reach out if you need anything from us (over email preferably, so we don't miss it). We'll start recommending your library as soon as it starts getting traction then 👍

gbaned · 2020-07-29T14:03:28Z

@adriangb Any update on this PR? Please. Thanks!

gbaned · 2020-08-06T14:46:04Z

@adriangb Any update on this PR? Please. Thanks!

adriangb · 2020-08-06T15:22:44Z

Hi @gbaned, as per François' comment above, the plan is to not merge this PR and instead move this part of tf.keras to an external package. I had asked to keep this PR open for communication and help with bringup of the external package, but I've since established communication with François directly and the external package is making good progress, so I think we can close this PR 😄

tensorflow-bot bot added the size:XL CL Change Size:Extra Large label Mar 1, 2020

googlebot added the cla: yes label Mar 1, 2020

adriangb changed the title ~~Restructure Scikit-Learn wrappers to better implement Scikit-Learn API~~ Restructure Keras Scikit-Learn wrappers to better implement Scikit-Learn API Mar 1, 2020

gbaned self-assigned this Mar 2, 2020

gbaned added the comp:keras Keras related issues label Mar 2, 2020

gbaned requested a review from fchollet March 2, 2020 04:16

adriangb commented Mar 2, 2020

tensorflow/python/keras/wrappers/scikit_learn.py (outdated, resolved)

adriangb commented Mar 3, 2020

tensorflow/python/keras/wrappers/scikit_learn.py (outdated, resolved)

gbaned added the awaiting review Pull request awaiting review label Mar 5, 2020

tensorflowbutler removed the awaiting review Pull request awaiting review label Mar 9, 2020

k-w-w previously approved these changes Mar 9, 2020

tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 9, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 9, 2020

adriangb dismissed k-w-w’s stale review via 393547a March 10, 2020 01:21

tensorflow-bot bot removed the ready to pull PR ready for merge process label Mar 10, 2020

k-w-w added the kokoro:force-run Tests on submitted change label Mar 10, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 10, 2020

gbaned requested a review from k-w-w March 16, 2020 12:06

k-w-w reviewed Mar 17, 2020

tensorflow/python/keras/wrappers/scikit_learn.py (outdated, resolved)

adriangb force-pushed the wrapper-devel branch from 02bbffa to e06bdfa Compare March 17, 2020 16:28

k-w-w added the kokoro:force-run Tests on submitted change label Mar 17, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 17, 2020

fchollet reviewed Apr 10, 2020

adriangb added a commit to adriangb/scikeras that referenced this pull request Apr 11, 2020

Refactor wrappers to better support scikit-learn APIs (originally ten…

830640c

…sorflow/tensorflow#37201)

adriangb added a commit to adriangb/scikeras that referenced this pull request Apr 11, 2020

Refactor wrappers to better support scikit-learn APIs (originally ten…

e87e0e0

…sorflow/tensorflow#37201)

adriangb added a commit to adriangb/scikeras that referenced this pull request Apr 11, 2020

Refactor wrappers to better support scikit-learn APIs (originally ten…

f808fd1

…sorflow/tensorflow#37201) Co-authored-by: David Díaz Vico <david.diaz.vico@outlook.com>

tensorflowbutler removed the awaiting review Pull request awaiting review label Apr 12, 2020

gbaned added the awaiting review Pull request awaiting review label Apr 13, 2020

gbaned requested a review from fchollet April 24, 2020 08:00

adriangb added a commit to adriangb/scikeras that referenced this pull request May 19, 2020

Refactor wrappers to better support scikit-learn APIs (originally ten…

9b32eba

…sorflow/tensorflow#37201) Co-authored-by: David Díaz Vico <david.diaz.vico@outlook.com>

adriangb added a commit to adriangb/scikeras that referenced this pull request May 19, 2020

Refactor wrappers to better support scikit-learn APIs (originally ten…

e0fc3cd

…sorflow/tensorflow#37201) Co-authored-by: David Díaz Vico <david.diaz.vico@outlook.com>

gbaned removed the awaiting review Pull request awaiting review label Jul 17, 2020

fchollet mentioned this pull request Jul 20, 2020

Improved compatibility with sklearn #32533

Closed

gbaned added the stat:awaiting response Status - Awaiting response from author label Jul 24, 2020

adriangb closed this Aug 6, 2020

adriangb mentioned this pull request Nov 25, 2020

DOC: Add a doc page detailing advantages (and disadvantages?) vs. TF skelarn wrappers adriangb/scikeras#132

Closed

gowthamkpr mentioned this pull request Sep 29, 2022

Improving compatibility with Scikit-learn #32532

Closed
