Stratified Group KFold implementation #18649
Merged
jnothman merged 73 commits into scikit-learn:main from marrodion:stratified-group-kfold on Mar 20, 2021
Commits
4df86c6 Initial implementation (hermidalc)
6be3594 Forgot to add to second __add__ list (hermidalc)
2f28673 Update split method parameter doc (hermidalc)
2365735 Added example; changed default test_size to 0.1; added to author list (hermidalc)
b3d2b5a Merge branch 'master' of github.com:scikit-learn/scikit-learn into st… (hermidalc)
aa8f288 StratifiedGroupKFold impl and other improvements (hermidalc)
647a97e Add class to __all__ spec (hermidalc)
36babe5 Remove random_state when no shuffle (hermidalc)
32e502a Tighter formatting (hermidalc)
c7ad3f3 Merge branch 'master' into stratified-groupshufflesplit
4826d96 Update the implementation of StratifiedGroupKFold
13801a7 Add StratifiedGroupKFold to __init__
8367133 Add y checks to StartifiedGroupKFold
bca2dbc Raise error if n_splits > max num samples in class
31fc183 Warn if n_splits > mn num samples in class
0d9a58f Add SGKfold to general repr test
3c2c639 Add SGKFold to 2d_y test case
8648519 Add SGKfold to value erros test case
7005af2 Add SGKFold to StratifiedKFold test cases
6a52ae9 Add SGKFold to reproducibility test case
6f83a85 Add SGKFold to GroupKFold test case
7fbc736 Add SGKFold to nested cv test case
d4f99e6 Add SGKFold to random_state with shuffle=False test case
a38a872 Add SGKFold to constant splits test case
490b503 Fix repr test case
cc8da98 Fix formatting issues
6990a91 Add samples to a fold with least num samples
1f4da2b Remove GroupShuffleSplit impl
9359cc7 Add notes to StratifiedGroupKFold
6386faa Fix doctest
536c4c9 Added stratified group kfold tests
9681a61 Better variable naming
b7e4fc8 Add section to documentation
e3112b4 Merge branch 'master' into stratified-group-kfold
2580a81 Remove leftover StratifiedGroupShuffleSplit import
81be001 Merge remote-tracking branch 'upstream/master' into stratified-group-… (marrodion)
113c06a Add changelist and reference to original kernel (marrodion)
72ebb9f Better naming for least populated class check (marrodion)
7093e70 Better expression for number of labels (marrodion)
2b5e71c Remove use of Counter (marrodion)
d36473e Add tests for homogeneous groups (marrodion)
f65d873 Add StratifiedGroupKFold test against GroupKFold (marrodion)
627fc9f Add changes to changelist in docstring (marrodion)
25fcb42 Add StratifiedGroupKFold to classes.rst (marrodion)
57c53a5 Fix description of StratifiedGroupKFold (marrodion)
484bf9d Merge branch 'main' into stratified-group-kfold (marrodion)
234f290 Move license notice out of docstring (marrodion)
0eb6080 Disambiguate labels to classes in doc (marrodion)
2e0ee20 Merge remote-tracking branch 'upstream/main' into stratified-group-kfold (marrodion)
f20718e Merge remote-tracking branch 'upstream/main' into stratified-group-kfold (marrodion)
cf912af Add changelog entry (marrodion)
42e00ed Fix changelog author entry (marrodion)
a1d0f9f Fix StratifiedGroupKFold docstring (marrodion)
8e3c852 Better variable names (marrodion)
096b23b Remove defaultdict in favor of numpy indexing (marrodion)
0464839 Extracted best_fold search into a separate method (marrodion)
c0f907a Make use of numpy broadcasting instead of for loop (marrodion)
41036ad Encode groups and use arrays instead of dicts (marrodion)
93fcdd4 Use numpy sort instead of python (marrodion)
ae67ad3 Clarify shuffling behavior of StratifiedGroupKF in docs (marrodion)
3940e23 Switch name from label_idx to class_idx (marrodion)
9cff771 Merge remote-tracking branch 'upstream/main' into stratified-group-kfold (marrodion)
cccbff7 Remove accidentally leftover comment (marrodion)
5cd8fdb Fix np.sort keyword to support numpy < 1.15 (marrodion)
4fbc7a8 Merge remote-tracking branch 'upstream/main' into stratified-group-kfold (marrodion)
024d3c6 Fix typo in docstring (marrodion)
09cfc37 Add StratifiedGroupKFold to visualization doc (marrodion)
6c8d5da Merge remote-tracking branch 'upstream/main' into stratified-group-kfold (marrodion)
60cb778 Add visualization for uneven group as an example (marrodion)
29a0fe5 Fix image numbers to match updated example (marrodion)
859d28a Add author (marrodion)
6fd5ec8 Add SGKF visualization to docs (marrodion)
42a6b80 Add comments for groups in stratified CV tests (marrodion)
Insert the visualisation here?
Done, thank you.
I wasn't sure it was needed, since not every CV splitter has a visualization on this documentation page.
Perhaps not needed, but helpful. Happy for you to make the docs more consistent in another PR! ;)