DOC Fixes formatting in feature_extraction module #19274

Merged: 2 commits, Jan 27, 2021
6 changes: 3 additions & 3 deletions sklearn/feature_extraction/_hash.py
@@ -70,9 +70,9 @@ class FeatureHasher(TransformerMixin, BaseEstimator):
approximately conserve the inner product in the hashed space even for
small n_features. This approach is similar to sparse random projection.

- .. versionchanged:: 0.19
- ``alternate_sign`` replaces the now deprecated ``non_negative``
- parameter.
+ .. versionchanged:: 0.19
+ ``alternate_sign`` replaces the now deprecated ``non_negative``
+ parameter.

Examples
--------
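As an illustration of the `alternate_sign` parameter documented in the hunk above, here is a minimal sketch that is not part of the PR. The sample feature dict and `n_features=8` are arbitrary choices made for the example.

```python
from sklearn.feature_extraction import FeatureHasher

# Arbitrary sample: one document described by two string features.
samples = [{"dog": 1, "cat": 2}]

# alternate_sign=False: hashed values keep their original sign,
# so every entry of the output stays non-negative.
unsigned = FeatureHasher(n_features=8, alternate_sign=False)
X_unsigned = unsigned.transform(samples).toarray()

# alternate_sign=True (the default): the hash may flip the sign of a value,
# which lets colliding features partly cancel instead of accumulating.
signed = FeatureHasher(n_features=8, alternate_sign=True)
X_signed = signed.transform(samples).toarray()
```

`FeatureHasher` is stateless, so `transform` can be called without a prior `fit`.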
64 changes: 31 additions & 33 deletions sklearn/feature_extraction/text.py
@@ -552,16 +552,16 @@ class HashingVectorizer(TransformerMixin, _VectorizerMixin, BaseEstimator):
Parameters
----------

- input : string {'filename', 'file', 'content'}, default='content'
- If 'filename', the sequence passed as an argument to fit is
- expected to be a list of filenames that need reading to fetch
- the raw content to analyze.
+ input : {'filename', 'file', 'content'}, default='content'
+ - If `'filename'`, the sequence passed as an argument to fit is
+ expected to be a list of filenames that need reading to fetch
+ the raw content to analyze.

- If 'file', the sequence items must have a 'read' method (file-like
- object) that is called to fetch the bytes in memory.
+ - If `'file'`, the sequence items must have a 'read' method (file-like
+ object) that is called to fetch the bytes in memory.

- Otherwise the input is expected to be a sequence of items that
- can be of type string or byte.
+ - If `'content'`, the input is expected to be a sequence of items that
+ can be of type string or byte.

encoding : string, default='utf-8'
If bytes or files are given to analyze, this encoding is used to
@@ -597,7 +597,7 @@ class HashingVectorizer(TransformerMixin, _VectorizerMixin, BaseEstimator):
preprocessing and n-grams generation steps.
Only applies if ``analyzer == 'word'``.

- stop_words : string {'english'}, list, default=None
+ stop_words : {'english'}, list, default=None
If 'english', a built-in stop word list for English is used.
There are several known issues with 'english' and you should
consider an alternative (see :ref:`stop_words`).
@@ -633,10 +633,9 @@ class HashingVectorizer(TransformerMixin, _VectorizerMixin, BaseEstimator):
out of the raw, unprocessed input.

.. versionchanged:: 0.21
-
- Since v0.21, if ``input`` is ``filename`` or ``file``, the data is
- first read from the file and then passed to the given callable
- analyzer.
+ Since v0.21, if ``input`` is ``'filename'`` or ``'file'``, the data
+ is first read from the file and then passed to the given callable
+ analyzer.

n_features : int, default=(2 ** 20)
The number of features (columns) in the output matrices. Small numbers
@@ -819,16 +818,16 @@ class CountVectorizer(_VectorizerMixin, BaseEstimator):

Parameters
----------
- input : string {'filename', 'file', 'content'}, default='content'
- If 'filename', the sequence passed as an argument to fit is
- expected to be a list of filenames that need reading to fetch
- the raw content to analyze.
+ input : {'filename', 'file', 'content'}, default='content'
+ - If `'filename'`, the sequence passed as an argument to fit is
+ expected to be a list of filenames that need reading to fetch
+ the raw content to analyze.

- If 'file', the sequence items must have a 'read' method (file-like
- object) that is called to fetch the bytes in memory.
+ - If `'file'`, the sequence items must have a 'read' method (file-like
+ object) that is called to fetch the bytes in memory.

- Otherwise the input is expected to be a sequence of items that
- can be of type string or byte.
+ - If `'content'`, the input is expected to be a sequence of items that
+ can be of type string or byte.

encoding : string, default='utf-8'
If bytes or files are given to analyze, this encoding is used to
@@ -864,7 +863,7 @@ class CountVectorizer(_VectorizerMixin, BaseEstimator):
preprocessing and n-grams generation steps.
Only applies if ``analyzer == 'word'``.

- stop_words : string {'english'}, list, default=None
+ stop_words : {'english'}, list, default=None
If 'english', a built-in stop word list for English is used.
There are several known issues with 'english' and you should
consider an alternative (see :ref:`stop_words`).
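To make the `stop_words` options in the hunk above concrete, the following is a small sketch that is not part of the PR. The sample sentence and the custom list `["the"]` are assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Arbitrary one-document corpus for the illustration.
docs = ["the cat sat on the mat"]

# stop_words='english': the built-in English list removes "the" and "on".
vec_english = CountVectorizer(stop_words="english").fit(docs)
vocab_english = sorted(vec_english.vocabulary_)

# stop_words=<list>: only the listed tokens are removed.
vec_custom = CountVectorizer(stop_words=["the"]).fit(docs)
vocab_custom = sorted(vec_custom.vocabulary_)

# stop_words=None (the default): no tokens are removed.
vec_none = CountVectorizer(stop_words=None).fit(docs)
vocab_none = sorted(vec_none.vocabulary_)
```

Reading the vocabulary from the fitted `vocabulary_` attribute keeps the sketch independent of the feature-name accessor, which was renamed across scikit-learn releases.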
@@ -1532,15 +1531,15 @@ class TfidfVectorizer(CountVectorizer):
Parameters
----------
input : {'filename', 'file', 'content'}, default='content'
- If 'filename', the sequence passed as an argument to fit is
- expected to be a list of filenames that need reading to fetch
- the raw content to analyze.
+ - If `'filename'`, the sequence passed as an argument to fit is
+ expected to be a list of filenames that need reading to fetch
+ the raw content to analyze.

- If 'file', the sequence items must have a 'read' method (file-like
- object) that is called to fetch the bytes in memory.
+ - If `'file'`, the sequence items must have a 'read' method (file-like
+ object) that is called to fetch the bytes in memory.

- Otherwise the input is expected to be a sequence of items that
- can be of type string or byte.
+ - If `'content'`, the input is expected to be a sequence of items that
+ can be of type string or byte.

encoding : str, default='utf-8'
If bytes or files are given to analyze, this encoding is used to
@@ -1585,10 +1584,9 @@ class TfidfVectorizer(CountVectorizer):
out of the raw, unprocessed input.

.. versionchanged:: 0.21
-
- Since v0.21, if ``input`` is ``filename`` or ``file``, the data is
- first read from the file and then passed to the given callable
- analyzer.
+ Since v0.21, if ``input`` is ``'filename'`` or ``'file'``, the data
+ is first read from the file and then passed to the given callable
+ analyzer.

stop_words : {'english'}, list, default=None
If a string, it is passed to _check_stop_list and the appropriate stop
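The three `input` modes described in the docstrings above, and the callable-analyzer behaviour noted under `versionchanged:: 0.21`, can be sketched as follows. This is an illustrative example, not part of the PR. The sample documents, the temporary file names, and the helper `whitespace_analyzer` are hypothetical.

```python
import io
import os
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

# Arbitrary two-document corpus for the illustration.
docs = ["the cat sat", "the dog barked"]

# input='content' (default): items are strings or bytes.
X_content = CountVectorizer(input="content").fit_transform(docs).toarray()

# input='file': items are file-like objects exposing a read() method.
file_objs = [io.BytesIO(d.encode("utf-8")) for d in docs]
X_file = CountVectorizer(input="file").fit_transform(file_objs).toarray()

# input='filename': items are paths that the vectorizer opens and reads.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, d in enumerate(docs):
        path = os.path.join(tmp, f"doc{i}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(d)
        paths.append(path)
    X_filename = CountVectorizer(input="filename").fit_transform(paths).toarray()


def whitespace_analyzer(text):
    """Hypothetical analyzer: split the already-read content on whitespace."""
    return text.split()


# Callable analyzer with input='file': the data is read from the file first,
# then handed to the callable, so the analyzer never sees the file object.
callable_objs = [io.BytesIO(d.encode("utf-8")) for d in docs]
X_callable = (
    CountVectorizer(input="file", analyzer=whitespace_analyzer)
    .fit_transform(callable_objs)
    .toarray()
)
```

All four matrices should hold the same counts, since only the way the raw text reaches the vectorizer differs.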