MultiNorm class #29876
Conversation
lib/matplotlib/colors.py
    in the case where an invalid string is used. This cannot use
    `_api.check_getitem()`, because the norm keyword accepts arguments
    other than strings.
I'm confused because this function is only called for `isinstance(norm, str)`.
So this function exists because the norm keyword accepts `Normalize` objects in addition to strings. This is fundamentally the same error you get if you give an invalid norm to a `Colorizer` object.
In main, the `@norm.setter` on `colorizer.Colorizer` reads:
@norm.setter
def norm(self, norm):
    _api.check_isinstance((colors.Normalize, str, None), norm=norm)
    if norm is None:
        norm = colors.Normalize()
    elif isinstance(norm, str):
        try:
            scale_cls = scale._scale_mapping[norm]
        except KeyError:
            raise ValueError(
                "Invalid norm str name; the following values are "
                f"supported: {', '.join(scale._scale_mapping)}"
            ) from None
        norm = _auto_norm_from_scale(scale_cls)()
    ...
The `_get_scale_cls_from_str()` helper exists in this PR because this functionality is now needed by both `colorizer.Colorizer.norm()` and `colors.MultiNorm`.
Note this PR does not include changes to `colorizer.Colorizer.norm()` so that it makes use of `_get_scale_cls_from_str()`. Those changes follow in the next PR: #29877.
Force-pushed 55b85e3 to f42d65b
Just some minor points, plus re-pinging @timhoffm in case he has an opinion re: n_input / input_dims naming?
See #29876 (comment)
Thank you @timhoffm
vmin, vmax : float, None, or list of float or None
    Limits of the constituent norms.
    If a list, each value is assigned to each of the constituent
    norms. Single values are repeated to form a list of appropriate size.
Is broadcasting reasonable here? I would assume that most MultiNorms have different scales and thus need per-element entries anyway. It could also be an oversight to pass a single value instead of multiple values.
I'm therefore tempted to not allow scalars here but require exactly n_variables values. A more narrow and explicit interface may be the better start. We can always later expand the API to broadcast scalars if we see that's a typical case and reasonable in terms of usability.
@timhoffm Perhaps this is also a topic for the weekly meeting :)
I'm perfectly fine with removing this here, and perhaps that is a good starting point.
My entry into this topic was a use case (dark-field X-ray microscopy, DFXRM) where we typically want `vmax0 = vmax1 = -vmin0 = -vmin1`, i.e. equal normalizations centered on zero, and given that entry point it felt natural to me to include broadcasting.
Force-pushed 6b86d63 to 32247f5
This is on hold until we sort out #30149 (Norm Protocol)
I have rebased this PR and updated the MultiNorm to inherit from the Norm ABC now that #30178 has been merged :)
Force-pushed 47b5116 to 63feb6b
assert norm.vmax[0] == 2
assert norm.vmax[1] == 2

# test call with clip
We need more extensive testing for call with multiple values.
Agreed, we need some more tests.
Should we make them here, or test via the top level plotting functions once they are implemented?
I would do the tests here because you can directly test the numeric result of the norm function. With plotting functions, you'll get colors as the result, which are much harder to test / have less expressiveness.
Here you can do a lot of cases as one-liners.
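For illustration, such one-liner checks could look like the following sketch, which uses a minimal linear stand-in rather than the real `MultiNorm` API (real tests would call `mpl.colors.MultiNorm` directly):

```python
# Minimal stand-in for per-component linear normalization, only to show
# the one-liner numeric-test style suggested above.
def multinorm_linear(values, vmin, vmax):
    # each component is normalized independently, one (vmin, vmax) per norm
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(values, vmin, vmax)]

assert multinorm_linear([1, 5], [0, 0], [2, 10]) == [0.5, 0.5]
assert multinorm_linear([0, 10], [0, 0], [2, 10]) == [0.0, 1.0]
```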
@timhoffm Thank you for taking the time to give detailed comments!
@timhoffm You made a comment here #29876 (comment):
The variable has since changed name. The current status is the public property:

@property
def n_components(self):
    """Number of norms held by this `MultiNorm`."""
    return len(self._norms)

I think we actually have 3 options:
This impacts how we communicate with the user via docstrings. Here we could have:
i.e. we can choose to make `n_components` public or private.
Once the top level plotting functions are made, there will be a check that the data pipeline is consistent.
On the other hand, while I am quite certain that multivariate plotting functionality will be used by a number of users, it is unclear to me if any/how many will build advanced functionality that requires access to `n_components`.
@timhoffm Let me know if I should remove it.
Force-pushed ab61439 to 910b343
I think making it public is the right way to go. While it slightly complicates the mental model for all the current 1d norms, it is by design that they are now only the special case, and hence it is slightly more complicated.
Force-pushed 910b343 to dbeca30
lib/matplotlib/colors.py
"""
Parameters
----------
norms : list of (str, `Normalize` or None)
Why do we accept `None` here? I don't see an advantage of `[None, None]` over `["linear", "linear"]`. Quite the opposite, the first one is less readable.
Maybe in case folks want to update the list later?
That doesn't make sense to me. (1) `None` is resolved to `Normalize` immediately. (2) Not sure why one would want to update later, but that's possible either way.
I rather suspect that it's along these lines: in 1D some other places - e.g. `ScalarMappable.set_norm` - accept None. But likely the other places only accept it intentionally or unintentionally because norm=None is somewhere a default kwarg and that is passed through.
I rather suspect that it's along these lines: in 1D some other places - e.g. ScalarMappable.set_norm - accept None. But likely the other places only accept it intentionally or unintentionally because norm=None is somewhere a default kwarg and that is passed through.
Exactly this.
My thinking was that a common use case is not to supply a norm:

ax.imshow((A, B), cmap='BiOrangeBlue')

In this case the `norm` keyword (default: `None`) is singular, but to be compatible with the data/cmap it needs to have a length of 2, so the norm keyword is repeated and becomes `[None, None]`, and this then gets passed to the `MultiNorm` constructor.
In any case, it is better if, as you say, we only accept strings. I will make the change later.
See colorizer.py → `_ensure_norm(norm, n_variates=n_variates)` here for the proposed implementation. (This can also be simplified if we no longer accept `None`.)
That doesn't make sense to me. (1) None is resolved to Normalize immediately. (2) Not sure why one would want to update later, but that's possible either way.
What happens with `['linear', None]`?
It would be parsed to `['linear', 'linear']`, but will now (with the update) produce an error.
If we find we need this functionality, we can always bring it back in a later PR.
So how do you denote `['linear', mpl.NoNorm()]`? Is that `['linear', 'none']`?
Force-pushed 23777b3 to 3a4f6b9
Thank you @timhoffm, the changes should now be in :)
lib/matplotlib/colors.py
Data to normalize, as tuple, scalar array or structured array.

- If tuple, must be of length `n_components`
- If scalar array, the first axis must be of length `n_components`
Did we have a discussion on the axis order? I would have expected it the other way round. Note that we represent RGB arrays as (N, 3), where N is the color count and 3 are the r, g, b values. The call here should be comparable, i.e. the components are the second dimension.
@anntzer had the same question above (#29876 (comment)). I will repeat my answer (#29876 (comment)) here:
We had a brief discussion on this here #14168 (comment), but we followed that up in a weekly meeting 14 September 2023 https://hackmd.io/@matplotlib/Skaz1lrh6#Mutlivariate-colormapping .
The argument is not formulated well in the log, but basically we chose (V, N, M) because there are V cmaps and norms, and in this way the input for the data mirrors that of the vmin, vmax and norm keywords, i.e.:

ax.imshow((data0, data1), vmin=(vmin0, vmin1), cmap='BiOrangeBlue')

or

ax.imshow((pressure, temperature), vmin=(vmin_pressure, vmin_temperature), cmap='BiOrangeBlue')

Quite often the data is qualitatively different, like the example at the bottom of the page here: https://trygvrad.github.io/mpl_docs/Multivariate%20colormaps.html where GDP_per_capita and Annual_growth are plotted together. I find it is not very intuitive to wrap these in a single array. (Also: this allows the different variates to have different data types.)
While I see the argument that this has similarities to RGB images, I believe it is also qualitatively quite different, because this works through the norm→colormap pipeline while RGB images bypass that machinery by design.
The way I see it is that the three values of a pixel in an RGB image (in the typical case, i.e. a drawing/diagram or picture taken with a camera) work together to encode one piece of information (color), while if you plot GDP_per_capita and Annual_growth together using separate norms and a 2D colormap, you have two pieces of information at each point, even though those two pieces are represented by a single color via a colormap.
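To make the layout argument concrete, here is a small illustration (variable names are made up) of a tuple of per-component arrays versus a single stacked array with the components on the first axis:

```python
import numpy as np

# Two qualitatively different 2D fields of the same shape (names illustrative)
gdp = np.arange(6.0).reshape(2, 3)
growth = np.ones((2, 3), dtype=np.float32)  # dtypes may differ per component

data_tuple = (gdp, growth)          # tuple of length n_components = 2
stacked = np.stack([gdp, growth])   # shape (2, 2, 3): first axis = component
assert stacked.shape[0] == len(data_tuple) == 2
```

The tuple form keeps each component's dtype; stacking forces a common dtype.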
Thanks for repeating the answer. The topic is so large that it's difficult to keep an overview of the state of the discussion.
Your arguments are valid, but only cover the case of a list/tuple of 1d array-likes, and there I agree. However, I argue that 2d arrays are a separate topic and should be handled differently, because there data is typically structured in columns. See e.g. the heights parameter in https://matplotlib.org/devdocs/api/_as_gen/matplotlib.axes.Axes.grouped_bar.html#matplotlib.axes.Axes.grouped_bar
Aha!
Just to make sure, you are arguing that a tuple/list of 2D arrays should be handled differently than a 3D array?
I have no problems with this, if you think it is OK from a syntax point of view :)
In practice this would change only
"""
- If scalar array, the first axis must be of length `n_components`
"""
to:
"""
- If scalar array, the last axis must be of length `n_components`
"""
And in the implementation, this would add an `else:` branch to `MultiNorm._iterable_components_in_data()`:
@staticmethod
def _iterable_components_in_data(data, n_components):
    """
    Provides an iterable over the components contained in the data.

    An input array with `n_components` fields is returned as a list of
    length n referencing slices of the original array.

    Parameters
    ----------
    data : np.ndarray, tuple or list
        The input data, as tuple, scalar array or structured array.

        - If tuple, must be of length `n_components`
        - If scalar array, the last axis must be of length `n_components`
        - If structured array, must have `n_components` fields.

    Returns
    -------
    tuple of np.ndarray
    """
    if isinstance(data, np.ndarray):
        if data.dtype.fields is not None:
            data = tuple(data[descriptor[0]] for descriptor in data.dtype.descr)
        else:
            data = tuple(data[..., i] for i in range(data.shape[-1]))
    if len(data) != n_components:
        raise ValueError("The input to this `MultiNorm` must be of shape "
                         f"(..., {n_components}), or be a structured array "
                         f"or scalar with {n_components} fields.")
    return data
There will also be some implications when we check consistency between the data, colormap and norm in colorizer, but that is a problem for a later PR.
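A runnable sketch of the last-axis branch discussed above (simplified: no structured-array handling, and the error wording is reduced to the length check):

```python
import numpy as np

# Simplified sketch: for a plain ndarray, slice components off the last
# axis; tuples/lists pass through unchanged.
def split_components(data, n_components):
    if isinstance(data, np.ndarray) and data.dtype.fields is None:
        data = tuple(data[..., i] for i in range(data.shape[-1]))
    if len(data) != n_components:
        raise ValueError(f"expected {n_components} components, got {len(data)}")
    return data

parts = split_components(np.zeros((4, 5, 2)), 2)
assert len(parts) == 2 and parts[0].shape == (4, 5)
```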
Yes, this is my idea.
I updated the MultiNorm so that only tuples, lists, or structured arrays are acceptable as input to call, inverse, etc.
The details are in `MultiNorm._iterable_components_in_data(data, n_components)`.
I made an attempt to have error messages that steer the user in the right direction.

multi_norm = mpl.colors.MultiNorm(['linear', 'linear'])
multi_norm(np.zeros((3, 2)))

gives the error:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. You can use `rfn.unstructured_to_structured(data)` available with `from numpy.lib import recfunctions as rfn` to convert the input array of shape (3, 2) to a structured format

while

multi_norm(np.zeros((2, 3)))

gives:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. You can use `list(data)` to convert the input data of shape (2, 3) to a compatible list

Does this make sense to you @timhoffm?
I would really like to point users in the right direction, but I have limited experience writing errors of this kind :)
(PS: we may want to revisit the wording here in a later PR, because I suspect this error message will come up most commonly from the top level plotting functions, and in that case I would prefer an error message of the kind "Using `axes.imshow(...)` with a bivariate or multivariate colormap requires the input data..." rather than mentioning `MultiNorm` by name, but time will tell how easily that can be achieved, and if it is worth it.)
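For reference, the conversion the first error message suggests works like this (mirroring the call named in the message):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Convert a plain (3, 2) array into a structured array with 2 fields,
# as the error message above suggests.
data = np.zeros((3, 2))
structured = rfn.unstructured_to_structured(data)
assert len(structured.dtype.fields) == 2
assert structured.shape == (3,)
```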
I'd argue structured arrays are rarely used and that most people who need "structured arrays" use xarray. I'd remove the last sentence of the first error. If someone doesn't already have the data as a structured array, I doubt they really want to make it one for plotting (versus just making a list or tuple). If you want a second sentence, then I'd show them how to make a list: `data_as_list = [data[:, i] for i in range(data.shape[1])]`.
The second error should use the same name: `data_as_list = list(data)`. However, I do think that we should just let that pass silently, even if we don't document such arrays as being a legitimate argument.
I agree with @jklymak that structured arrays are too special, so we should not recommend them if people are not using them already. They should be mentioned in the docstring but not in a suggestion in an error message. If the array is 2D, `list(data.T)` is the simplest way, and `[data[..., i] for i in range(data.shape[-1])]` for more than 2D.

I do think that we should just let that pass silently, even if we don't document such arrays as being a legitimate argument.

I'm coming to the conclusion that we should simply accept "sequence of arrays". This means:
- We can check that `len(data) == n_components`
- We can iterate over the component arrays.
- It works for list/tuple.
- It works for n-dim arrays, since they are a sequence of (n-1)-dim arrays.

Bonus points if you give a helpful error message in the case of `len(data) != n_components` if it is an array with `shape(data)[-1] == n_components`.
The ambiguity argument, which led me to propose not accepting arrays, is actually not that strong. Either it has the right shape, so `list(data)` is just extra work, or the order is different, but we can fail with a clear error message then. The only case where we cannot detect whether the user has organized the data correctly is (n, n) shaped arrays, but that should be rare, as typically `2 <= n_components <= few` but values are either 1 or many.
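The point that plain ndarrays already satisfy the "sequence of arrays" contract can be checked directly:

```python
import numpy as np

# Iterating an n-dim array yields its (n-1)-dim sub-arrays along the first
# axis, so len() and iteration behave exactly as for a list of arrays.
data = np.zeros((2, 4, 5))
assert len(data) == 2                    # matches the n_components check
components = list(data)                  # two (4, 5) component arrays
assert all(c.shape == (4, 5) for c in components)
```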
Thank you @jklymak and @timhoffm for the feedback :D

I'm coming to the conclusion that we should simply accept "sequence of arrays".

The way I have implemented this is a type check `isinstance(data, (tuple, list))`. We could accept other container classes as well, but in my mind it is reasonable to lock it down (and this simplifies the typing/errors).
Using `multi_norm = mpl.colors.MultiNorm(['linear', 'linear'])` there are now 7 different error messages. All of them start with the text:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields.
`multi_norm(np.zeros((2, 3)))` gives:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. You can use `data_as_list = list(data)` to convert the input data of shape (2, 3) to a compatible list

`multi_norm(np.zeros((3, 2)))` gives:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. You can use `data_as_list = list(data.T)` to convert the input data of shape (3, 2) to a compatible list

`multi_norm(np.zeros((3, 3, 2)))` gives:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. You can use `data_as_list = [data[..., i] for i in range(data.shape[-1])]` to convert the input data of shape (3, 3, 2) to a compatible list

If the axes are inconsistent, so that both first and last, or neither first nor last, match, i.e. `multi_norm(np.zeros((2, 2)))`, this gives:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. An np.ndarray of shape (2, 2) is not compatible

The above covers the cases with a scalar np.ndarray. There is also:
For a structured array with the wrong number of fields:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. A structured array with 3 fields is not compatible

For an unsupported class, `multi_norm(1)`:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. Input of type <class 'int'> is not supported

and finally, for a list/tuple of wrong length, `multi_norm((1, 1, 1))`:

ValueError: The input to this `MultiNorm` must be a list or tuple of length 2, or be structured array with 2 fields. A <class 'tuple'> of length 3 is not compatible
I would have thought that this is a sufficient and reasonable implementation:

def _iterable_components_in_data(data, n_components):
    if isinstance(data, np.ndarray) and data.dtype.fields is not None:
        # structured array
        if len(data.dtype.fields) != n_components:
            raise ValueError(
                "Structured array inputs to MultiNorm must have the same "
                "number of fields as components in the MultiNorm. Expected "
                f"{n_components}, but got {len(data.dtype.fields)} fields")
        else:
            return tuple(data[field] for field in data.dtype.names)
    try:
        n_elements = len(data)
    except TypeError:
        raise ValueError("MultiNorm expects a sequence with one element per "
                         f"component as input, but got {data!r} instead")
    if n_elements != n_components:
        raise ValueError(
            "MultiNorm expects a sequence with one element per component. "
            f"This MultiNorm has {n_components} components, but got a sequence "
            f"with {n_elements} elements")
    return tuple(data[i] for i in range(n_elements))

This implicitly accepts arrays and other sequence-like containers. Do you see inputs where the error messages are not clear enough? Optionally, one can add special messages for incorrectly shaped arrays in the last if to give conversion guidance.
Force-pushed dd8f0c6 to 67b8264
This commit merges a number of commits now contained in https://github.com/trygvrad/matplotlib/tree/multivariate-plot-prapare-backup , keeping only the MultiNorm class
Co-authored-by: Tim Hoffmann <2836374+timhoffm@users.noreply.github.com>
Force-pushed 81ab0d5 to babbee0
data = tuple(data[descriptor[0]] for descriptor in data.dtype.descr)
if len(data) != n_components:
    raise ValueError(f"{MultiNorm._get_input_err(n_components)}"
                     f". A structured array with "
                     f"{len(data)} fields is not compatible")
Suggested change:

-        data = tuple(data[descriptor[0]] for descriptor in data.dtype.descr)
-        if len(data) != n_components:
-            raise ValueError(f"{MultiNorm._get_input_err(n_components)}"
-                             f". A structured array with "
-                             f"{len(data)} fields is not compatible")
+        if len(data.dtype.fields) != n_components:
+            raise ValueError(
+                "Structured array inputs to MultiNorm must have the same "
+                "number of fields as components in the MultiNorm. Expected "
+                f"{n_components}, but got {len(data.dtype.fields)} fields.")
+        else:
+            return tuple(data[field] for field in data.dtype.names)
I think we can:

- separate the cases better. With if/raise/else/return it's obviously clear that the structured array case is handled exhaustively. Creating `data` and letting it fall through to the final `return data` increases mental load.
- use a more focussed error message for structured arrays. If people are inputting structured arrays, they should very much know what they want and likely just have the wrong number of elements. They don't need the full possible set of inputs.
PR summary
This PR continues the work of #28658 and #28454, aiming to close #14168 (Feature request: Bivariate colormapping).
This is part one of the former PR, #29221. Please see #29221 for the previous discussion.
Features included in this PR:
- `MultiNorm` class. This is a subclass of `colors.Normalize` and holds `n_variate` norms.
- Tests for the `MultiNorm` class
Features not included in this PR:
- Exposing `MultiNorm` together with `BivarColormap` and `MultivarColormap` to the plotting functions `axes.imshow(...)`, `axes.pcolor(...)`, and `axes.pcolormesh(...)`