Skip to content

Add multiple label support for Axes.plot() #16178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Dec 24, 2020

Conversation

yozhikoff
Copy link
Contributor

@yozhikoff yozhikoff commented Jan 10, 2020

PR Summary

Since plt.plot() supports multidimensional input it would be reasonable to support multiple labels.
It was mentioned on SO here.

from matplotlib import pyplot as plt

x = [1, 2, 5]

y = [[2, 4, 3],
    [4, 7, 1],
    [3, 9, 2]]

plt.plot(x, y, label=['one', 'two', 'three'])
plt.legend()

example

It works with all iterable types, with string instances and non-iterables the behavior is just the same as before.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@anntzer
Copy link
Contributor

anntzer commented Jan 10, 2020

I don't like the fact that plot() supports that to start with, but given that it does (and that that's unlikely to get deprecated), I guess the feature makes sense.
This will need some tests and doc, though.

@jklymak
Copy link
Member

jklymak commented Jan 10, 2020

I'm -0.75 on this. What about all the other things associated with plot? By analogy we should allow an Iterable of markers, markersizes, colors, linewidths, linestyles? etc etc. plot is already messy enough and tries to do too much, so I don't think its too much to ask folks to iterate through their matrix and explicitly label each. The following is a tiny bit more verbose, but allows more customization of each line, and is a whole lot more specific:

labels=['one', 'two', 'three']
for i in range(3):
   plt.plot(x, y[i,:], label=labels[i])
plt.legend()

@yozhikoff
Copy link
Contributor Author

I agree that plot is overcomplicated in some terms, but I don't think that the analogy with markers, markersizes, colors, etc works here. This matrix plotting functionality was made obviously for 'fast plotting' purposes, where customization isn't that important. Labels, however, are crucial for determining what do the plotted lines mean.

@ImportanceOfBeingErnest
Copy link
Member

I think some more vectorized input options to many of the plotting functions would be great; but I do see the problem of maintainability of those organically grown functions.
Specifically for legend labels for lines, there seems to be no urgent need, because

import matplotlib.pyplot as plt

x = [1, 2, 5]
y = [[2, 4, 3], [4, 7, 1], [3, 9, 2]]

lines = plt.plot(x, y)
labels = ['one', 'two', 'three']
plt.legend(lines, labels)
plt.show() 

works fine already now.

@timhoffm
Copy link
Member

timhoffm commented Jan 10, 2020

If one has multiple labelled datasets, using a structured data structure like pandas.DataFrame is also a good idea.

x = [1, 2, 5]
y = [[2, 4, 3], [4, 7, 1], [3, 9, 2]]

pd.DataFrame(y, index=x, columns=['one', 'two', 'three']).plot()

@timhoffm
Copy link
Member

I'm +/- 0 on this.

  • It's a valid point that plt.plot(x, y) is one of the fastest ways to inspect data. Supporting labels for that adds value. Though @ImportanceOfBeingErnest's example is a reasonable pattern as well.
    Supporting lists of values for parameters is not necessary. They are automatically cycled, which is good enough for a quick view.
  • OTOH, supporting multiple datasets in one call is not a good design in the first place. Encouraging its use by adding features is questionable. Do we have an opinion if we could/would deprecate this use of multiple datasets? To me, that would be an indicator for the practical relevance of that functionality.

@tacaswell tacaswell added this to the v3.3.0 milestone Jan 27, 2020
Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the source of a long discussion on this weeks phone call: https://hackmd.io/hELmT6nMToSPhpP8-0mUZA?both#16178-labels-to-all-lines-plotted-in-plot

The very short summary:

  • we like the idea, it should go in
  • only work for a single x, y pair of inputs (like we do with the data kwarg)
  • needs a whats_new entry

@QuLogic
Copy link
Member

QuLogic commented May 19, 2020

Please rebase to fix the conflicts.

@tacaswell tacaswell modified the milestones: v3.3.0, v3.4.0 May 19, 2020
@yozhikoff yozhikoff requested a review from QuLogic June 11, 2020 21:54
@QuLogic
Copy link
Member

QuLogic commented Jun 17, 2020

I don't think you rebased quite correctly; there should not be so many commits here.

@yozhikoff yozhikoff force-pushed the add-multiple-label-support branch from b71d144 to f33a30e Compare July 4, 2020 21:22
Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits should be squashed before/when merging.

@jklymak
Copy link
Member

jklymak commented Jul 10, 2020

@tacaswell you are blocking on this...

@efiring
Copy link
Member

efiring commented Jul 10, 2020

Looking at the tests: the first and last look good, but the middle two indicate that the intended behavior is to allow mismatches between the number of lines and the number of labels, and that labels don't even have to be strings. I don't think this is a good idea. Instead of bending over backwards to accept almost any user input, I think we should leave the user with some responsibility to provide inputs that make sense. Am I misinterpreting something?

@timhoffm
Copy link
Member

@efiring in the current implementation labels don't have to be strings. So this is just backwards-compatible. Whether we want strings-only should be discussed separately because it's an API change.

@tacaswell tacaswell dismissed their stale review July 15, 2020 13:35

outdated

@tacaswell
Copy link
Member

This seems reasonable, but it looks like it is not checking if we are getting the multiple-x-y-pairs and either failing:

plt.plot([1, 2], [[3, 4], [10, 11]], [5, 6], [7, 8], label=['a', 'b']); 
plt.legend()  

so

This probably also needs an API change note. The signature on set_label (which is how the label kwarg eventually actually gets handled) is "object that can be converted to string via str(obj)" not "string" so I suspect we are going to have a few users who are going to be broken by this.

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue of allowing this with plt.plot(x, y, x1, y1, labels=[...]) should either be addressed or intentionally dismissed.

Anyone can clear this review.

@yozhikoff yozhikoff force-pushed the add-multiple-label-support branch from e3fd5fc to 24325da Compare December 23, 2020 20:04
@timhoffm
Copy link
Member

timhoffm commented Dec 23, 2020

@tacaswell's comment #16178 (review) is not addressed (can be checked by the example above the linked comment.

I'd be ok with the current state. I.e. multiple labels only supporting the case plot(x, ys) where ys contains data in multiple columns. And in particular not supporting plot(x, y, x2, y2). But that limitation should be mentioned explicitly in the docstring description of the label parameter.

@jklymak
Copy link
Member

jklymak commented Dec 23, 2020

I'm still not a fan of this just because plot is a big beast and I don't think we should be encouraging this API by making it more expansive and frankly inconsistent. For instance if I plot two data sets:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])
y = np.array([[1, 2, 3], [4, 5, 6]]).T

x1 = np.array([1, 2, 3, 4]) * 2
y1 = np.array([[1, 2, 3, 5], [4, 5, 6, 7]]).T
fig, ax = plt.subplots()
ax.plot(x, y, x1, y1, label=[['one', 'two'], ['three', 'four']])
ax.legend()
plt.show()

I get

Legend1

If I instead do:

ax.plot(x, y, x1, y1, label=['one', 'two', 'three', 'four'])

We get an error:

Traceback (most recent call last):
  File "./testLabels.py", line 10, in <module>
    ax.plot(x, y, x1, y1, label=['one', 'two', 'three', 'four'])
  File "/Users/jklymak/matplotlib/lib/matplotlib/axes/_axes.py", line 1599, in plot
    lines = [*self._get_lines(*args, data=data, **kwargs)]
  File "/Users/jklymak/matplotlib/lib/matplotlib/axes/_base.py", line 304, in __call__
    yield from self._plot_args(this, kwargs)
  File "/Users/jklymak/matplotlib/lib/matplotlib/axes/_base.py", line 454, in _plot_args
    raise ValueError(f"label must be scalar or have the same "
ValueError: label must be scalar or have the same length as the input data, but found 4 for 2 datasets.

I am not convinced that it is a good idea to add a feature that only half works with the existing API. This is all in addition to my knee-jerk reaction that if we add this, then we will need to accept tuples for other properties, and we really don't want to go down that road.

I think our API should make basic things easy, but as soon as you want to get fancy, the user is just going to have to do a bit of work, i.e. doing the above as

for yy, lab in zip(y, ['first', 'second']):
   ax.plot(x, yy, label=lab)
for yy, lab in zip(y1, ['third', 'fourth']):
   ax.plot(x1, yy, label=lab)

is just as clear, and allows the user to specify other things in the for-loop, and/or keep the handles to the lines.

@timhoffm
Copy link
Member

@jklymak I appreciate your skepticism, and have shared it in the beginning. But #16178 (review) changed my mind that this is actually worth adding for the case plot(x, ys).

I don't care too much about the plot(x, y, x2, y2) API. While we cannot get rid of it for backward compatibility reasons, it's not worth jumping extra hoops to make that work as well (or error out gracefully) for multiple labels. In this case, IMHO it's sufficient to document clearly that that call does not support multiple labels.

Concerning "accept tuples for other properties": Labels are special in that compared to other properties they cannot be automatically generated/cycled. Due to this an extension of just label can be justified to better support "quick draws for inspection of data" (see #16178 (comment)).

@jklymak
Copy link
Member

jklymak commented Dec 24, 2020

Fair enough - but if we go down this road, I'm afraid we need a better error message: if I pass in 4 data sets and it tells me I only passed in two, it is relatively confusing.

@timhoffm
Copy link
Member

Agreed. I suspect that the proper place to check is before

with something like

if len(args) >= 4 and [has multiple labels]:
    raise ValueError("plot() with multiple groups of data does not support multiple labels")

@tacaswell
Copy link
Member

I agree with @jklymak and @timhoffm here. The behavior in #16178 (comment) is definitely "wrong".

If we start out raising now we will not break anything that used to work (as this is a new feature) and if someone comes with a compelling reason to support multiple labels (either through nested broadcasting or the raveled version) we can always add that in the future in a backwards compatible way.

@yozhikoff Sorry this has taken almost a year to get through.

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should raise in the mulitple sets of input case because it is not clear what the broadcasting rules should be.

@tacaswell tacaswell merged commit 6305e8d into matplotlib:master Dec 24, 2020
@tacaswell
Copy link
Member

Thanks @yozhikoff ! Congratulations on what (I think) is your first merged Matplotlib PR 🎉 Hopefully we will hear from you agian!

@yozhikoff
Copy link
Contributor Author

@tacaswell Hopefully yes! I am a huge fan of Matplotlib, would love to work on improving it.
It took almost a year largely because I had some troubles figuring out the best solution for multiple data groups case - raising an error was an elegant idea, thanks @timhoffm !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants