Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add automated summaries #135

Closed
wants to merge 2 commits into from
Closed

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Mar 4, 2021

Issue

Ref: biolab/orange3#5108

Description of changes

Outputs

If:

  • there is a signal marked with auto_summary = True
  • or
    • no signal has auto_summary = True, and
    • the widget has a single output or one of the outputs is marked as default,
    • and this signal's auto_summary is not explicitly set to False,
      then calling send on this signal will also set the output summary.

The output summary is set if the output value is None or a summarising function for this data type is registered. An example of such function is

@summarize.register(Table)
def summarize(data):
    return len(data), format_summary_details(data)

Benefits: simpler code, less room for error, easier unification and future changes. Also, all add-ons automatically receive this function.

(Non-consequntial?) imcompatibility: if a widget already implements setting an output summary (which most do) and the summary differs from what would be generated automatically (mostly not) and the summary is set before sending the signal (mostly not), then the explicit summary will be overridden. This should not be a big problem and can be disabled by turning the auto summary for the default signal off.

Another solution that we discussed (turning auto summary off at the first explicit call of set_output_summary is a bad idea because (a) it adds uncontrollable magic, (b) is not really needed, (c) output summaries will be refactored anyway if this is merged and, most importantly (d) set_output_summary does not hold a reference to the widget and/or the signal, so it doesn't know what to disable. And it should stay that way.

Inputs

Inputs follow the same logic for designating the signal for the summary and use the same functions for summarizing. The implementation works by using the existing @Input decorator to intercept the incoming signal and set the summary.

Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd force-pushed the auto-summary branch 2 times, most recently from 9167a3a to 7f78f23 Compare March 4, 2021 13:43
@janezd janezd changed the title [RFC] Add automated output summaries [RFC] Add automated summaries Mar 4, 2021
@irgolic
Copy link
Member

irgolic commented Mar 4, 2021

Thank you for this!

I still think we could make this a lot simpler. We could enforce summary messages on all data signals, such that no summary code needs to be written in subclasses. Consider it a feature of the status bar. Aside from being easier to implement, the enforced consistency would allow users to rely on the status bar for learning a widget's signals.

Showing a summary for only some signals on a case-to-case basis is arbitrary and confusing.

@janezd
Copy link
Contributor Author

janezd commented Mar 4, 2021

More information is not necessarily better than less.

I went through all widget with multiple outputs. Except in a very few cases, multiple outputs occur in widgets that perform some kind of selection and thus have a signal with a sample and a signal with all data (with additional column). In this case, the user is interested only in the summary of the former. Seeing the data that you're interested in is better than seeing all the data and having to find the (only) interesting part.

As for multiple inputs, there are a few (~3) widgets that are really complex (e.g. Test and Score), hence any readable summary for multiple inputs would have to be handcrafted. In all other cases, the two inputs are the data and the subset. In this case, I would also prefer a uniform, but handcrafted (= easily interpretable) summary.

such that no summary code needs to be written in subclasses

I think the current solution requires no code in individual widgets. On the other hand, a general code showing all signals would result in a mess that would require specific overrides.

@irgolic
Copy link
Member

irgolic commented Mar 4, 2021

Less information is not necessarily simpler or better interpretable. It's about establishing a convention and teaching users to rely on it.

Except in a very few cases, multiple outputs occur in widgets that perform some kind of selection and thus have a signal with a sample and a signal with all data (with additional column). In this case, the user is interested only in the summary of the former.

I think for selection widgets, showing both would be nice and informative, too. Eventually, I'd like to show all types of signals and bold the number if the input/output is connected. As far as the user is concerned, they want to intuit as much about Orange as they can without reading the documentation.

As an example, Select Rows has three output signals (matching, unmatching, all data + column). The output summary only shows the number of rows matched. You could argue that the number of rows not matched is easily calculable (input - matched). But for the price of 20 pixels, we can easily and intuitively inform the user that they can output unmatched data too (biolab/orange3#5254).

As for multiple inputs, there are a few (~3) widgets that are really complex (e.g. Test and Score), hence any readable summary for multiple inputs would have to be handcrafted.

Wouldn't Test and Score just show Data and Test Data inputs?

I would also prefer a uniform, but handcrafted (= easily interpretable) summary.

handcrafted != easily interpretable, especially for people who're using Orange for the first time. Intuition is achieved by establishing and following a convention. If you make too many case-by-case design decisions, you risk losing the user's trust.

I get that you'd prefer to keep Orange as simple as possible, but I think users need to learn about signals if they want to become proficient in Orange. Perhaps we could establish the first output signal as the default?

@janezd
Copy link
Contributor Author

janezd commented Mar 4, 2021

Wouldn't Test and Score just show Data and Test Data inputs?

Not necessarily. We would also define other functions for summarizing the results, for instance for distance matrices, networks... Probably not models, though. I also forgot to exclude cases where the widget accepts multiple inputs for each signal. Though for Test and Score you're probably right.

handcrafted != easily interpretable

Sure. But it can also be ==. And != can be < (if bad design) or > (generally). :)

For Select rows, showing 150 (135 : 15) or 150 (135 + 15) or something similar is certainly at least as good as 150, 135, 15. For Scatter plot, showing 150, 35, where the first number always equals the one shown at input is redundant.

But for the price of 20 pixels,

I may agree with this for unmatched data in Select Rows, but not for Annotated Data in almost all widgets. I believe Tufte would take my side, too. :)

I think both principles have merits - being clean and slick, as well as being consistent. I guess we should go through all widgets and see what would this mean in practice.

@irgolic
Copy link
Member

irgolic commented Mar 4, 2021

I think there's a dissonance between widget windows and the canvas. I suppose I'm trying to propose this as a solution, to make the inputs/outputs of the widget clear in its window, also showing which inputs/outputs are connected (great for debugging).

Perhaps we could generate a good description for the message hover/popup? Or we could represent signals with icons, next to the handcrafted text?

@janezd janezd mentioned this pull request Mar 5, 2021
3 tasks
@janezd
Copy link
Contributor Author

janezd commented Mar 6, 2021

Closed in favour of #136.

I hate how right you were in the above discussion. Showing everything is indeed amazing.

@janezd janezd closed this Mar 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants