[RFC] Add automated summaries #135

janezd · 2021-03-04T12:44:59Z

Issue

Description of changes

Outputs

If:

there is a signal marked with auto_summary = True
or
- no signal has auto_summary = True, and
- the widget has a single output or one of the outputs is marked as default,
- and this signal's auto_summary is not explicitly set to False,
  then calling send on this signal will also set the output summary.

The output summary is set if the output value is None or a summarising function for this data type is registered. An example of such function is

@summarize.register(Table)
def summarize(data):
    return len(data), format_summary_details(data)

Benefits: simpler code, less room for error, easier unification and future changes. Also, all add-ons automatically receive this function.

(Non-consequntial?) imcompatibility: if a widget already implements setting an output summary (which most do) and the summary differs from what would be generated automatically (mostly not) and the summary is set before sending the signal (mostly not), then the explicit summary will be overridden. This should not be a big problem and can be disabled by turning the auto summary for the default signal off.

Another solution that we discussed (turning auto summary off at the first explicit call of set_output_summary is a bad idea because (a) it adds uncontrollable magic, (b) is not really needed, (c) output summaries will be refactored anyway if this is merged and, most importantly (d) set_output_summary does not hold a reference to the widget and/or the signal, so it doesn't know what to disable. And it should stay that way.

Inputs

Inputs follow the same logic for designating the signal for the summary and use the same functions for summarizing. The implementation works by using the existing @Input decorator to intercept the incoming signal and set the summary.

Includes

Code changes
Tests
Documentation

irgolic · 2021-03-04T14:19:52Z

Thank you for this!

I still think we could make this a lot simpler. We could enforce summary messages on all data signals, such that no summary code needs to be written in subclasses. Consider it a feature of the status bar. Aside from being easier to implement, the enforced consistency would allow users to rely on the status bar for learning a widget's signals.

Showing a summary for only some signals on a case-to-case basis is arbitrary and confusing.

janezd · 2021-03-04T14:44:56Z

More information is not necessarily better than less.

I went through all widget with multiple outputs. Except in a very few cases, multiple outputs occur in widgets that perform some kind of selection and thus have a signal with a sample and a signal with all data (with additional column). In this case, the user is interested only in the summary of the former. Seeing the data that you're interested in is better than seeing all the data and having to find the (only) interesting part.

As for multiple inputs, there are a few (~3) widgets that are really complex (e.g. Test and Score), hence any readable summary for multiple inputs would have to be handcrafted. In all other cases, the two inputs are the data and the subset. In this case, I would also prefer a uniform, but handcrafted (= easily interpretable) summary.

such that no summary code needs to be written in subclasses

I think the current solution requires no code in individual widgets. On the other hand, a general code showing all signals would result in a mess that would require specific overrides.

irgolic · 2021-03-04T15:34:14Z

Less information is not necessarily simpler or better interpretable. It's about establishing a convention and teaching users to rely on it.

Except in a very few cases, multiple outputs occur in widgets that perform some kind of selection and thus have a signal with a sample and a signal with all data (with additional column). In this case, the user is interested only in the summary of the former.

I think for selection widgets, showing both would be nice and informative, too. Eventually, I'd like to show all types of signals and bold the number if the input/output is connected. As far as the user is concerned, they want to intuit as much about Orange as they can without reading the documentation.

As an example, Select Rows has three output signals (matching, unmatching, all data + column). The output summary only shows the number of rows matched. You could argue that the number of rows not matched is easily calculable (input - matched). But for the price of 20 pixels, we can easily and intuitively inform the user that they can output unmatched data too (biolab/orange3#5254).

As for multiple inputs, there are a few (~3) widgets that are really complex (e.g. Test and Score), hence any readable summary for multiple inputs would have to be handcrafted.

Wouldn't Test and Score just show Data and Test Data inputs?

I would also prefer a uniform, but handcrafted (= easily interpretable) summary.

handcrafted != easily interpretable, especially for people who're using Orange for the first time. Intuition is achieved by establishing and following a convention. If you make too many case-by-case design decisions, you risk losing the user's trust.

I get that you'd prefer to keep Orange as simple as possible, but I think users need to learn about signals if they want to become proficient in Orange. Perhaps we could establish the first output signal as the default?

janezd · 2021-03-04T15:59:46Z

Wouldn't Test and Score just show Data and Test Data inputs?

Not necessarily. We would also define other functions for summarizing the results, for instance for distance matrices, networks... Probably not models, though. I also forgot to exclude cases where the widget accepts multiple inputs for each signal. Though for Test and Score you're probably right.

handcrafted != easily interpretable

Sure. But it can also be ==. And != can be < (if bad design) or > (generally). :)

For Select rows, showing 150 (135 : 15) or 150 (135 + 15) or something similar is certainly at least as good as 150, 135, 15. For Scatter plot, showing 150, 35, where the first number always equals the one shown at input is redundant.

But for the price of 20 pixels,

I may agree with this for unmatched data in Select Rows, but not for Annotated Data in almost all widgets. I believe Tufte would take my side, too. :)

I think both principles have merits - being clean and slick, as well as being consistent. I guess we should go through all widgets and see what would this mean in practice.

irgolic · 2021-03-04T18:24:59Z

I think there's a dissonance between widget windows and the canvas. I suppose I'm trying to propose this as a solution, to make the inputs/outputs of the widget clear in its window, also showing which inputs/outputs are connected (great for debugging).

Perhaps we could generate a good description for the message hover/popup? Or we could represent signals with icons, next to the handcrafted text?

janezd · 2021-03-06T20:31:20Z

Closed in favour of #136.

I hate how right you were in the above discussion. Showing everything is indeed amazing.

janezd mentioned this pull request Mar 4, 2021

[ENH] Automated and better summaries biolab/orange3#5308

Merged

3 tasks

janezd force-pushed the auto-summary branch 2 times, most recently from 9167a3a to 7f78f23 Compare March 4, 2021 13:43

janezd added 2 commits March 4, 2021 15:06

Add automated output summaries

df1b49f

Add automated input summaries

200dbfc

janezd force-pushed the auto-summary branch from 7f78f23 to 200dbfc Compare March 4, 2021 14:06

janezd changed the title ~~[RFC] Add automated output summaries~~ [RFC] Add automated summaries Mar 4, 2021

janezd mentioned this pull request Mar 5, 2021

Add automated summaries #136

Merged

3 tasks

janezd closed this Mar 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Add automated summaries #135

[RFC] Add automated summaries #135

janezd commented Mar 4, 2021 •

edited

Loading

irgolic commented Mar 4, 2021 •

edited

Loading

janezd commented Mar 4, 2021

irgolic commented Mar 4, 2021

janezd commented Mar 4, 2021

irgolic commented Mar 4, 2021 •

edited

Loading

janezd commented Mar 6, 2021

[RFC] Add automated summaries #135

[RFC] Add automated summaries #135

Conversation

janezd commented Mar 4, 2021 • edited Loading

Issue

Description of changes

Outputs

Inputs

Includes

irgolic commented Mar 4, 2021 • edited Loading

janezd commented Mar 4, 2021

irgolic commented Mar 4, 2021

janezd commented Mar 4, 2021

irgolic commented Mar 4, 2021 • edited Loading

janezd commented Mar 6, 2021

janezd commented Mar 4, 2021 •

edited

Loading

irgolic commented Mar 4, 2021 •

edited

Loading

irgolic commented Mar 4, 2021 •

edited

Loading