
Improve stream rendering performance #641

Open · satabin wants to merge 10 commits into 1.11.x from json/render

Conversation

@satabin (Member) commented Oct 24, 2024

This change addresses #634 in two ways:

  • A change to the generic pretty printer that avoids:
    • boxing integers when computing the layout,
    • instantiating many tuples that are immediately discarded,
    • creating intermediate chunks, by fusing the annotation and rendering phases.
  • Re-introducing a direct compact renderer for JSON, which is much simpler than going through the generic printer with no groups.

With these changes, compact rendering should be back to its 1.10 performance, and the pretty-printing case improves a bit.
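To make the boxing and tuple points concrete, here is a minimal standalone sketch of the fusion idea. The names (`Evt`, `Word`, `Line`, `FusionSketch`) are hypothetical illustrations, not the actual fs2-data internals:

```scala
sealed trait Evt
final case class Word(s: String) extends Evt
case object Line extends Evt // a potential line break

object FusionSketch {
  val maxWidth = 80

  // Unfused: a first pass annotates every event with its width, building an
  // intermediate list of tuples that the second pass immediately deconstructs
  // and discards, boxing each Int along the way.
  def renderUnfused(events: List[Evt]): String = {
    val annotated: List[(Int, Evt)] = events.map {
      case w @ Word(s) => (s.length, w)
      case Line        => (1, Line)
    }
    val sb = new StringBuilder
    var col = 0
    annotated.foreach {
      case (width, Word(s)) => col += width; sb.append(s)
      case (width, Line) =>
        if (col + width > maxWidth) { col = 0; sb.append('\n') }
        else { col += width; sb.append(' ') }
    }
    sb.toString
  }

  // Fused: the width is computed and consumed in the same pass, so it stays a
  // primitive int and no intermediate tuples (or chunks) are ever allocated.
  def renderFused(events: List[Evt]): String = {
    val sb = new StringBuilder
    var col = 0
    events.foreach {
      case Word(s) =>
        col += s.length; sb.append(s)
      case Line =>
        if (col + 1 > maxWidth) { col = 0; sb.append('\n') }
        else { col += 1; sb.append(' ') }
    }
    sb.toString
  }
}
```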

Here are some benchmark results, with the compact rendering in 1.11.1 as the baseline; 1.11.2 represents the results with this change.

```mermaid
xychart-beta
  title "Rendering of int array"
  x-axis ["Pretty 1.10", "Pretty 1.11.1", "Pretty 1.11.2", "Compact 1.11.1", "Compact 1.11.2"]
  y-axis "Factor" 0 --> 1.2
  bar [0.02, 1.12, 0.71, 1, 0.004]
```
```mermaid
xychart-beta
  title "Rendering of int object"
  x-axis ["Pretty 1.10", "Pretty 1.11.1", "Pretty 1.11.2", "Compact 1.11.1", "Compact 1.11.2"]
  y-axis "Factor" 0 --> 1.2
  bar [0.01, 1.11, 0.85, 1, 0.01]
```

@satabin requested a review from a team as a code owner on October 24, 2024
@satabin added the enhancement, json, and regression labels on Oct 24, 2024
@satabin (Member, Author) commented Oct 24, 2024

@recons This PR should solve your problem with compact rendering once merged and released.

@satabin force-pushed the json/render branch 3 times, most recently from 2829f8f to a0c50b0 on October 24, 2024
@ybasket (Collaborator) left a comment

Looks good, just a few minor comments.

Comment on lines +26 to +28

```scala
(List
  .range(0, 1000000)
  .map(i => Token.NumberValue(i.toString())) :+ Token.EndArray))
```
@ybasket (Collaborator):

Suggested change:

```diff
-(List
-  .range(0, 1000000)
-  .map(i => Token.NumberValue(i.toString())) :+ Token.EndArray))
+(List.tabulate(1000000)(i => Token.NumberValue(i.toString())) :+ Token.EndArray))
```

Minor, but IMHO, that makes the intent a bit clearer.

I also would love to avoid appending to that long list, but I assume you want a single chunk? Otherwise, using Stream's ++, this could be simplified further.

@satabin (Member, Author):

Yeah, I want a single chunk, to see the overhead of processing one chunk.
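For illustration, a small sketch (assuming fs2's standard `Stream` API; the `tokens` list is a stand-in for the benchmark data) of why the whole list is built up front: `Stream.emits` wraps it in a single chunk, whereas concatenating streams with `++` keeps the chunk boundary.

```scala
import fs2.Stream

object ChunkSketch extends App {
  val tokens = List("a", "b", "c") // stand-in for the benchmark's token list

  // Stream.emits puts the whole list into one chunk, which is what the
  // benchmark wants in order to measure per-chunk processing overhead.
  val single = Stream.emits(tokens)
  assert(single.chunks.toList.size == 1)

  // Appending with ++ preserves chunk boundaries, so this stream has two
  // chunks instead of one.
  val appended = Stream.emits(tokens) ++ Stream.emit("d")
  assert(appended.chunks.toList.size == 2)
}
```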

```scala
annctx.groups.unsnoc match {
  case Some((OpenGroup(ghpl, gindent, group), groups)) =>
    annctx.groups = groups.snoc(OpenGroup(ghpl, gindent, group.append(evt)))
  case None => // should never happen
```
@ybasket (Collaborator):

Do we have a better option than silently ignoring this bug? fs2 itself uses assert() to catch some bugs. Not ideal, because AFAICT we would still produce a semi-valid result here, but let's at least have the discussion (and maybe a decision for all of fs2-data).

@satabin (Member, Author):

Yeah, I don't like cases like this one, where we cannot statically ensure they never happen. And I don't like crashing with an assert either. I will try to find something better here. But it was already like that before 😬
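One loud-failure option along the lines ybasket mentions, sketched here for illustration only (it reuses the identifiers from the hunk quoted above and is not necessarily what the PR ends up doing):

```scala
annctx.groups.unsnoc match {
  case Some((OpenGroup(ghpl, gindent, group), groups)) =>
    annctx.groups = groups.snoc(OpenGroup(ghpl, gindent, group.append(evt)))
  case None =>
    // invariant: events are only pushed while a group is open, so an empty
    // group deque here means a bug in the printer itself; fail loudly
    // instead of silently dropping the event
    assert(false, s"pushed $evt with no open group")
}
```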

```scala
}

private def renderIndentBegin(ctx: RenderingContext): Unit = {
  ctx.lines = NonEmptyList(ctx.lines.head + (" " * indentSize), ctx.lines.tail)
```
@ybasket (Collaborator):

Do you think a StringBuilder allocated to the right capacity could help speed things up here? Like first appending the head, then indentSize spaces?

@satabin (Member, Author):

I think it would actually be way better to cache the padding. When rendering structured data, the same indent size will be encountered over and over again. I will try to reuse the padding already computed for a given indent depth, rather than doing this.
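A minimal sketch of that caching idea (the `PaddingCache` name is hypothetical; it assumes padding grows linearly with indent depth):

```scala
import scala.collection.mutable.ArrayBuffer

// Memoize the padding string per indent depth, so the string concatenation
// is performed at most once per depth ever encountered.
final class PaddingCache(indentSize: Int) {
  private val cache = ArrayBuffer("") // depth 0 has no padding

  def apply(depth: Int): String = {
    // grow the cache lazily up to the requested depth
    while (cache.size <= depth)
      cache += cache.last + (" " * indentSize)
    cache(depth)
  }
}
```

With something like this, the renderer could look up the padding for the current depth instead of rebuilding `" " * indentSize` on every indent.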
