Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contiguous storage for path segments #369

Merged
merged 3 commits into from
Oct 7, 2023
Merged

Contiguous storage for path segments #369

merged 3 commits into from
Oct 7, 2023

Conversation

raphlinus
Copy link
Contributor

This is part of the larger multisampled path rendering work, under stroke rework (#303). It refactors the GPU pipeline so that the path segments available to fine rasterization are stored as a contiguous slice rather than a linked list as before.

Numerous parts of the pipeline are refactored. In the old pipeline, path segment decoding generated cubic line segments and also estimated a bounding box (somewhat imprecise), and the combination of flattening those cubics and tiling was in a separate stage (path_coarse) quite a bit later in the pipeline. In the new pipeline, path decoding is fused with flattening, generating a LineSoup structure (line segments associated with paths, otherwise unordered) (with bbox as a side effect), and tiling is spread over multiple stages, later in the pipeline.

The first tiling stage (path_count) counts the number of tiles that will be generated. Then coarse rasterization allocates contiguous slices based on those counts. The second stage does a scattered write of the resulting tiles. Both of these stages rely on indirect dispatch, as the number of lines and the number of segments (respectively) are not known at encode time.

These changes only make sense for filled paths, thus they relied on stroke expansion being done earlier, currently on the CPU.

This is part of the larger multisampled path rendering work, under stroke rework (#303). It refactors the GPU pipeline so that the path segments available to fine rasterization are stored as a contiguous slice rather than a linked list as before.

Numerous parts of the pipeline are refactored. In the old pipeline, path segment decoding generated cubic line segments and also estimated a bounding box (somewhat imprecise), and the combination of flattening those cubics and tiling was in a separate stage (path_coarse) quite a bit later in the pipeline. In the new pipeline, path decoding is fused with flattening, generating a `LineSoup` structure (line segments associated with paths, otherwise unordered) (with bbox as a side effect), and tiling is spread over multiple stages, later in the pipeline.

The first tiling stage (path_count) counts the number of tiles that will be generated. Then coarse rasterization allocates contiguous slices based on those counts. The second stage does a scattered write of the resulting tiles. Both of these stages rely on indirect dispatch, as the number of lines and the number of segments (respectively) are not known at encode time.

These changes only make sense for filled paths, thus they relied on stroke expansion being done earlier, currently on the CPU.
Copy link
Collaborator

@armansito armansito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a number of minor comments and questions for clarification. The new pipeline structure and the changes make sense to me per all of our prior discussions and your posts around the topic.

I'm OK with merging the changes once you've addressed some of the comments.

Comment on lines +51 to +52
let m = self.b.min(other.a);
ClipBic::new(self.a + other.a - m, self.b + other.b - m)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind documenting the logic here? Perhaps simply reference section 3 in https://browse.arxiv.org/pdf/2205.11659.pdf and mention the term "bicyclic semigroup".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, left this out of the commit, but will get to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: this was fixed.

crates/encoding/src/config.rs Show resolved Hide resolved
Comment on lines 87 to 90
// We overload the "segments" field to store both count (written by
// path_count stage) and segment allocation (used by path_tiling and
// fine).
let n_segs = tile.segments;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is up to you but consider naming this tile.segment_count_or_ix or something similar. It may seem verbose but it can help a reader who is new to the code base to know that this field has two interpretations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, renamed. Remind me to apply the rename on the CPU port as well.

let n_segs = tile.segments;
if n_segs != 0u {
var seg_ix = atomicAdd(&bump.segments, n_segs);
tiles[tile_ix].segments = ~seg_ix;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for negating seg_ix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment in write_path in coarse, but I added another in shared. I agree the renaming would be useful. This was a subtle error and originally both were stored as positive values, but I missed the fact that a tile could fail to be allocated because of clipping.

if n_segs != 0u {
var seg_ix = atomicAdd(&bump.segments, n_segs);
tiles[tile_ix].segments = ~seg_ix;
alloc_cmd(4u);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for a future improvement: IWBN to have constants like CMD_FILL_PAYLOAD_SIZE declared in ptcl.wgsl, in place of the hard-coded 4u.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I'll defer that from this PR. Even better would be having a way to share those constants between GPU and CPU, but that's probably too much to ask in 2023.

shader/fine.wgsl Outdated
Comment on lines 221 to 223
let n_segs = fill.size_and_rule >> 1u;
let even_odd = (fill.size_and_rule & 1u) != 0u;
area = fill_path(fill.seg_data, n_segs, fill.backdrop, xy, even_odd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would have fill_path take a CmdFill and move the n_segs and even_odd computation to fill_path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is much nicer. There's a reason for it, originally the segments were inline in the ptcl, so the interpreter loop needed to know the size. But now they're in a separate buffer, it's possible to clean up.

@@ -85,6 +173,11 @@ fn read_i16_point(ix: u32) -> vec2<f32> {
return vec2(x, y);
}

struct Transform {
matrx: vec4<f32>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe just spell out matrix or if this must be shortened go with mat? Just omitting the one letter doesn't seem that beneficial to me :)

BTW will it make sense to store a 3x3 matrix here in the future when we want to support real perspective? Or is it beneficial to keep the translation component decomposed in that world too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a keyword in WGSL. And I don't know yet the answer to that question, it depends on how the math works out. An affine could be stored as a mat3x2f, but I chose not to do that so that the transform could be written without having to multiply by 1.

I'll go with mat as I do like conciseness.


let MAX_QUADS = 16u;

fn flatten_cubic(cubic: Cubic) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new organization of the curve eval and flattening code that used to be in path_coarse.


let dx = abs(s1.x - s0.x);
let dy = s1.y - s0.y;
if dx + dy == 0.0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: should this check be against an epsilon? Would you mind pointing out in which earlier stage a zero length segment would have been culled (so I can better understand this exact comparison)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it can't be a check against epsilon, as it might still cross a tile boundary, in which case it would affect winding numbers at higher levels of the subdivision hieararchy. This stuff is subtle.

It could be eliminated in LineSoup generation. That becomes an additional global invariant, while having it here is more local reasoning; letting it slip through would be pretty bad.

Also I took out the TODO on line 61, as that's been resolved.

let count = span(s0.x, s1.x) + span(s0.y, s1.y) - 1u;
let dx = abs(s1.x - s0.x);
let dy = s1.y - s0.y;
let idxdy = 1.0 / (dx + dy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe document that dx + dy cannot be zero and the division is valid because a zero-length line should have been discarded by path_count?

Mostly comments, but also a little cleanup of function signatures in fine.
Refer to Arxiv paper and mention bicyclic semigroups.
@raphlinus raphlinus merged commit 1ef6724 into main Oct 7, 2023
4 checks passed
@raphlinus raphlinus deleted the contig_tiling branch October 7, 2023 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants