Contiguous storage for path segments #369
Conversation
This is part of the larger multisampled path rendering work, under stroke rework (#303). It refactors the GPU pipeline so that the path segments available to fine rasterization are stored as a contiguous slice rather than a linked list as before.

Numerous parts of the pipeline are refactored. In the old pipeline, path segment decoding generated cubic segments and also estimated a (somewhat imprecise) bounding box, while the combination of flattening those cubics and tiling happened in a separate stage (path_coarse) quite a bit later in the pipeline. In the new pipeline, path decoding is fused with flattening, generating a `LineSoup` structure (line segments associated with paths, but otherwise unordered) with the bounding box computed as a side effect, and tiling is spread over multiple stages later in the pipeline.

The first tiling stage (path_count) counts the number of tiles that will be generated. Then coarse rasterization allocates contiguous slices based on those counts. The second stage (path_tiling) does a scattered write of the resulting tiles. Both of these stages rely on indirect dispatch, as the number of lines and the number of segments (respectively) are not known at encode time.

These changes only make sense for filled paths, so they rely on stroke expansion being done earlier, currently on the CPU.
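For readers skimming the diff, here is a rough CPU-side sketch of that count/allocate/scatter flow, written in the style of a Rust port. The struct layout and helper names are illustrative assumptions, not the actual shader interface:

```rust
// Unordered line segments tagged with their owning path ("line soup"),
// as produced by the fused decode + flatten stage.
struct LineSoup {
    path_ix: u32,
    p0: [f32; 2],
    p1: [f32; 2],
}

// After path_count has produced a per-tile segment count, coarse
// rasterization turns the counts into contiguous slice allocations
// from a single bump counter (an atomicAdd on the GPU). path_tiling
// then scatters each segment into its tile's slice.
fn allocate_slices(counts: &[u32]) -> (Vec<u32>, u32) {
    let mut starts = Vec::with_capacity(counts.len());
    let mut bump = 0u32;
    for &n_segs in counts {
        starts.push(bump);
        bump += n_segs;
    }
    // The total is only known after counting, which is why the
    // scatter stage is launched with indirect dispatch.
    (starts, bump)
}
```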
I left a number of minor comments and questions for clarification. The new pipeline structure and the changes make sense to me per all of our prior discussions and your posts around the topic.
I'm OK with merging the changes once you've addressed some of the comments.
let m = self.b.min(other.a);
ClipBic::new(self.a + other.a - m, self.b + other.b - m)
Do you mind documenting the logic here? Perhaps simply reference section 3 in https://browse.arxiv.org/pdf/2205.11659.pdf and mention the term "bicyclic semigroup".
Oops, left this out of the commit, but will get to it.
Update: this was fixed.
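For context, a sketch of the combine in question, mirroring the two lines quoted above; the comments are a gloss on section 3 of the linked paper, and the surrounding scaffolding is assumed:

```rust
/// Bicyclic semigroup element for the clip stack monoid: `a` counts
/// unmatched pops (clip ends) and `b` unmatched pushes (clip begins).
#[derive(Clone, Copy)]
struct ClipBic {
    a: u32,
    b: u32,
}

impl ClipBic {
    fn new(a: u32, b: u32) -> Self {
        ClipBic { a, b }
    }

    /// The associative combine: m of self's pushes are cancelled by
    /// m of other's pops; the remainder carries through.
    fn combine(self, other: ClipBic) -> ClipBic {
        let m = self.b.min(other.a);
        ClipBic::new(self.a + other.a - m, self.b + other.b - m)
    }
}
```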
shader/coarse.wgsl (outdated)
// We overload the "segments" field to store both count (written by
// path_count stage) and segment allocation (used by path_tiling and
// fine).
let n_segs = tile.segments;
This is up to you, but consider naming this `tile.segment_count_or_ix` or something similar. It may seem verbose, but it can help a reader who is new to the code base to know that this field has two interpretations.
Agreed, renamed. Remind me to apply the rename on the CPU port as well.
shader/coarse.wgsl (outdated)
let n_segs = tile.segments;
if n_segs != 0u {
    var seg_ix = atomicAdd(&bump.segments, n_segs);
    tiles[tile_ix].segments = ~seg_ix;
What's the reason for negating `seg_ix`?
There's a comment in `write_path` in coarse, but I added another in shared. I agree the renaming would be useful. This was a subtle error: originally both were stored as positive values, but I missed the fact that a tile could fail to be allocated because of clipping.
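To make the trick concrete, a hedged sketch of how a later stage can tell the two interpretations apart (this assumes counts and indices both stay below 2^31; the helper is illustrative, not the actual shader code):

```rust
// After coarse rasterization, the field is either a raw count (the
// tile was never allocated, e.g. it was clipped out) or the bitwise
// NOT of a slice start. Since both raw values are small, the high bit
// distinguishes them.
fn decode(segment_count_or_ix: u32) -> Option<u32> {
    if segment_count_or_ix >> 31 != 0 {
        // Allocated: recover the slice start index.
        Some(!segment_count_or_ix)
    } else {
        // Still a count; no slice exists for this tile.
        None
    }
}
```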
if n_segs != 0u {
    var seg_ix = atomicAdd(&bump.segments, n_segs);
    tiles[tile_ix].segments = ~seg_ix;
    alloc_cmd(4u);
Suggestion for a future improvement: it would be nice to have constants like `CMD_FILL_PAYLOAD_SIZE` declared in ptcl.wgsl, in place of the hard-coded `4u`.
Agreed, but I'll defer that from this PR. Even better would be having a way to share those constants between GPU and CPU, but that's probably too much to ask in 2023.
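As a purely illustrative sketch of one way to share such constants (not how this repository is set up): define the value once in Rust and splice it into the WGSL source when the pipeline is built.

```rust
// Hypothetical shared constant: the number of u32 words a PTCL fill
// command occupies.
pub const CMD_FILL_PAYLOAD_SIZE: u32 = 4;

// Prepend a WGSL declaration so the shader and host can never drift.
pub fn preprocess(wgsl_src: &str) -> String {
    format!(
        "const CMD_FILL_PAYLOAD_SIZE = {}u;\n{}",
        CMD_FILL_PAYLOAD_SIZE, wgsl_src
    )
}
```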
shader/fine.wgsl (outdated)
let n_segs = fill.size_and_rule >> 1u;
let even_odd = (fill.size_and_rule & 1u) != 0u;
area = fill_path(fill.seg_data, n_segs, fill.backdrop, xy, even_odd);
nit: I would have `fill_path` take a `CmdFill` and move the `n_segs` and `even_odd` computation to `fill_path`.
Agreed, this is much nicer. There's a reason for it: originally the segments were inline in the ptcl, so the interpreter loop needed to know the size. But now that they're in a separate buffer, it's possible to clean this up.
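In the style of the CPU port, the refactored shape might look like this (field names and types are assumptions based on the snippet above, and the coverage accumulation is elided):

```rust
struct CmdFill {
    size_and_rule: u32, // segment count << 1, even-odd rule in bit 0
    seg_data: u32,      // start of this fill's slice in the segment buffer
    backdrop: i32,
}

// With the segments in a separate buffer, the interpreter loop no
// longer needs the size, so the unpacking moves into fill_path.
fn fill_path(fill: &CmdFill, xy: [f32; 2]) -> f32 {
    let n_segs = fill.size_and_rule >> 1;
    let even_odd = (fill.size_and_rule & 1) != 0;
    let area = fill.backdrop as f32;
    // ... accumulate coverage at xy from the n_segs segments starting
    // at fill.seg_data, then apply the nonzero or even-odd rule ...
    let _ = (n_segs, even_odd, xy);
    area
}
```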
shader/flatten.wgsl (outdated)
@@ -85,6 +173,11 @@ fn read_i16_point(ix: u32) -> vec2<f32> {
    return vec2(x, y);
}

struct Transform {
    matrx: vec4<f32>,
nit: Maybe just spell out `matrix`, or if this must be shortened, go with `mat`? Just omitting the one letter doesn't seem that beneficial to me :)
BTW, will it make sense to store a 3x3 matrix here in the future when we want to support real perspective? Or is it beneficial to keep the translation component decomposed in that world too?
`matrix` is a keyword in WGSL. And I don't know yet the answer to that question; it depends on how the math works out. An affine could be stored as a `mat3x2f`, but I chose not to do that so that the transform could be written without having to multiply by 1. I'll go with `mat` as I do like conciseness.
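For reference, a sketch of how the decomposed affine applies to a point, mirroring the WGSL struct above (the Rust field and function names are assumed):

```rust
struct Transform {
    mat: [f32; 4],       // 2x2 linear part, column-major: [a, b, c, d]
    translate: [f32; 2], // translation kept separate from the matrix
}

// Applying the decomposed form avoids the multiply-by-1 that a
// homogeneous (3x3 or mat3x2) representation would entail.
fn apply(t: &Transform, p: [f32; 2]) -> [f32; 2] {
    [
        t.mat[0] * p[0] + t.mat[2] * p[1] + t.translate[0],
        t.mat[1] * p[0] + t.mat[3] * p[1] + t.translate[1],
    ]
}
```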
let MAX_QUADS = 16u;

fn flatten_cubic(cubic: Cubic) {
I like the new organization of the curve eval and flattening code that used to be in path_coarse.
let dx = abs(s1.x - s0.x);
let dy = s1.y - s0.y;
if dx + dy == 0.0 {
Curious: should this check be against an epsilon? Would you mind pointing out in which earlier stage a zero-length segment would have been culled (so I can better understand this exact comparison)?
No, it can't be a check against an epsilon, as the line might still cross a tile boundary, in which case it would affect winding numbers at higher levels of the subdivision hierarchy. This stuff is subtle.
It could be eliminated in `LineSoup` generation, but that becomes an additional global invariant, while having it here is more local reasoning; letting it slip through would be pretty bad.
Also, I took out the TODO on line 61, as that's been resolved.
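Restated as a Rust-port-style sketch of the guard (assuming, as the snippet suggests, that lines have already been oriented so that `s1.y >= s0.y`):

```rust
// Exact zero test, deliberately not an epsilon: a tiny but nonzero
// line can still cross a tile boundary and must contribute to winding
// numbers at higher levels of the subdivision hierarchy. Only a truly
// zero-length line may be dropped.
fn is_degenerate(s0: [f32; 2], s1: [f32; 2]) -> bool {
    let dx = (s1[0] - s0[0]).abs();
    let dy = s1[1] - s0[1]; // nonnegative for downward-oriented lines
    dx + dy == 0.0
}
```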
let count = span(s0.x, s1.x) + span(s0.y, s1.y) - 1u;
let dx = abs(s1.x - s0.x);
let dy = s1.y - s0.y;
let idxdy = 1.0 / (dx + dy);
Maybe document that `dx + dy` cannot be zero and the division is valid because a zero-length line should have been discarded by path_count?
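For context, a sketch of the `span` helper used in the count above: the number of tile-grid cells an interval touches along one axis, clamped to at least 1 (this is the usual formulation; treat it as an assumption about the actual shader code):

```rust
// Number of tile-grid cells the closed interval [min(a,b), max(a,b)]
// touches along one axis, never less than 1. A line touching n_x
// cells horizontally and n_y vertically crosses n_x + n_y - 1 tiles.
fn span(a: f32, b: f32) -> u32 {
    (a.max(b).ceil() - a.min(b).floor()).max(1.0) as u32
}
```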
Mostly comments, but also a little cleanup of function signatures in fine.
Refer to Arxiv paper and mention bicyclic semigroups.