Removes the tiling and render task stack from clipping, draws clips in regions of interest. #685

glennw · 2017-01-05T04:41:16Z

This also ensures that primitive clip masks never render
any larger than the primitive bounding rect, which improves
the timings on GitHub a lot.

This does cause a slight performance regression on some
sites (GitHub in particular), but it is a first step toward
the planned clipping and tiling improvements. After those
land, performance will be better than it was originally.

glennw · 2017-01-05T04:42:21Z

@kvark This decouples the clip mask generation from the tiling system, as we discussed.

It also removes the (currently unused) chained render task support. I removed it to simplify future refactoring, since it's always available in git history if we need it again, but I can restore it if you'd prefer.

I marked this as [WIP] since it doesn't yet have any of the improved clip mask ideas we talked about at the work week, but it does pass all tests and only causes a small (temporary) performance regression on some sites. So we could perhaps merge this as a standalone PR, to keep the patches for this work more manageable and easier to review.

I'm happy to get this merged after review, or just leave it as a PR and continue the refactoring work on top of this branch. What do you think?

For rectangles, this means the clip mask generation runs only on the four corners. For image masks, the mask generation shader runs only on the local rect of the image mask. The rectangle clip mask shader is also much faster now, since it only needs to consider one corner in the fragment shader. This drastically reduces the amount of time spent building clip masks. One remaining issue is that we now allocate large rectangles for large clip masks. This wastes memory, but has (almost) no effect on performance. A follow-up PR will use the z-buffer to draw these large primitives in segments, which will mean the memory allocation is no longer an issue, and will also improve performance of the primitive shaders (by removing the clip rect check).
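The corner-only approach for rectangles described above can be sketched as follows. This is a minimal illustration, not WebRender's code: `Rect` and `corner_regions` are hypothetical names, and a single uniform corner radius is assumed.

```rust
/// Axis-aligned rectangle: origin (x, y) plus size (w, h).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// Returns the four corner regions (top-left, top-right, bottom-left,
/// bottom-right) of a rounded rectangle with a uniform corner radius.
/// Only these regions can contain partially covered pixels, so a mask
/// shader would only need to run there.
pub fn corner_regions(rect: Rect, radius: f32) -> [Rect; 4] {
    // Clamp the radius so that opposite corners never overlap.
    let r = radius.max(0.0).min(rect.w * 0.5).min(rect.h * 0.5);
    let right = rect.x + rect.w - r;
    let bottom = rect.y + rect.h - r;
    [
        Rect { x: rect.x, y: rect.y, w: r, h: r },
        Rect { x: right, y: rect.y, w: r, h: r },
        Rect { x: rect.x, y: bottom, w: r, h: r },
        Rect { x: right, y: bottom, w: r, h: r },
    ]
}
```

For a 100x50 rectangle with a radius of 10, the four regions cover 400 pixels in total instead of the full 5000, which is the kind of saving the description refers to.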

glennw · 2017-01-06T01:57:24Z

The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup).

We still have the memory wastage for very large clip rects, but we have a plan to solve that, so I think it can be done as a follow-up, since this patch is already large enough.

The clip performance is now better on each site I tested than it was previously.

r? @kvark

kvark

I like how you managed to do +238 −496 👍
Will need to make another review pass after the issues are addressed.

kvark · 2017-01-06T20:34:14Z

webrender/res/clip_shared.glsl

+    vec2 lp0_base = local_rect.xy;
+    vec2 lp1_base = local_rect.xy + local_rect.zw;
+
+    vec2 lp0 = clamp_rect(clamp_rect(lp0_base, local_rect),


no need to clamp_rect(lp0_base, local_rect) here.
In fact, you could just have left my lines untouched:

vec2 lp0 = clamp_rect(lp0_base, layer.local_clip_rect);
vec2 lp1 = clamp_rect(lp1_base, layer.local_clip_rect);
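To see why the inner `clamp_rect(lp0_base, local_rect)` is redundant, here is a small Rust sketch. It assumes `clamp_rect` clamps a point component-wise into a rectangle stored as (x, y, width, height), which is inferred from the `.xy` / `.zw` usage in the diff rather than taken from the shader source.

```rust
/// Clamps point `p` into the rectangle `rect = [x, y, w, h]`,
/// component by component. Assumed semantics of the shader helper.
pub fn clamp_rect(p: [f32; 2], rect: [f32; 4]) -> [f32; 2] {
    [
        p[0].max(rect[0]).min(rect[0] + rect[2]),
        p[1].max(rect[1]).min(rect[1] + rect[3]),
    ]
}
```

Since `lp0_base` and `lp1_base` are the corners of `local_rect` itself, clamping them against `local_rect` returns them unchanged (for a non-negative size). Only the clamp against the layer's clip rect can move them.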

kvark · 2017-01-06T20:45:12Z

webrender/res/clip_shared.glsl

    vec4 clipped_local_rect = vec4(lp0, lp1 - lp0);

-    vec2 final_pos = mix(area.task_bounds.xy, area.task_bounds.zw, aPosition.xy);
+    vec2 p0 = lp0;


I don't understand why we need all the transform logic back here. I assume it's related to this quote:

> The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup).

If a clip instance only touches the pixels related to it, then it will not mark pixels outside of it as transparent, and some of those pixels might have been written by previous clip instances in the stack. Unless... we do something very clever with the depth/stencil buffer for the clip masks. (unfinished idea here)

kvark · 2017-01-06T20:53:27Z

webrender/res/cs_clip_rectangle.fs.glsl

-    float distance_from_border = dot(vec4(is_out),
-                                     max(vec4(0.0, 0.0, 0.0, 0.0), distances));
+    // TODO(gw): Support ellipse clip!
+    float d = (distance(pos, vClipRef) - vClipRadius.x + nudge) / pixels_per_fragment;


we have to ensure that 0 <= d <= 1
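A minimal sketch of the range constraint, written in Rust rather than GLSL so it can be checked on the CPU. The function name `clamped_edge_distance` and its parameters are hypothetical; the arithmetic mirrors the `d` expression in the diff, with the distance to `vClipRef` passed in directly.

```rust
/// Computes the normalized distance past the rounded corner edge and
/// clamps it to [0, 1]. In GLSL the last step would be
/// `clamp(d, 0.0, 1.0)`.
pub fn clamped_edge_distance(
    dist_from_ref: f32,
    radius: f32,
    nudge: f32,
    pixels_per_fragment: f32,
) -> f32 {
    let d = (dist_from_ref - radius + nudge) / pixels_per_fragment;
    // Without this clamp, fragments well inside the corner give d < 0
    // and fragments well outside give d > 1.
    d.max(0.0).min(1.0)
}
```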

kvark · 2017-01-06T20:54:58Z

webrender/res/cs_clip_rectangle.glsl

@@ -7,4 +7,5 @@
 varying vec3 vPos;
 flat varying vec4 vLocalRect;
 flat varying vec4 vClipRect;
-flat varying vec4 vClipRadius;
+flat varying vec2 vClipRadius;
+flat varying vec2 vClipRef;


I wonder if we should combine some of the attributes, like here. It would reduce the code on our side as well as maybe a little fetch shader overhead (although clearly optimizable by the driver).

kvark · 2017-01-06T21:23:47Z

webrender/src/mask_cache.rs

+
+                for _ in 0..region.complex.length * CORNERS_PER_CLIP_REGION {
+                    mask.corner_components.push(CornerMaskComponent {
+                        gpu_address: clip_store.alloc(1),


dunno if it matters, but we could bulk-allocate here

Probably a good idea, I don't think it matters too much for now though.
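For reference, the bulk-allocation idea could look roughly like this. `ClipStore` here is a hypothetical bump allocator standing in for the real store, whose `alloc` signature and address type are not shown in the diff.

```rust
/// Hypothetical stand-in for the GPU clip store: a bump allocator that
/// hands out `count` consecutive slots and returns the first address.
#[derive(Default)]
pub struct ClipStore {
    next: usize,
}

impl ClipStore {
    pub fn alloc(&mut self, count: usize) -> usize {
        let base = self.next;
        self.next += count;
        base
    }
}

/// One allocation per component, as in the loop under review.
pub fn alloc_one_by_one(store: &mut ClipStore, n: usize) -> Vec<usize> {
    (0..n).map(|_| store.alloc(1)).collect()
}

/// A single allocation of `n` slots; per-component addresses are
/// derived from the returned base address.
pub fn alloc_bulk(store: &mut ClipStore, n: usize) -> Vec<usize> {
    let base = store.alloc(n);
    (base..base + n).collect()
}
```

With a contiguous allocator like this one, both variants yield the same addresses, so the bulk form would only save per-call overhead. Whether that holds for the real store is an assumption.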

kvark · 2017-01-06T21:28:20Z

webrender/src/mask_cache.rs

                    let data = ClipData::uniform(rect, radius);
-                    PrimitiveStore::populate_clip_data(slice, data);
-                    debug_assert_eq!(self.clip_range.item_count, 1);
+                    for (corner, component) in data.corners.iter().zip(self.corner_components.iter()) {


should we require the length of these vectors to match?

I think that's enforced by the debug assert above?
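The reason the question matters is that Rust's `zip` silently stops at the shorter iterator, so a length mismatch would drop corners without any error. A sketch of making the requirement explicit (the helper name `pair_corners` is hypothetical):

```rust
/// Pairs each corner with its mask component. A length mismatch is
/// caught in debug builds instead of being silently truncated by `zip`.
pub fn pair_corners<'a, C, M>(
    corners: &'a [C],
    components: &'a [M],
) -> Vec<(&'a C, &'a M)> {
    debug_assert_eq!(
        corners.len(),
        components.len(),
        "corner/component count mismatch"
    );
    corners.iter().zip(components.iter()).collect()
}
```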

kvark · 2017-01-06T21:29:23Z

webrender/src/mask_cache.rs

                        local_rect = local_rect.and_then(|r| r.intersection(&clip.rect));
-                        local_inner = local_inner.and_then(|r| clip.get_inner_rect()


why are we removing the local_inner?

I figured it wasn't necessary now that the tiles are decoupled from the clip mask - but perhaps there is still a good reason to have it?

AFAIK, our earlier discussion was about having 4 clip rectangles generated instead of one so that the inner area gets excluded from the mask computation. Then we figured that these 4 clips can be just the corners of a rectangle if the rounded rectangle is all we got. So for that model, we'd still need the inner area calculated. However, I'm not sure you need it for your implementation. I'll do another review pass with that in mind.

glennw · 2017-01-09T06:02:41Z

@kvark I had an idea in mind to handle the outside clip case that we were discussing last week.

But then I went to create a test case for that problem, and was unable to create a demonstration of the issue - I think I've convinced myself it's not possible with the way the current do_clip() works, but I'm probably just missing something.

Let's discuss the exact problem again this week and come up with an example that demonstrates the issue, and then I'll add the solution for it to this PR.

kvark · 2017-01-09T16:04:44Z

@glennw If you are talking about the test you added to the sample, then I have an explanation. Since the clip logic works on the AABB of the intersection of the clips, and your 2 clip rectangles are axis-aligned, their intersection AABB is completely within each clip, so the change to the VS logic that I consider incorrect did not make any difference for that specific case.
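The geometric point can be checked with a small sketch: for two axis-aligned rectangles, the intersection is itself an axis-aligned rectangle that lies inside both inputs. The `Aabb` type below is hypothetical and is not WebRender's rect type.

```rust
/// Axis-aligned bounding box given by its min (x0, y0) and max (x1, y1)
/// corners. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Aabb {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl Aabb {
    /// Returns the overlap of two boxes, or `None` if it is empty.
    pub fn intersection(&self, other: &Aabb) -> Option<Aabb> {
        let r = Aabb {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }

    /// True if `other` lies entirely inside `self`.
    pub fn contains_rect(&self, other: &Aabb) -> bool {
        self.x0 <= other.x0
            && self.y0 <= other.y0
            && self.x1 >= other.x1
            && self.y1 >= other.y1
    }
}
```

A test case that exposes the issue would therefore need at least one clip that is not axis-aligned in the space where the AABB is computed (for example a rotated clip), so that the AABB of the intersection extends beyond one of the clips.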

bors-servo · 2017-01-10T02:00:29Z

☔ The latest upstream changes (presumably #694) made this pull request unmergeable. Please resolve the merge conflicts.

glennw · 2017-01-10T22:09:20Z

Closing in favour of #696

glennw changed the title ~~[WIP] Removes the tiling and render task stack from clipping.~~ Removes the tiling and render task stack from clipping, draws clips in regions of interest. Jan 6, 2017

kvark self-requested a review January 6, 2017 20:31

kvark suggested changes Jan 6, 2017

gw3583 added 3 commits January 9, 2017 14:15

Store clip task in the primitive metadata, if required.

db5bd0d

This fixes a bug where there are parent clips from stacking contexts but no clip on the primitive itself. Previously, this would not be detected as requiring clipping during the batch generation.

Add sample with two complex clips on a single primitive.

21f87f6

Address some review comments.

a2fa692

glennw closed this Jan 10, 2017
