Removes the tiling and render task stack from clipping, draws clips in regions of interest. #685

Closed
wants to merge 5 commits into from

Member

@glennw glennw commented Jan 5, 2017

This also ensures that primitive clip masks never render
any larger than the primitive bounding rect, which improves
the timings on GitHub a lot.

This does cause a slight performance regression on some
sites (GitHub in particular), but it is a first step toward the
planned clipping and tiling improvements. After those
land, performance will be better than it was originally.



Member Author

glennw commented Jan 5, 2017

@kvark This decouples the clip mask generation from the tiling system, as we discussed.

It also removes the (currently unused) chained render task support. I removed it to simplify future refactoring, since it's always available in git history if we need it again, but I can restore it if you'd prefer.

I marked this as [WIP] since it doesn't yet have any of the improved clip mask ideas we talked about at the work week, but it does pass all tests and only causes a small (temporary) performance regression on some sites. So we could perhaps merge this as a standalone PR, to keep the patches for this work a bit more manageable and easier to review.

I'm happy to get this merged after review, or just leave it as a PR and continue the refactoring work on top of this branch. What do you think?

For rectangles, this means the clip mask generation runs only on the
four corners. For image masks, the mask generation shader runs only
on the local rect of the image mask.

The rectangle clip mask shader is also much faster now, since it
only needs to consider one corner in the fragment shader.

This drastically reduces the amount of time spent building clip masks.

One remaining issue is that we now allocate large rectangles for large
clip masks. This is a memory waste, but has (almost) no effect on
performance. A follow-up PR will use the z-buffer to draw these
large primitives in segments, which will mean the memory allocation
is no longer an issue, and also improve performance of the primitive
shaders (by removing the clip rect check).
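
A minimal sketch of the corner-only idea described above, not the code from this PR. It assumes a single uniform corner radius and a hypothetical `Rect` type, and only computes the four regions where a rounded-rectangle mask would actually need per-pixel work:

```rust
// Hypothetical rectangle type for illustration (not the project's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Returns the four corner regions of a rounded rectangle in the order
/// top-left, top-right, bottom-left, bottom-right.
/// Assumption: one uniform radius for all corners.
fn corner_regions(rect: Rect, radius: f32) -> [Rect; 4] {
    // Keep the radius within half the rectangle so corners never overlap.
    let r = radius.min(rect.w * 0.5).min(rect.h * 0.5).max(0.0);
    let right = rect.x + rect.w - r;
    let bottom = rect.y + rect.h - r;
    [
        Rect { x: rect.x, y: rect.y, w: r, h: r },
        Rect { x: right, y: rect.y, w: r, h: r },
        Rect { x: rect.x, y: bottom, w: r, h: r },
        Rect { x: right, y: bottom, w: r, h: r },
    ]
}

fn main() {
    let rect = Rect { x: 0.0, y: 0.0, w: 200.0, h: 100.0 };
    let corners = corner_regions(rect, 10.0);
    // Only the corner area needs the mask shader; the rest is fully inside.
    let mask_area: f32 = corners.iter().map(|c| c.w * c.h).sum();
    println!("mask area {} vs full area {}", mask_area, rect.w * rect.h);
}
```

For a large rectangle with a small radius, the corner area is a small fraction of the full area, which is consistent with the reduction in mask-building time reported here.
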
@glennw glennw changed the title [WIP] Removes the tiling and render task stack from clipping. Removes the tiling and render task stack from clipping, draws clips in regions of interest. Jan 6, 2017
Member Author

glennw commented Jan 6, 2017

The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup).

We still have the memory wastage for very large clip rects, but we have a plan to solve that, so I think it can be done as a follow-up, since this patch is already large enough.

The clip performance is now better on each site I tested than it was previously.

r? @kvark

@kvark kvark self-requested a review January 6, 2017 20:31
Member

@kvark kvark left a comment

I like how you managed to do +238 −496 👍
Will need to make another review pass after the issues are addressed.

vec2 lp0_base = local_rect.xy;
vec2 lp1_base = local_rect.xy + local_rect.zw;

vec2 lp0 = clamp_rect(clamp_rect(lp0_base, local_rect),
Member

no need to clamp_rect(lp0_base, local_rect) here.
In fact, you could just have left my lines untouched:

    vec2 lp0 = clamp_rect(lp0_base, layer.local_clip_rect);
    vec2 lp1 = clamp_rect(lp1_base, layer.local_clip_rect);

Member Author

Fixed
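
To illustrate why the inner clamp is redundant, here is a small Rust mirror of the shader logic. The `clamp_rect` semantics are an assumption (clamp a point into a rect stored as `[x, y, width, height]`), inferred from `local_rect.xy + local_rect.zw` being used as the far corner:

```rust
/// Hypothetical Rust mirror of the shader's `clamp_rect` helper.
/// Assumption: `rect` is `[x, y, width, height]` with non-negative size.
fn clamp_rect(p: [f32; 2], rect: [f32; 4]) -> [f32; 2] {
    [
        p[0].clamp(rect[0], rect[0] + rect[2]),
        p[1].clamp(rect[1], rect[1] + rect[3]),
    ]
}

fn main() {
    let local_rect = [10.0, 20.0, 100.0, 50.0];
    let local_clip_rect = [0.0, 0.0, 60.0, 60.0];

    let lp0_base = [local_rect[0], local_rect[1]];
    let lp1_base = [local_rect[0] + local_rect[2], local_rect[1] + local_rect[3]];

    // A rect's own corners already lie inside it, so this clamp changes nothing.
    assert_eq!(clamp_rect(lp0_base, local_rect), lp0_base);
    assert_eq!(clamp_rect(lp1_base, local_rect), lp1_base);

    // Only the clamp against the layer's clip rect has an effect.
    let lp0 = clamp_rect(lp0_base, local_clip_rect);
    let lp1 = clamp_rect(lp1_base, local_clip_rect);
    println!("lp0 = {:?}, lp1 = {:?}", lp0, lp1);
}
```
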

vec4 clipped_local_rect = vec4(lp0, lp1 - lp0);

vec2 final_pos = mix(area.task_bounds.xy, area.task_bounds.zw, aPosition.xy);
vec2 p0 = lp0;
Member

I don't understand why we need all the transform logic back here. I assume it's related to this quote:

The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup).

If a clip instance only touches the pixels related to it, then it will not mark pixels outside of it as transparent, and some of those pixels might have been written by previous clip instances in the stack. Unless... we do something very clever with the depth/stencil buffer for the clip masks. (unfinished idea here)

float distance_from_border = dot(vec4(is_out),
max(vec4(0.0, 0.0, 0.0, 0.0), distances));
// TODO(gw): Support ellipse clip!
float d = (distance(pos, vClipRef) - vClipRadius.x + nudge) / pixels_per_fragment;
Member

we have to ensure that 0 <= d <= 1

Member Author

Fixed
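
The range requirement can be sketched in Rust as follows. The formula mirrors the shader line quoted above; the function name and parameter layout are hypothetical, and what the shader does with `d` afterwards is not shown in this thread:

```rust
/// Mirrors `(distance(pos, vClipRef) - vClipRadius.x + nudge) / pixels_per_fragment`
/// and then clamps the result into [0, 1].
/// Assumption: `pixels_per_fragment` is positive and finite.
fn clamped_distance(
    pos: [f32; 2],
    clip_ref: [f32; 2],
    radius: f32,
    nudge: f32,
    pixels_per_fragment: f32,
) -> f32 {
    let dx = pos[0] - clip_ref[0];
    let dy = pos[1] - clip_ref[1];
    let dist = (dx * dx + dy * dy).sqrt();
    let d = (dist - radius + nudge) / pixels_per_fragment;
    // Without this clamp, d is negative well inside the corner
    // and greater than 1 well outside it.
    d.clamp(0.0, 1.0)
}

fn main() {
    let center = [0.0, 0.0];
    for x in [0.0, 10.0, 20.0] {
        let d = clamped_distance([x, 0.0], center, 10.0, 0.5, 1.0);
        println!("x = {:>4}: d = {}", x, d);
    }
}
```
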

@@ -7,4 +7,5 @@
 varying vec3 vPos;
 flat varying vec4 vLocalRect;
 flat varying vec4 vClipRect;
-flat varying vec4 vClipRadius;
+flat varying vec2 vClipRadius;
+flat varying vec2 vClipRef;
Member

I wonder if we should combine some of the attributes, like here. It would reduce the code on our side as well as maybe a little fetch shader overhead (although clearly optimizable by the driver).

Member Author

Fixed


for _ in 0..region.complex.length * CORNERS_PER_CLIP_REGION {
mask.corner_components.push(CornerMaskComponent {
gpu_address: clip_store.alloc(1),
Member

dunno if it matters, but we could bulk-allocate here

Member Author

Probably a good idea, I don't think it matters too much for now though.
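
A rough sketch of what bulk allocation could look like. `GpuStore` is a hypothetical bump allocator standing in for `clip_store`; it assumes `alloc(n)` returns the first of `n` contiguous addresses, which may not match the real store:

```rust
/// Hypothetical stand-in for `clip_store` (assumption: contiguous bump allocation).
struct GpuStore {
    next: u32,
}

impl GpuStore {
    /// Reserves `count` consecutive slots and returns the first address.
    fn alloc(&mut self, count: u32) -> u32 {
        let first = self.next;
        self.next += count;
        first
    }
}

/// One allocator call per corner, as in the loop under review.
fn alloc_one_by_one(store: &mut GpuStore, n: u32) -> Vec<u32> {
    (0..n).map(|_| store.alloc(1)).collect()
}

/// A single allocator call, with per-corner addresses derived from the base.
fn alloc_bulk(store: &mut GpuStore, n: u32) -> Vec<u32> {
    let base = store.alloc(n);
    (0..n).map(|i| base + i).collect()
}

fn main() {
    let mut a = GpuStore { next: 0 };
    let mut b = GpuStore { next: 0 };
    println!("{:?}", alloc_one_by_one(&mut a, 4));
    println!("{:?}", alloc_bulk(&mut b, 4));
}
```

Under that assumption both variants hand out the same addresses; the bulk version just makes one call into the allocator instead of one per corner.
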

let data = ClipData::uniform(rect, radius);
PrimitiveStore::populate_clip_data(slice, data);
debug_assert_eq!(self.clip_range.item_count, 1);
for (corner, component) in data.corners.iter().zip(self.corner_components.iter()) {
Member

should we require the length of these vectors to match?

Member Author

I think that's enforced by the debug assert above?

Member

which one?
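
For context on the length question: Rust's `Iterator::zip` stops at the shorter of its two inputs without reporting an error, so a mismatch would pass silently unless it is checked. A minimal sketch with hypothetical stand-in data (plain `f32` slices rather than the PR's corner types):

```rust
/// Copies per-corner data into the matching component slot.
/// Assumption: plain f32 slices stand in for the real corner/component types.
fn write_corners(corners: &[f32], components: &mut [f32]) {
    // An explicit check on the two lengths, since zip would otherwise
    // silently skip any unmatched tail.
    debug_assert_eq!(corners.len(), components.len(), "corner/component count mismatch");
    for (corner, component) in corners.iter().zip(components.iter_mut()) {
        *component = *corner;
    }
}

fn main() {
    let corners: [f32; 4] = [1.0, 2.0, 3.0, 4.0];
    let short: [f32; 2] = [0.0; 2];

    // zip ends as soon as the shorter iterator is exhausted.
    let visited = corners.iter().zip(short.iter()).count();
    println!("visited {} of {} corners", visited, corners.len());

    let mut components: [f32; 4] = [0.0; 4];
    write_corners(&corners, &mut components);
    println!("{:?}", components);
}
```
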

local_rect = local_rect.and_then(|r| r.intersection(&clip.rect));
local_inner = local_inner.and_then(|r| clip.get_inner_rect()
Member

why are we removing the local_inner?

Member Author

I figured it wasn't necessary now that the tiles are decoupled from the clip mask - but perhaps there is still a good reason to have it?

Member

AFAIK, our earlier discussion was about having 4 clip rectangles generated instead of one so that the inner area gets excluded from the mask computation. Then we figured that these 4 clips can be just the corners of a rectangle if the rounded rectangle is all we got. So for that model, we'd still need the inner area calculated. However, I'm not sure you need it for your implementation. I'll do another review pass with that in mind.

gw3583 added 3 commits January 9, 2017 14:15
This fixes a bug where there are parent clips from stacking contexts
but no clip on the primitive itself. Previously, this would not be
detected as requiring clipping during the batch generation.
Member Author

glennw commented Jan 9, 2017

@kvark I had an idea in mind to handle the outside clip case that we were discussing last week.

But then I went to create a test case for that problem, and was unable to create a demonstration of the issue - I think I've convinced myself it's not possible with the way the current do_clip() works, but I'm probably just missing something.

Let's discuss the exact problem again this week and come up with an example that demonstrates the issue, and then I'll add the solution for it to this PR.

Member

kvark commented Jan 9, 2017

@glennw If you are talking about the test you added to the sample, then I have an explanation. Since the clip logic works on the AABB of intersection of the clips, and your 2 clip rectangles are axis aligned, their intersection AABB is completely within each clip, so the change of VS logic that I consider incorrect did not make any difference for that specific case.
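
The geometric point can be checked with a small sketch. `Aabb` is a hypothetical type for illustration; the example only shows that the intersection of two axis-aligned rectangles lies inside both of them:

```rust
// Hypothetical axis-aligned box given by min (x0, y0) and max (x1, y1) corners.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Aabb {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl Aabb {
    /// Intersection of two boxes, or None if they do not overlap.
    fn intersection(&self, o: &Aabb) -> Option<Aabb> {
        let r = Aabb {
            x0: self.x0.max(o.x0),
            y0: self.y0.max(o.y0),
            x1: self.x1.min(o.x1),
            y1: self.y1.min(o.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }

    /// True if `o` lies entirely inside `self`.
    fn contains(&self, o: &Aabb) -> bool {
        self.x0 <= o.x0 && self.y0 <= o.y0 && o.x1 <= self.x1 && o.y1 <= self.y1
    }
}

fn main() {
    let a = Aabb { x0: 0.0, y0: 0.0, x1: 100.0, y1: 100.0 };
    let b = Aabb { x0: 50.0, y0: 50.0, x1: 150.0, y1: 150.0 };
    let i = a.intersection(&b).expect("the two clips overlap");
    // Every pixel of the intersection is inside both clips, so no pixel
    // in the mask area falls outside either clip in this configuration.
    println!("{:?} in a: {}, in b: {}", i, a.contains(&i), b.contains(&i));
}
```

A test case that exposes the concern would presumably need at least one clip that is not axis-aligned relative to the others, so that the bounding box covers pixels outside that clip.
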

@bors-servo
Contributor

☔ The latest upstream changes (presumably #694) made this pull request unmergeable. Please resolve the merge conflicts.

Member Author

glennw commented Jan 10, 2017

Closing in favour of #696

@glennw glennw closed this Jan 10, 2017
4 participants