Removes the tiling and render task stack from clipping, draws clips in regions of interest. #685
This also ensures that primitive clip masks never render any larger than the primitive bounding rect, which improves the timings on GitHub a lot. This does cause a slight performance regression on some sites (GitHub in particular), but it is a first step towards the planned clipping and tiling improvements. After those land, performance will be better than it was originally.
@kvark This decouples the clip mask generation from the tiling system, as we discussed. It also removes the (currently unused) chained render task support. I removed it to simplify future refactoring, since it's always available in git history if we need it again, but I can restore it if you'd prefer. I marked this as [WIP] since it doesn't have any of the improved clip mask ideas we talked about at the work week yet, but it does pass all tests and only causes a small (temporary) performance regression on some sites. So we could perhaps merge this as a standalone PR, to keep the patches for this work a bit more manageable and easy to review. I'm happy to get this merged after review, or just leave it as a PR and continue the refactoring work on top of this branch. What do you think?
For rectangles, this means the clip mask generation runs only on the four corners. For image masks, the mask generation shader runs only on the local rect of the image mask. The rectangle clip mask shader is also much faster now, since it only needs to consider one corner in the fragment shader. This drastically reduces the amount of time spent building clip masks. One remaining issue is that we now allocate large rectangles for large clip masks. This wastes memory, but has (almost) no effect on performance. A follow-up PR will use the z-buffer to draw these large primitives in segments, which will mean the memory allocation is no longer an issue, and will also improve performance of the primitive shaders (by removing the clip rect check).
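The corner-only approach for rectangles can be sketched on the CPU side as follows. This is a minimal illustration rather than the PR's code: the `Rect` type and the `corner_rects` function are hypothetical names, and the sketch assumes a single uniform corner radius.

```rust
/// Hypothetical axis-aligned rectangle (origin + size), for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// Returns the four radius-sized squares at the corners of a rounded
/// rectangle, in the order top-left, top-right, bottom-left, bottom-right.
/// Only these regions contain curved edges, so only they need the mask shader.
/// The radius is clamped to half of the shorter side.
pub fn corner_rects(rect: Rect, radius: f32) -> [Rect; 4] {
    let r = radius.min(rect.w * 0.5).min(rect.h * 0.5);
    let right = rect.x + rect.w - r;
    let bottom = rect.y + rect.h - r;
    [
        Rect { x: rect.x, y: rect.y, w: r, h: r },
        Rect { x: right, y: rect.y, w: r, h: r },
        Rect { x: rect.x, y: bottom, w: r, h: r },
        Rect { x: right, y: bottom, w: r, h: r },
    ]
}
```

For a 100x50 rectangle with radius 10, the four squares cover 400 of the 5000 pixels, which gives a rough sense of why restricting the shader to the corners reduces mask generation work.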
The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup). We still have the memory wastage for very large clip rects, but we have a plan to solve that, so I think that can be done as a follow-up, since this patch is already large enough. The clip performance is now better on each site I tested than it was previously. r? @kvark
I like how you managed to do +238 −496
👍
Will need to make another review pass after the issues are addressed.
vec2 lp0_base = local_rect.xy;
vec2 lp1_base = local_rect.xy + local_rect.zw;

vec2 lp0 = clamp_rect(clamp_rect(lp0_base, local_rect),
no need to clamp_rect(lp0_base, local_rect) here.
In fact, you could just have left my lines untouched:
vec2 lp0 = clamp_rect(lp0_base, layer.local_clip_rect);
vec2 lp1 = clamp_rect(lp1_base, layer.local_clip_rect);
Fixed
vec4 clipped_local_rect = vec4(lp0, lp1 - lp0);

vec2 final_pos = mix(area.task_bounds.xy, area.task_bounds.zw, aPosition.xy);
vec2 p0 = lp0;
I don't understand why we need all the transform logic back here. I assume it's related to this quote:
The 2nd commit makes the clip shader only run in the regions of interest for a clip mask. This makes clip mask generation time negligible on any sites I've tried (e.g. GH.com drops from 9ms to ~3ms on my test setup).
If a clip instance is only going to touch the pixels related to it, then it would not mark pixels outside of it as transparent, and some of those pixels might have been written by the previous clip instances in the stack. Unless... we do something very clever with the depth/stencil buffer for the clip masks. (unfinished idea here)
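The concern can be illustrated with a toy 1-D model. This is only a sketch under stated assumptions, not the actual blending in the clip shaders: the mask is assumed to start at 1.0 (fully visible), each clip instance is assumed to multiply into it, and the `Clip` type and both functions are hypothetical.

```rust
/// A hypothetical 1-D clip: pixels in `start..end` are inside (visible).
/// Assumes `start <= end`.
#[derive(Clone, Copy)]
pub struct Clip {
    pub start: usize,
    pub end: usize,
}

/// Each clip instance writes every pixel of the task: it multiplies the mask
/// by 1.0 inside the clip and by 0.0 outside.
pub fn mask_full_task(len: usize, clips: &[Clip]) -> Vec<f32> {
    let mut mask = vec![1.0; len];
    for clip in clips {
        for (i, m) in mask.iter_mut().enumerate() {
            let inside = i >= clip.start && i < clip.end;
            *m *= if inside { 1.0 } else { 0.0 };
        }
    }
    mask
}

/// Each clip instance touches only the pixels inside its own range, so pixels
/// outside it keep whatever an earlier instance (or the clear) left there.
pub fn mask_own_region_only(len: usize, clips: &[Clip]) -> Vec<f32> {
    let mut mask = vec![1.0; len];
    for clip in clips {
        for m in &mut mask[clip.start.min(len)..clip.end.min(len)] {
            *m *= 1.0; // inside the clip: fully visible, nothing is cleared
        }
    }
    mask
}
```

With clips covering `2..6` and `4..8` over ten pixels, the full-task version leaves only pixels 4 and 5 visible, while the own-region version leaves all ten visible, because nothing ever writes 0.0 outside a clip.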
float distance_from_border = dot(vec4(is_out),
                                 max(vec4(0.0, 0.0, 0.0, 0.0), distances));
// TODO(gw): Support ellipse clip!
float d = (distance(pos, vClipRef) - vClipRadius.x + nudge) / pixels_per_fragment;
we have to ensure that 0 <= d <= 1
Fixed
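One way to satisfy the `0 <= d <= 1` requirement is to clamp the result. The thread does not show how the PR actually fixed it, so the following is only a CPU-side sketch that mirrors the shader expression; the parameter names follow the shader, and the function itself is hypothetical.

```rust
/// CPU-side mirror of the shader expression from the diff, with the result
/// clamped to [0, 1]. Hypothetical helper for illustration only.
pub fn corner_distance(
    pos: (f32, f32),
    clip_ref: (f32, f32),
    radius: f32,
    nudge: f32,
    pixels_per_fragment: f32,
) -> f32 {
    let dx = pos.0 - clip_ref.0;
    let dy = pos.1 - clip_ref.1;
    // Equivalent of GLSL `distance(pos, vClipRef)`.
    let dist = (dx * dx + dy * dy).sqrt();
    ((dist - radius + nudge) / pixels_per_fragment).clamp(0.0, 1.0)
}
```

In GLSL the same effect would come from wrapping the expression in the built-in `clamp(..., 0.0, 1.0)`.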
@@ -7,4 +7,5 @@
 varying vec3 vPos;
 flat varying vec4 vLocalRect;
 flat varying vec4 vClipRect;
-flat varying vec4 vClipRadius;
+flat varying vec2 vClipRadius;
+flat varying vec2 vClipRef;
I wonder if we should combine some of the attributes, like here. It would reduce the code on our side as well as maybe a little fetch shader overhead (although clearly optimizable by the driver).
Fixed
for _ in 0..region.complex.length * CORNERS_PER_CLIP_REGION {
    mask.corner_components.push(CornerMaskComponent {
        gpu_address: clip_store.alloc(1),
dunno if it matters, but we could bulk-allocate here
Probably a good idea, I don't think it matters too much for now though.
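The bulk-allocation suggestion could look roughly like this. The `GpuStore` type below is a hypothetical stand-in, not the real `clip_store`; the sketch assumes that `alloc(n)` reserves `n` contiguous slots and returns the address of the first.

```rust
/// Hypothetical stand-in for a GPU data store; not the PR's actual type.
#[derive(Default)]
pub struct GpuStore {
    next: usize,
}

impl GpuStore {
    /// Reserves `count` contiguous slots and returns the address of the first.
    pub fn alloc(&mut self, count: usize) -> usize {
        let base = self.next;
        self.next += count;
        base
    }
}

/// Makes one allocation for all corners; each component's address is then
/// an offset from the shared base instead of a separate `alloc(1)` call.
pub fn alloc_corner_addresses(store: &mut GpuStore, corner_count: usize) -> Vec<usize> {
    let base = store.alloc(corner_count);
    (0..corner_count).map(|i| base + i).collect()
}
```

Whether this matters in practice depends on how expensive a single allocation is, which the thread leaves open.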
let data = ClipData::uniform(rect, radius);
PrimitiveStore::populate_clip_data(slice, data);
debug_assert_eq!(self.clip_range.item_count, 1);
for (corner, component) in data.corners.iter().zip(self.corner_components.iter()) {
should we require the length of these vectors to match?
I think that's enforced by the debug assert above?
which one?
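For context on the question: `Iterator::zip` stops as soon as either input is exhausted, so a length mismatch is silently ignored unless it is checked explicitly. The sketch below shows one way to make the requirement explicit. Both function names are hypothetical, and the assertion on `item_count` in the diff may or may not imply the same thing.

```rust
/// Pairs corners with components, checking the lengths in debug builds first.
/// Without the assertion, `zip` would silently drop the extra elements of the
/// longer slice.
pub fn pair_corners<'a, C, M>(corners: &'a [C], components: &'a [M]) -> Vec<(&'a C, &'a M)> {
    debug_assert_eq!(corners.len(), components.len());
    corners.iter().zip(components.iter()).collect()
}

/// Demonstrates the truncation: returns how many pairs `zip` yields.
pub fn zipped_len<A, B>(a: &[A], b: &[B]) -> usize {
    a.iter().zip(b.iter()).count()
}
```

Zipping a four-element slice with a two-element slice yields two pairs, with no error or warning.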
local_rect = local_rect.and_then(|r| r.intersection(&clip.rect));
local_inner = local_inner.and_then(|r| clip.get_inner_rect()
why are we removing the local_inner?
I figured it wasn't necessary now that the tiles are decoupled from the clip mask - but perhaps there is still a good reason to have it?
AFAIK, our earlier discussion was about having 4 clip rectangles generated instead of one so that the inner area gets excluded from the mask computation. Then we figured that these 4 clips can be just the corners of a rectangle if the rounded rectangle is all we got. So for that model, we'd still need the inner area calculated. However, I'm not sure you need it for your implementation. I'll do another review pass with that in mind.
This fixes a bug where there are parent clips from stacking contexts but no clip on the primitive itself. Previously, this would not be detected as requiring clipping during the batch generation.
@kvark I had an idea in mind to handle the outside clip case that we were discussing last week. But when I went to create a test case for that problem, I was unable to create a demonstration of the issue. I think I've convinced myself it's not possible with the way the current do_clip() works, but I'm probably just missing something. Let's discuss the exact problem again this week and come up with an example that demonstrates the issue, and then I'll add the solution for it to this PR.
@glennw If you are talking about the test you added to the sample, then I have an explanation. Since the clip logic works on the AABB of the intersection of the clips, and your 2 clip rectangles are axis aligned, their intersection AABB is completely within each clip, so the change of VS logic that I consider incorrect did not make any difference for that specific case.
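The geometric point in this explanation can be checked with a small sketch: the intersection of two axis-aligned boxes is itself axis-aligned and lies inside both. The `Aabb` type is hypothetical and used only for illustration.

```rust
/// Hypothetical axis-aligned box given by min/max corners; illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Aabb {
    pub min: (f32, f32),
    pub max: (f32, f32),
}

impl Aabb {
    /// Intersection of two boxes, or `None` if they do not overlap.
    pub fn intersection(&self, other: &Aabb) -> Option<Aabb> {
        let min = (self.min.0.max(other.min.0), self.min.1.max(other.min.1));
        let max = (self.max.0.min(other.max.0), self.max.1.min(other.max.1));
        if min.0 < max.0 && min.1 < max.1 {
            Some(Aabb { min, max })
        } else {
            None
        }
    }

    /// True if `inner` lies entirely within `self`.
    pub fn contains(&self, inner: &Aabb) -> bool {
        self.min.0 <= inner.min.0
            && self.min.1 <= inner.min.1
            && self.max.0 >= inner.max.0
            && self.max.1 >= inner.max.1
    }
}
```

A clip that is rotated or has rounded corners does not have this property, since the bounding box of the intersection can include pixels outside the clip shape. A test that demonstrates the issue would presumably need such a clip.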
☔ The latest upstream changes (presumably #694) made this pull request unmergeable. Please resolve the merge conflicts.
Closing in favour of #696