OpenGL - Frequency of shader invocations in rendering commands
Shaders have invocations, each of which is (usually) given a unique set of input data, and each of which (usually) writes its own separate output data. When you issue a rendering command, how many times is each shader invoked?
Each shader stage has its own frequency of invocations. I will use the OpenGL terminology, but D3D works the same way (since they're both modelling the same hardware relationships).
Vertex shaders
These are the second most complicated. They execute once for every input vertex... kinda. If you are using non-indexed rendering, then the ratio is exactly 1:1. Every input vertex will execute on a separate vertex shader instance.
If you are using indexed rendering, then it gets complicated. It's more-or-less 1:1, with each vertex having its own VS invocation. However, thanks to post-T&L caching, it is possible for a vertex shader to be executed less than once per input vertex.
See, a vertex shader's execution is assumed to create a 1:1 mapping between input vertex data and output vertex data. This means that if you pass identical input data to a vertex shader (in the same rendering command), it is expected to generate identical output data. So if the hardware can detect that it is about to execute a vertex shader on the same input data that it has used previously, it can skip that execution and use the outputs from the previous execution.
Hardware detects this by using the vertex's index (which is why it doesn't work for non-indexed rendering). If the same index is provided to a vertex shader, it is assumed that the shader will get all of the same input values, and therefore will generate the same output values. So the hardware will cache output values based on indices. If an index is in the post-T&L cache, then the hardware will skip the VS's execution and just use the cached output values.
Instancing only complicates post-T&L caching slightly. Rather than caching solely on the vertex index, it caches based on the index and the instance ID. So it only uses the cached data if both values are the same.
So generally, VS's execute once for every vertex, but if you optimize your geometry with indexed data, it can execute fewer times. Sometimes much fewer, depending on how you do it.
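To make the caching argument concrete, here is a rough sketch that counts vertex shader runs for one indexed draw. The FIFO eviction policy, the `cache_size` of 16 and the function name are all assumptions made for illustration; real hardware uses its own cache size and replacement scheme.

```python
from collections import OrderedDict

def count_vs_invocations(indices, instance_count=1, cache_size=16):
    """Count vertex shader runs for one indexed draw under a toy FIFO cache.

    cache_size and the FIFO policy are assumptions; real hardware differs.
    """
    invocations = 0
    cache = OrderedDict()  # keys are (instance_id, vertex_index) pairs
    for instance_id in range(instance_count):
        for index in indices:
            key = (instance_id, index)
            if key in cache:
                continue  # cache hit: reuse earlier outputs, no VS run
            invocations += 1
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the oldest entry
    return invocations

# Two triangles forming a quad, sharing the edge between vertices 1 and 2.
quad = [0, 1, 2, 2, 1, 3]
print(count_vs_invocations(quad))                    # 4
print(count_vs_invocations(quad, instance_count=3))  # 12
print(count_vs_invocations(quad, cache_size=1))      # 5
```

The same six vertices drawn without an index buffer would always cost six invocations, since there is no index to key the cache on. The last call shows why index ordering matters: with a tiny cache, vertex 1 has already been evicted by the time it is referenced again.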
Tessellation control shaders
Or hull shaders in D3D parlance.
The TCS is very simple in this regard. It will execute exactly once for each vertex in each patch of the rendering command (more precisely, once per output patch vertex). No caching or other optimizations are done here.
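As arithmetic, the TCS count is just patches times vertices per patch. The sketch below assumes the count is taken per output patch vertex (the size declared by the TCS's `layout(vertices = N) out`), and that leftover vertices which don't fill a whole input patch are dropped. The function and parameter names are made up for this example.

```python
def tcs_invocations(vertex_count, patch_vertices, output_vertices):
    """TCS runs for one draw call.

    patch_vertices:  input patch size (GL_PATCH_VERTICES)
    output_vertices: output patch size declared in the TCS
    """
    # An incomplete trailing patch is not processed.
    patch_count = vertex_count // patch_vertices
    return patch_count * output_vertices

# 10 vertices with 3-vertex patches: 3 full patches, 1 vertex left over.
print(tcs_invocations(10, patch_vertices=3, output_vertices=3))  # 9
```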
Tessellation evaluation shaders
Or domain shaders in D3D parlance.
The TES executes after the tessellation primitive generator has generated new vertices. Because of that, how often it executes will obviously depend on your tessellation parameters.
The TES takes vertices generated by the tessellator and outputs vertices. It does so in a 1:1 ratio.
But similar to vertex shaders, it is not necessarily 1:1 for each vertex in each of the output primitives. Like a VS, the TES is assumed to provide a direct 1:1 mapping between locations in the tessellated primitives and output parameters. So if you invoke a TES multiple times with the same patch location, it is expected to output the same value.
As such, if generated primitives share vertices, the TES will often only be invoked once for such shared vertices. Unlike vertex shaders, you have no control over how much the hardware will utilize this. The best you can hope for is that the generation algorithm is smart enough to minimize how often it calls the TES.
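To get a feel for how much vertex sharing can save, here is a small sketch that brackets the TES count for a quad patch. It assumes a plain uniform grid of `level` by `level` cells, each split into two triangles. That is a simplification made for illustration, not the actual tessellator pattern, and where real hardware lands between the two numbers is up to the implementation.

```python
def tes_invocation_bounds(level):
    """Return (best_case, worst_case) TES runs for one quad patch.

    Assumes a uniform level x level grid, two triangles per cell.
    """
    # Best case: every shared vertex is evaluated exactly once.
    unique_vertices = (level + 1) ** 2
    # Worst case: every triangle corner is evaluated separately.
    triangle_corners = 2 * level * level * 3
    return unique_vertices, triangle_corners

print(tes_invocation_bounds(4))  # (25, 96)
```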
Geometry shaders
A geometry shader will be invoked once for each point, line or triangle primitive, either directly given by the rendering command or generated by the tessellator. So if you render 6 vertices as unconnected lines, your GS will be invoked exactly 3 times.
Each GS invocation can generate zero or more primitives as output.
The GS can use instancing internally (in OpenGL 4.0 or Direct3D 11). This means that, for each primitive that reaches the GS, the GS will be invoked X times, where X is the number of GS instances. Each such invocation will get the same input primitive data (with a special input value used to distinguish between such instances). This is useful for more efficiently directing primitives to different layers of layered framebuffers.
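The counting rule above can be sketched as a small lookup. It only covers the basic primitive modes (no adjacency modes, no primitive restart), and the names are invented for this example. In GLSL, the instance count is the value in `layout(invocations = X) in`, and `gl_InvocationID` is the input that tells the instances apart.

```python
# Primitives produced from n vertices, for the basic drawing modes.
PRIMITIVES_PER_MODE = {
    "points": lambda n: n,
    "lines": lambda n: n // 2,
    "line_strip": lambda n: max(n - 1, 0),
    "triangles": lambda n: n // 3,
    "triangle_strip": lambda n: max(n - 2, 0),
}

def gs_invocations(mode, vertex_count, gs_instances=1):
    """GS runs for one draw: one per primitive, times the GS instance count."""
    return PRIMITIVES_PER_MODE[mode](vertex_count) * gs_instances

print(gs_invocations("lines", 6))                  # 3
print(gs_invocations("lines", 6, gs_instances=4))  # 12
```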
Fragment shaders
Or pixel shaders in D3D parlance. Even though they aren't pixels yet, they may not become pixels, and they can be executed multiple times for the same pixel ;)
These are the most complicated with regard to invocation frequency. How often they execute depends on a lot of things.
FS's must be executed at least once for each pixel-sized area that a primitive rasterizes to. But they may be executed more than that.
In order to compute derivatives for texture functions, one FS invocation will often borrow values from one of its neighboring invocations. This is problematic if there is no such invocation, which happens if a neighbor falls outside of the boundary of the primitive being rasterized.
In such cases, there will still be a neighboring FS invocation. Even though it produces no actual data, it still exists and still does work. The good part is that these helper invocations don't hurt performance. They're basically using up shader resources that would have otherwise gone unused. Also, any attempt by such helper invocations to actually output data will be ignored by the system.
But they do still technically exist.
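Here is a toy rasterizer that counts helper invocations. It assumes fragments are shaded in aligned 2x2 blocks, which is a common arrangement but not something OpenGL promises, and it treats pixel centers lying exactly on an edge as covered rather than applying real fill rules. All names are made up for the sketch.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def count_fragments(tri, width, height):
    """Return (covered_pixels, helper_invocations) for one triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                covered.add((x, y))
    # Every 2x2 block touched by the triangle runs all four invocations.
    quads = {(x // 2, y // 2) for x, y in covered}
    helpers = 4 * len(quads) - len(covered)
    return len(covered), helpers

# A sliver covering a single pixel center still launches a full 2x2 block.
print(count_fragments(((0, 0), (1.2, 0), (0, 1.2)), 4, 4))  # (1, 3)
```

Small or thin triangles have the worst ratio of helpers to real fragments under this model, since most of each block falls outside the primitive.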
A less transparent issue revolves around multisampling. See, multisampling implementations (particularly in OpenGL) are allowed to decide on their own how many FS invocations to issue. While there are ways to force multisampled rendering to create an FS invocation for every sample, there is no guarantee that implementations will execute an FS only once per covered pixel outside of these cases.
For example, if I recall correctly, if you create a multisample image with a high sample count on NVIDIA hardware (8 to 16 or something like that), then the hardware may decide to execute the FS multiple times. Not necessarily once per sample, but once for every 4 samples or so.
So how many FS invocations do you get? At least one for every pixel-sized area covered by the primitive being rasterized. Possibly more if you're doing multisampled rendering.
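One of the ways to ask for more invocations is sample shading (`glMinSampleShading` in OpenGL 4.0). As I understand the spec, it requests at least max(ceil(value × samples), 1) separately shaded samples per pixel. The sketch below just evaluates that formula. Treat it as a lower bound for a fully covered pixel, not a prediction of what any particular driver does; the function name is made up.

```python
import math

def min_fs_invocations_per_pixel(samples, min_sample_shading=None):
    """Requested minimum FS runs for one fully covered pixel.

    min_sample_shading=None models sample shading being disabled, where
    the only promise is one invocation (the implementation may do more).
    """
    if min_sample_shading is None:
        return 1
    return max(math.ceil(min_sample_shading * samples), 1)

print(min_fs_invocations_per_pixel(8))        # 1
print(min_fs_invocations_per_pixel(8, 0.25))  # 2
print(min_fs_invocations_per_pixel(8, 1.0))   # 8
```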
Compute shaders
The exact number of invocations that you specify. No more, no less.
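Spelled out, "what you specify" is the work group count passed to `glDispatchCompute` multiplied by the local size declared in the shader (`layout(local_size_x = ..., local_size_y = ..., local_size_z = ...) in`). A small sketch of that product, with a made-up function name:

```python
def compute_invocations(num_groups, local_size):
    """Total compute shader invocations for one dispatch.

    num_groups: (x, y, z) work group counts given to the dispatch call
    local_size: (x, y, z) local size declared in the shader
    """
    gx, gy, gz = num_groups
    lx, ly, lz = local_size
    return (gx * gy * gz) * (lx * ly * lz)

# 4x4 work groups of 8x8 invocations each.
print(compute_invocations((4, 4, 1), (8, 8, 1)))  # 1024
```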