Difference between revisions of "延迟渲染" (Deferred Rendering)

Revision as of 02:56, 27 January 2011

This article describes the deferred rendering method used in KlayGE's Deferred Rendering sample.

The Deferred Lighting framework

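As of KlayGE 3.11, the sample has moved from Deferred Shading to Deferred Lighting, which uses less bandwidth. This section gives a brief introduction to Deferred Lighting and assumes the reader already understands Deferred Shading.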

The Deferred Lighting rendering architecture can be divided into three stages:

1. for each object
   {
      fill the G-Buffer
   }
2. for each light
   {
      lighting pass
   }
3. for each object
   {
      perform shading
   }

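Unlike Deferred Shading, the shading (that is, material-dependent) computation happens only in the last stage. The G-Buffer therefore has to store far less information, and MRT may no longer be needed at all.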
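
The three stages can be sketched on the CPU as a toy loop. This is only an illustration in Python, not KlayGE code: the function name `render_deferred_lighting`, the callbacks `fill_gbuffer`, `light_pass` and `shade`, and the dictionary-based buffers are all hypothetical, and depth testing is left out by assuming the objects cover disjoint pixels.

```python
def render_deferred_lighting(objects, lights, fill_gbuffer, light_pass, shade):
    """Run the three stages on a toy scene; pixels are dictionary keys."""
    # Stage 1: every object writes its geometric attributes into the G-Buffer.
    gbuffer = {}
    for obj in objects:
        for pixel in obj["pixels"]:
            gbuffer[pixel] = fill_gbuffer(obj, pixel)

    # Stage 2: every light adds its term to a 4-channel lighting buffer.
    lighting = {pixel: (0.0, 0.0, 0.0, 0.0) for pixel in gbuffer}
    for light in lights:
        for pixel, g in gbuffer.items():
            term = light_pass(g, light)
            lighting[pixel] = tuple(a + b for a, b in zip(lighting[pixel], term))

    # Stage 3: every object is drawn again and combines its material
    # with the lighting accumulated for its pixels.
    frame = {}
    for obj in objects:
        for pixel in obj["pixels"]:
            frame[pixel] = shade(obj, lighting[pixel])
    return frame
```

Note that material data is only touched in stage 3, which is why the G-Buffer written in stage 1 can stay small.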

Lighting pass

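The lighting pass is the core of the Deferred Lighting framework, so I will work through it first. Once the lighting pass is expressed clearly, it also becomes clear what the G-Buffer has to store and what the shading pass receives.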

The article on physically based BRDFs (基于物理的BRDF) derives the general formula of the rendering model:

$LaTeX: L_{o}(\mathbf{v})=\pi\rho(\mathbf{l_c}, \mathbf{v})\otimes \mathbf{c}_{light} (\mathbf{n} \cdot \mathbf{l_c})=(\mathbf{c}_{diff} + \frac {\alpha + 2} {8}(\mathbf{n} \cdot \mathbf{h})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_c},\mathbf{h})) \otimes \mathbf{c}_{light} (\mathbf{n} \cdot \mathbf{l_c})$

With N light sources, the lighting response of each pixel is

$LaTeX: L_{o}(\mathbf{v})=\pi\rho(\mathbf{l_{c1}}, \mathbf{v})\otimes \mathbf{c}_{light1} (\mathbf{n} \cdot \mathbf{l_{c1}})$

$LaTeX: +\pi\rho(\mathbf{l_{c2}}, \mathbf{v})\otimes \mathbf{c}_{light2} (\mathbf{n} \cdot \mathbf{l_{c2}})$

$LaTeX: + \ldots$

$LaTeX: +\pi\rho(\mathbf{l_{cN}}, \mathbf{v})\otimes \mathbf{c}_{lightN} (\mathbf{n} \cdot \mathbf{l_{cN}})$

In Deferred Shading, each shading pass evaluates one term of this sum:

$LaTeX: \pi\rho(\mathbf{l_{cn}}, \mathbf{v})\otimes \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$

For Deferred Lighting, the formula has to be rearranged:

$LaTeX: L_{o}(\mathbf{v})=(\mathbf{c}_{diff} + \frac {\alpha + 2} {8}(\mathbf{n} \cdot \mathbf{h_1})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{c1}},\mathbf{h_1})) \otimes \mathbf{c}_{light1} (\mathbf{n} \cdot \mathbf{l_{c1}})$

$LaTeX: +(\mathbf{c}_{diff} + \frac {\alpha + 2} {8}(\mathbf{n} \cdot \mathbf{h_2})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{c2}},\mathbf{h_2})) \otimes \mathbf{c}_{light2} (\mathbf{n} \cdot \mathbf{l_{c2}})$

$LaTeX: +\ldots$

$LaTeX: +(\mathbf{c}_{diff} + \frac {\alpha + 2} {8}(\mathbf{n} \cdot \mathbf{h_N})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{cN}},\mathbf{h_N})) \otimes \mathbf{c}_{lightN} (\mathbf{n} \cdot \mathbf{l_{cN}})$

$LaTeX: =\mathbf{c}_{diff}\otimes (\mathbf{c}_{light1} (\mathbf{n} \cdot \mathbf{l_{c1}}) + \mathbf{c}_{light2} (\mathbf{n} \cdot \mathbf{l_{c2}}) + \ldots + \mathbf{c}_{lightN} (\mathbf{n} \cdot \mathbf{l_{cN}}))$

$LaTeX: + \frac {\alpha + 2} {8}(((\mathbf{n} \cdot \mathbf{h_1})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{c1}},\mathbf{h_1})) \otimes \mathbf{c}_{light1} (\mathbf{n} \cdot \mathbf{l_{c1}})$

$LaTeX: + ((\mathbf{n} \cdot \mathbf{h_2})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{c2}},\mathbf{h_2})) \otimes \mathbf{c}_{light2} (\mathbf{n} \cdot \mathbf{l_{c2}})$

$LaTeX: + \ldots$

$LaTeX: + ((\mathbf{n} \cdot \mathbf{h_N})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{cN}},\mathbf{h_N})) \otimes \mathbf{c}_{lightN} (\mathbf{n} \cdot \mathbf{l_{cN}}))$
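
Written with summation signs, the regrouped result reads as follows. This is only a compact restatement of the last two groups of lines above, using the same symbols:

$LaTeX: L_{o}(\mathbf{v})=\mathbf{c}_{diff}\otimes \sum_{n=1}^{N} \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}}) + \frac {\alpha + 2} {8} \sum_{n=1}^{N} ((\mathbf{n} \cdot \mathbf{h_n})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{cn}},\mathbf{h_n})) \otimes \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$

The first sum depends only on the lights and the normal, and the second only on the lights, the normal, alpha and c_spec, so both can be accumulated before c_diff is known.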

Because c_diff is not applied until the final shading pass, each light pass must keep diffuse and specular separate for the result to be correct:

$LaTeX: Diffuse: \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$

$LaTeX: Specular: ((\mathbf{n} \cdot \mathbf{h_n})^{\alpha} F(\mathbf{c}_{spec}, \mathbf{l_{cn}},\mathbf{h_n})) \otimes \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$

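To fit diffuse and specular into a 4-channel buffer, the specular colour has to be sacrificed so that only its luminance remains, and c_spec is likewise reduced to a scalar. The lighting pass computation then becomes: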

$LaTeX: float4(1, 1, 1, (\mathbf{n} \cdot \mathbf{h_n})^{\alpha} F(c_{spec}, \mathbf{l_{cn}},\mathbf{h_n})) \times \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$
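
As a rough CPU-side illustration, the per-light term could be evaluated like this. Several things here are assumptions that the text does not specify: Schlick's approximation stands in for F, h is taken as the normalized half vector between l_c and v, the light's fourth component is taken to be the Rec. 709 luminance of its colour, and n·l and n·h are clamped to zero. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def schlick_fresnel(c_spec, l_c, h):
    # Assumption: Schlick's approximation is used for F(c_spec, l_c, h).
    return c_spec + (1.0 - c_spec) * (1.0 - max(dot(l_c, h), 0.0)) ** 5

def lighting_pass_term(n, l_c, v, alpha, c_spec, c_light):
    """Return the float4 that one light adds to the lighting buffer for one pixel."""
    n_dot_l = max(dot(n, l_c), 0.0)  # clamping is an addition, not in the formula
    if n_dot_l == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # surface faces away from the light
    h = normalize(tuple(a + b for a, b in zip(l_c, v)))
    spec = max(dot(n, h), 0.0) ** alpha * schlick_fresnel(c_spec, l_c, h)
    r, g, b = c_light
    # Assumption: the specular channel is scaled by the light's luminance.
    luminance = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return (r * n_dot_l, g * n_dot_l, b * n_dot_l, spec * luminance * n_dot_l)
```

The rgb channels carry the diffuse term and the a channel carries the scalar specular term, matching the float4 layout in the formula.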

G-Buffer layout

In any deferred framework, whether Deferred Shading or Deferred Lighting, the layout of the G-Buffer is critical. The lighting pass formula derived above is:

$LaTeX: float4(1, 1, 1, (\mathbf{n} \cdot \mathbf{h_n})^{\alpha} F(c_{spec}, \mathbf{l_{cn}},\mathbf{h_n})) \times \mathbf{c}_{lightn} (\mathbf{n} \cdot \mathbf{l_{cn}})$

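The formula shows that the light pass needs n, h, alpha, c_spec and l_c. The half vector is h = normalize(l_c + v) (see the physically based BRDF article), where v is the direction from the shaded point to the eye, and l_c = normalize(l - p), where l is the light position and p is the position of the point being shaded. Both v and l_c follow from p, so the quantities the G-Buffer ultimately has to supply are n, p, alpha and c_spec. Storing them in full takes 8 channels: 3 for the normal, 3 for the position, and one each for alpha and c_spec. That is too expensive for a G-Buffer and has to be reduced.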

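The normal is normalized, so only 2 components need to be stored. http://aras-p.info/texts/CompactNormalStorage.html compares several ways of storing a normal in 2 components. Of these, the spheremap transform offers the best overall balance of speed and quality, and Crytek uses the same method: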

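// Spheremap transform: pack a unit normal into 2 components.
// The direction of the result encodes the xy direction; its length encodes z.
float2 encode(float3 normal)
{
   return normalize(normal.xy) * sqrt(normal.z * 0.5 + 0.5);
}
// Inverse transform: dot(n, n) equals z * 0.5 + 0.5, so z is recovered first,
// then xy is rescaled so that the decoded normal has unit length.
float3 decode(float2 n)
{
   float3 normal;
   normal.z = dot(n, n) * 2 - 1;
   normal.xy = normalize(n) * sqrt(1 - normal.z * normal.z);
   return normal;
}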
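
A quick way to convince yourself that the two functions are inverses is to port them to the CPU and round-trip a few normals. The Python port below is only a sketch for checking the math. Like the shader version, it is undefined for normals pointing exactly along +z or -z, where normal.xy has zero length.

```python
import math

def encode(normal):
    """Spheremap transform: unit normal (x, y, z) -> 2 components."""
    x, y, z = normal
    xy_len = math.hypot(x, y)  # zero when the normal is exactly +z or -z
    scale = math.sqrt(z * 0.5 + 0.5) / xy_len
    return (x * scale, y * scale)

def decode(enc):
    """Inverse spheremap transform: 2 components -> unit normal."""
    ex, ey = enc
    z = (ex * ex + ey * ey) * 2.0 - 1.0
    enc_len = math.hypot(ex, ey)
    # max() guards against a tiny negative value caused by rounding.
    scale = math.sqrt(max(1.0 - z * z, 0.0)) / enc_len
    return (ex * scale, ey * scale, z)
```

For example, `decode(encode((0.6, 0.0, -0.8)))` recovers the input up to floating-point rounding, including the negative z that a naive "store x and y, rebuild z" scheme would lose.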

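The next item is position. The pixel's location already provides x and y, so only z needs to be stored, and the position can be recovered from z and the pixel location. What is stored here is the view-space z divided by the far plane distance. In the lighting pass, once the pixel shader has the view-space direction through the pixel (view_dir), it computes: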

p = view_dir * ((z * far_plane) / view_dir.z);

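Here view_dir is computed in the vertex shader and passed to the pixel shader. When the light's volume geometry is drawn as the light geometry (if you are not familiar with this, see the next part), view_dir is the vertex position multiplied by the world * view matrix. When a full-screen quad is used as the light geometry, view_dir is the four corners of the view frustum on the far plane multiplied by the inverse(projection) matrix. z * far_plane restores the point's view-space z, and the reconstruction formula then follows directly from similar triangles. Position is thereby compressed into 1 channel.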
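
The similar-triangles argument can be checked numerically. In the sketch below the function names are hypothetical, and view_dir is simply any vector along the ray from the eye through the pixel. Its length does not matter because the formula divides by view_dir.z.

```python
def store_depth(p_view, far_plane):
    """Value written to the G-Buffer: view-space z divided by the far plane."""
    return p_view[2] / far_plane

def reconstruct_position(view_dir, stored_z, far_plane):
    """p = view_dir * ((z * far_plane) / view_dir.z)"""
    scale = (stored_z * far_plane) / view_dir[2]
    return tuple(c * scale for c in view_dir)
```

Scaling view_dir so that its z component equals the restored view-space z places the point on the same ray at the right depth, which is exactly the original position.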

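What remains is alpha and c_spec. If Fresnel is not needed, c_spec can simply be left out here and applied in the shading pass, so only alpha is stored. Otherwise alpha and c_spec have to share one channel. The method I use is to store floor(c_spec * 100) as the integer part and clamp(alpha, 0, 255) / 256 as the fractional part. The limitation is that alpha is restricted to [0, 256), which is generally enough.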
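
The packing can be sketched as a pair of functions. The packing side follows the description above. The unpacking side is my inference of the obvious inverse and is not taken from the KlayGE source, and the function names are hypothetical. How well the value survives in practice also depends on the precision of the render target format, which this sketch ignores.

```python
import math

def pack_spec(c_spec, alpha):
    """Integer part: floor(c_spec * 100). Fraction: clamp(alpha, 0, 255) / 256."""
    alpha_clamped = min(max(alpha, 0.0), 255.0)
    return math.floor(c_spec * 100.0) + alpha_clamped / 256.0

def unpack_spec(packed):
    """Assumed inverse: split the integer and fractional parts again."""
    integer_part = math.floor(packed)
    c_spec = integer_part / 100.0              # quantized to steps of 0.01
    alpha = (packed - integer_part) * 256.0
    return c_spec, alpha
```

For example, c_spec = 0.5 and alpha = 64 pack to 50.25 and unpack to the same pair, while an alpha above 255 is clamped before packing.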

With that, everything the lighting pass needs is packed into 4 channels, so the G-Buffer needs only 1 texture and MRT is avoided.

Shading Pass

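The shading pass combines the lighting accumulated by all the preceding lighting passes with the object's own material to produce the final colour. The material's c_spec is already stored in the G-Buffer and has been applied in the lighting pass, so the material inputs to the shading pass are c_diff, c_emit and alpha. Remember that in the earlier formula the specular term must be multiplied by the normalization factor (alpha + 2) / 8. In the lighting pass result, rgb holds the accumulated diffuse and a holds the accumulated specular luminance. If AO is also computed, the shading formula is: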

$LaTeX: \mathbf{c}_{emit} + (lighting.rgb * \mathbf{c}_{diff} + \frac{\alpha + 2}{8} * lighting.a) * ao$

If the G-Buffer and lighting pass store only alpha because Fresnel is not considered, the shading pass formula becomes:

$LaTeX: \mathbf{c}_{emit} + (lighting.rgb * \mathbf{c}_{diff} + \frac{\alpha + 2}{8} * \mathbf{c}_{spec} * lighting.a) * ao$
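
Both variants of the shading formula can be covered by one small function. This is a hedged sketch with hypothetical names: passing the default `c_spec` of (1, 1, 1) gives the first formula, where Fresnel was already applied in the lighting pass, and passing the material's specular colour gives the second.

```python
def shade(c_emit, c_diff, alpha, lighting, ao=1.0, c_spec=(1.0, 1.0, 1.0)):
    """c_emit + (lighting.rgb * c_diff + (alpha + 2) / 8 * c_spec * lighting.a) * ao"""
    spec_norm = (alpha + 2.0) / 8.0  # normalization factor of the specular term
    return tuple(
        e + (lighting[i] * d + spec_norm * s * lighting[3]) * ao
        for i, (e, d, s) in enumerate(zip(c_emit, c_diff, c_spec))
    )
```

Here `lighting` is the 4-channel value accumulated by the lighting passes, with rgb holding diffuse and a holding the specular luminance.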

Light volume

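In Deferred Rendering, the simplest way to represent a light is a full-screen quad. Every pixel of the G-Buffer then takes part in the computation, and unneeded pixels are only rejected in the pixel shader. The result is correct, but there is too much wasted computation for this to be efficient. A common optimization is to represent the light with a convex geometric shape, so that only pixels covered by the shape compute the light's contribution. The natural choices are a cone for a spot light, a sphere or a cube for a point light, and a full-screen quad for directional and ambient lights. The figure below shows the volume of a spot light: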

[Figure: Spot light volume]

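This geometry resembles the geometry used by the old shadow volume technique, so I call it a light volume. Because a light volume is guaranteed to be convex, it is considerably simpler to render than a shadow volume.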

Optimization 1: Frustum test

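With a light volume available, it can be tested for intersection against the view frustum. The light volume completely encloses the range the light can reach, so if a light volume lies outside the view frustum, the light can be skipped entirely.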
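
One common way to do such a test is to bound the light volume with a sphere and check it against the frustum planes. The sketch below assumes exactly that: the bounding sphere, the plane representation (a, b, c, d) with unit normals pointing into the frustum, and the function name are all illustrative choices, not something the text prescribes. The test is conservative, so it may keep a light that is actually outside, but it never rejects a visible one.

```python
def sphere_outside_frustum(center, radius, planes):
    """Return True if the sphere lies entirely behind at least one plane.

    planes: iterable of (a, b, c, d) with unit normals pointing inward,
    so a*x + b*y + c*z + d is the signed distance to the plane.
    """
    for a, b, c, d in planes:
        distance = a * center[0] + b * center[1] + c * center[2] + d
        if distance < -radius:
            return True  # fully on the outer side of this plane
    return False
```

A light whose bounding sphere returns True can be dropped before any of its passes are issued.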

Optimization 2: Conditional Rendering

All D3D10-class and later GPUs support conditional rendering. The basic usage is:

BeginQuery()
Draw object with simple shader
EndQuery()
...
BeginConditionalRendering()
Draw object with real shader
EndConditionalRendering()

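If the first Draw produces no visible pixels, the second Draw is skipped. Unlike an occlusion query, the query result does not have to be returned to the CPU, so the pipeline is not stalled and efficiency is higher. This makes it possible to skip lights that do not illuminate a single pixel.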

Optimization 3: Stencil Buffer

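As with shadow volumes, the stencil buffer can be used to mark the pixels the light can reach. In fact, the optimizations used for shadow volumes carry over directly. Two-sided stencil, for example, is the most common one: a single pass increments and decrements the stencil for front and back faces at the same time. Light volumes also share the problem of the viewpoint entering the volume, which requires changing the depth function, cull mode and back stencil pass.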

Optimization 4: Shadowing pass

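KlayGE renders shadows with shadow maps. The shadow map is generated in the usual way, so I will not go over it here. There are two options for using the shadow map. The older approach queries the shadow map in the lighting pass while computing lighting, so shadows are computed at the same time. The other approach comes from screen space shadow maps: a shadowing pass is added before each lighting pass, and it only queries the shadow map and computes the shadow itself (the result is a grayscale image). The benefit is that shadowing can be computed at a lower resolution than the lighting pass, which improves efficiency. In addition, the shadowing pass result can be blurred once, as with screen space shadow maps, before the lighting pass uses it.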
