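In the previous post, resources were switched over to being created with the D3D12 device. The next step is where things get tricky.
Let's make an assumption. To get the most basic system rendering, in other words, with all post processing, UI, and text turned off and nothing but a single triangle on screen, we need at least a vertex buffer, an RTV, a VS, a PS, one clear, and one draw call. The vertex buffer is already solved; the VS and PS themselves are the same as in D3D11; clear is almost the same as in D3D11, except that it is now called on the graphics command list; and the same goes for the draw call. So the problem comes down to:
- how to use an RTV;
- how to put everything together to render.
Either way, this is a road of no return. From here we can only move forward; we can no longer rely on 11on12 to let the two APIs interoperate the way we did before.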
RTV
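In D3D12, RTVs live in a descriptor heap. To use one, you take a handle to a slot in the heap and bind it to a render target view description; after that, the handle can be used much like a D3D11 RTV. One thing to note: I have not yet found a way to work with more than one RTV heap (as far as I know, D3D12 itself does not forbid creating several, since RTV heaps are not shader-visible), so for now I create a single large heap and manage it with a free list. Whenever an RTV is needed, a slot is allocated from it.
The heap is created as follows: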
D3D12_DESCRIPTOR_HEAP_DESC rtv_desc_heap;
rtv_desc_heap.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
rtv_desc_heap.NumDescriptors = NUM_BACK_BUFFERS * 2 + NUM_MAX_RENDER_TARGET_VIEWS;
rtv_desc_heap.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
rtv_desc_heap.NodeMask = 0;

ID3D12DescriptorHeap* rtv_descriptor_heap;
d3d_device->CreateDescriptorHeap(&rtv_desc_heap, IID_ID3D12DescriptorHeap,
    reinterpret_cast<void**>(&rtv_descriptor_heap));
rtv_desc_size = d3d_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);
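Here NUM_MAX_RENDER_TARGET_VIEWS is an estimate of the maximum number of RTVs the whole program will use at the same time. rtv_desc_size is the increment between consecutive handles in the heap. Since D3D12_CPU_DESCRIPTOR_HANDLE is a plain struct, the offset is applied to its ptr member, and the i-th handle can later be obtained with:
D3D12_CPU_DESCRIPTOR_HANDLE handle = rtv_descriptor_heap->GetCPUDescriptorHandleForHeapStart();
handle.ptr += i * rtv_desc_size;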
DSVs can be set up in the same way.
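The free list used to manage the single large RTV heap could look roughly like the sketch below. It is not the actual KlayGE code: DescriptorSlotAllocator and INVALID_SLOT are hypothetical names, and the class only tracks slot indices, so it does not depend on D3D12 at all.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of KlayGE or D3D12: hands out slot indices
// in a fixed-size descriptor heap and recycles them through a free list.
class DescriptorSlotAllocator
{
public:
    static constexpr uint32_t INVALID_SLOT = 0xFFFFFFFFU;

    explicit DescriptorSlotAllocator(uint32_t num_slots)
    {
        // Push indices in reverse so that slot 0 is handed out first.
        free_list_.reserve(num_slots);
        for (uint32_t i = num_slots; i > 0; -- i)
        {
            free_list_.push_back(i - 1);
        }
    }

    // Returns a free slot index, or INVALID_SLOT if the heap is exhausted.
    uint32_t Allocate()
    {
        if (free_list_.empty())
        {
            return INVALID_SLOT;
        }
        uint32_t const slot = free_list_.back();
        free_list_.pop_back();
        return slot;
    }

    // Returns a slot to the free list so that a later Allocate() can reuse it.
    void Deallocate(uint32_t slot)
    {
        free_list_.push_back(slot);
    }

private:
    std::vector<uint32_t> free_list_;
};
```

An allocated slot index would then be turned into a handle by offsetting the heap start by slot * rtv_desc_size, and returned with Deallocate once the render target view is destroyed.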
PSO
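A pipeline state object (PSO) replaces what used to be several separate pieces of pipeline state, such as shaders, blend state, rasterizer state, and input layout. In other words, the stages that used to be set individually are now all packed into a PSO first, which is then set on the command list through SetPipelineState. From D3D9, where every state needed its own API call, to D3D10/11, where states were grouped by stage, to D3D12, where all states are merged into a PSO, the API design has kept moving closer to what the hardware needs in order to achieve higher performance.
Here, a PSO corresponds to an effect pass. Each pass carries all the information a PSO needs. Sharing PSOs between passes is something to consider later.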
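One way the later idea of sharing PSOs between passes might be approached is a cache keyed by the state that goes into the PSO. The sketch below is only an illustration under assumptions: PsoKey, FakePso, and PsoCache are hypothetical names, the key holds a few made-up hashes rather than the full D3D12_GRAPHICS_PIPELINE_STATE_DESC, and FakePso stands in for ID3D12PipelineState.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-in for the state that identifies a PSO. A real key would
// have to cover every field of D3D12_GRAPHICS_PIPELINE_STATE_DESC.
struct PsoKey
{
    uint64_t shader_hash;
    uint64_t blend_hash;
    uint64_t rasterizer_hash;
    uint64_t depth_stencil_hash;

    bool operator<(PsoKey const & rhs) const
    {
        return std::tie(shader_hash, blend_hash, rasterizer_hash, depth_stencil_hash)
            < std::tie(rhs.shader_hash, rhs.blend_hash, rhs.rasterizer_hash, rhs.depth_stencil_hash);
    }
};

// Hypothetical stand-in for ID3D12PipelineState.
struct FakePso
{
    int id;
};

class PsoCache
{
public:
    // Returns the cached PSO for this key, creating it on first use.
    FakePso const & GetOrCreate(PsoKey const & key)
    {
        auto iter = cache_.find(key);
        if (iter == cache_.end())
        {
            // A real implementation would call CreateGraphicsPipelineState here.
            iter = cache_.emplace(key, FakePso{static_cast<int>(cache_.size())}).first;
        }
        return iter->second;
    }

    size_t Size() const
    {
        return cache_.size();
    }

private:
    std::map<PsoKey, FakePso> cache_;
};
```

Two passes whose keys compare equal would get the same object back, so the PSO is created only once.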
D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc;
pso_desc.pRootSignature = so->RootSignature().get();

// Shader bytecode for each stage; a stage without a shader gets a null blob.
{
    auto const & blob = so->ShaderBlob(ShaderObject::ST_VertexShader);
    if (blob && !blob->empty())
    {
        pso_desc.VS.pShaderBytecode = blob->data();
        pso_desc.VS.BytecodeLength = static_cast<UINT>(blob->size());
    }
    else
    {
        pso_desc.VS.pShaderBytecode = nullptr;
        pso_desc.VS.BytecodeLength = 0;
    }
}
{
    auto const & blob = so->ShaderBlob(ShaderObject::ST_PixelShader);
    if (blob && !blob->empty())
    {
        pso_desc.PS.pShaderBytecode = blob->data();
        pso_desc.PS.BytecodeLength = static_cast<UINT>(blob->size());
    }
    else
    {
        pso_desc.PS.pShaderBytecode = nullptr;
        pso_desc.PS.BytecodeLength = 0;
    }
}
{
    auto const & blob = so->ShaderBlob(ShaderObject::ST_DomainShader);
    if (blob && !blob->empty())
    {
        pso_desc.DS.pShaderBytecode = blob->data();
        pso_desc.DS.BytecodeLength = static_cast<UINT>(blob->size());
    }
    else
    {
        pso_desc.DS.pShaderBytecode = nullptr;
        pso_desc.DS.BytecodeLength = 0;
    }
}
{
    auto const & blob = so->ShaderBlob(ShaderObject::ST_HullShader);
    if (blob && !blob->empty())
    {
        pso_desc.HS.pShaderBytecode = blob->data();
        pso_desc.HS.BytecodeLength = static_cast<UINT>(blob->size());
    }
    else
    {
        pso_desc.HS.pShaderBytecode = nullptr;
        pso_desc.HS.BytecodeLength = 0;
    }
}
{
    auto const & blob = so->ShaderBlob(ShaderObject::ST_GeometryShader);
    if (blob && !blob->empty())
    {
        pso_desc.GS.pShaderBytecode = blob->data();
        pso_desc.GS.BytecodeLength = static_cast<UINT>(blob->size());
    }
    else
    {
        pso_desc.GS.pShaderBytecode = nullptr;
        pso_desc.GS.BytecodeLength = 0;
    }
}

// Stream output.
auto const & so_decls = so->SODecl();
std::vector<UINT> so_strides(so_decls.size());
for (size_t i = 0; i < so_decls.size(); ++ i)
{
    so_strides[i] = so_decls[i].ComponentCount * sizeof(float);
}
pso_desc.StreamOutput.pSODeclaration = so_decls.empty() ? nullptr : &so_decls[0];
pso_desc.StreamOutput.NumEntries = static_cast<UINT>(so_decls.size());
pso_desc.StreamOutput.pBufferStrides = so_strides.empty() ? nullptr : &so_strides[0];
pso_desc.StreamOutput.NumStrides = static_cast<UINT>(so_strides.size());
pso_desc.StreamOutput.RasterizedStream = so->RasterizedStream();

// Fixed-function state comes from the pass's state objects.
pso_desc.BlendState = checked_pointer_cast<D3D12BlendStateObject>(pass->GetBlendStateObject())->D3DDesc();
pso_desc.SampleMask = sample_mask_cache_;
pso_desc.RasterizerState = checked_pointer_cast<D3D12RasterizerStateObject>(pass->GetRasterizerStateObject())->D3DDesc();
pso_desc.DepthStencilState = checked_pointer_cast<D3D12DepthStencilStateObject>(pass->GetDepthStencilStateObject())->D3DDesc();

// Input layout and index strip cut value come from the render layout.
pso_desc.InputLayout.pInputElementDescs = d3d12_rl.InputElementDesc().empty() ? nullptr : &d3d12_rl.InputElementDesc()[0];
pso_desc.InputLayout.NumElements = static_cast<UINT>(d3d12_rl.InputElementDesc().size());
pso_desc.IBStripCutValue = (EF_R16UI == rl.IndexStreamFormat())
    ? D3D12_INDEX_BUFFER_STRIP_CUT_VALUE_0xFFFF : D3D12_INDEX_BUFFER_STRIP_CUT_VALUE_0xFFFFFFFF;

// With tessellation enabled, list topologies are remapped to patch lists.
RenderLayout::topology_type tt = rl.TopologyType();
if (tech.HasTessellation())
{
    switch (tt)
    {
    case RenderLayout::TT_PointList:
        tt = RenderLayout::TT_1_Ctrl_Pt_PatchList;
        break;
    case RenderLayout::TT_LineList:
        tt = RenderLayout::TT_2_Ctrl_Pt_PatchList;
        break;
    case RenderLayout::TT_TriangleList:
        tt = RenderLayout::TT_3_Ctrl_Pt_PatchList;
        break;
    default:
        break;
    }
}
pso_desc.PrimitiveTopologyType = D3D12Mapping::MappingPriTopoType(tt);

// The render target count is the index of the highest attached color target plus one.
pso_desc.NumRenderTargets = 0;
FrameBufferPtr const & fb = this->CurFrameBuffer();
for (int i = 7; i >= 0; -- i)
{
    if (fb->Attached(FrameBuffer::ATT_Color0 + i))
    {
        pso_desc.NumRenderTargets = i + 1;
        break;
    }
}
for (uint32_t i = 0; i < pso_desc.NumRenderTargets; ++ i)
{
    pso_desc.RTVFormats[i] = D3D12Mapping::MappingFormat(fb->Attached(FrameBuffer::ATT_Color0 + i)->Format());
}
for (uint32_t i = pso_desc.NumRenderTargets; i < sizeof(pso_desc.RTVFormats) / sizeof(pso_desc.RTVFormats[0]); ++ i)
{
    pso_desc.RTVFormats[i] = DXGI_FORMAT_UNKNOWN;
}
if (fb->Attached(FrameBuffer::ATT_DepthStencil))
{
    pso_desc.DSVFormat = D3D12Mapping::MappingFormat(fb->Attached(FrameBuffer::ATT_DepthStencil)->Format());
}
else
{
    pso_desc.DSVFormat = DXGI_FORMAT_UNKNOWN;
}

pso_desc.SampleDesc.Count = 1;
pso_desc.SampleDesc.Quality = 0;
pso_desc.NodeMask = 0;
pso_desc.CachedPSO.pCachedBlob = nullptr;
pso_desc.CachedPSO.CachedBlobSizeInBytes = 0;
pso_desc.Flags = D3D12_PIPELINE_STATE_FLAG_NONE;

ID3D12PipelineState* d3d_pso;
d3d_device->CreateGraphicsPipelineState(&pso_desc, IID_ID3D12PipelineState, reinterpret_cast<void**>(&d3d_pso));
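And that is it, all in one go.
The root signature used above describes the "layout" of resources such as constant buffers, textures, and samplers. It is independent of the resources themselves and only states how many resources each stage uses. It can therefore be created per shader object and, in theory, shared between shader objects. (A shader object is KlayGE's way of describing a group of shaders: separate shaders such as the VS, PS, and GS are bound to a shader object before use, similar to a program in OpenGL.)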
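To make the "layout only" idea concrete, here is a minimal sketch of how a layout could be represented as per-stage resource counts and compared to decide whether two shader objects might share a root signature. All names here (ShaderStage, StageResourceCounts, RootSignatureLayout, CanShareRootSignature) are hypothetical and chosen for illustration; they are not KlayGE or D3D12 types.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stage list for illustration only.
enum ShaderStage : uint32_t
{
    Stage_Vertex = 0,
    Stage_Pixel,
    Stage_Geometry,
    Stage_Hull,
    Stage_Domain,
    Stage_Count
};

// How many resources of each kind one stage declares.
struct StageResourceCounts
{
    uint32_t num_cbuffers = 0;
    uint32_t num_textures = 0;
    uint32_t num_samplers = 0;

    bool operator==(StageResourceCounts const & rhs) const
    {
        return (num_cbuffers == rhs.num_cbuffers) && (num_textures == rhs.num_textures)
            && (num_samplers == rhs.num_samplers);
    }
};

// The layout only: counts per stage, with no reference to actual resources.
struct RootSignatureLayout
{
    std::array<StageResourceCounts, Stage_Count> stages{};

    bool operator==(RootSignatureLayout const & rhs) const
    {
        return stages == rhs.stages;
    }
};

// Two shader objects could share a root signature when their layouts match.
inline bool CanShareRootSignature(RootSignatureLayout const & a, RootSignatureLayout const & b)
{
    return a == b;
}
```

Because the comparison never looks at the bound resources, two shader objects with different textures but the same counts per stage would compare equal under this sketch.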
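Summary
With RTVs and PSOs in place, the D3D12 plugin can already render the most basic triangle. Rendering anything even slightly more complex, however, still requires 11on12, so for now the two coexist.
Next, I will go on to add SRVs, samplers, CBVs, and the other remaining pieces.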