NVIDIA's GeForce RTX GPUs and AMD's forthcoming RDNA 2 graphics architecture, which will power the future of PC and console gaming, will support the new DirectX 12 Ultimate API that Microsoft announced today. DirectX 12 Ultimate essentially means support for DirectX Raytracing, Variable Rate Shading, Mesh Shaders, and Sampler Feedback.
DirectX 12 Ultimate will drive the next generation of games, enabling a new level of realism with four key features: DirectX Raytracing (DXR), Variable Rate Shading (VRS), Mesh Shaders, and Sampler Feedback. AMD worked closely with Microsoft to deliver awe-inspiring experiences on RDNA 2-based graphics, and both AMD and NVIDIA collaborated with Microsoft on the design of DXR 1.1, an update to DXR that can deliver better efficiency and performance in many raytracing effects. With DirectX 12 Ultimate, advanced effects such as raytracing are expected to reach more games sooner, and developers' lives become easier because they can create games using the same common graphics API and graphics architecture for both PCs and consoles.
Microsoft's Game Stack exists to bring developers the tools they need to create bold, immersive game experiences, and DX12 Ultimate is the ideal tool to amplify gaming graphics. DX12 Ultimate is the result of continual investment in the DirectX 12 platform over the last five years to ensure that Xbox and Windows 10 remain at the very pinnacle of graphics technology. To further empower game developers to create games with stunning visuals, we have enhanced features that are already beginning to transform gaming, such as DirectX Raytracing and Variable Rate Shading, and added major new features such as Mesh Shaders and Sampler Feedback. Together, these features represent many years of innovation from Microsoft and our partners in the hardware industry. DX12 Ultimate brings them all together in one common bundle, providing developers with a single key to unlock next-generation graphics on PC and Xbox Series X. Of course, even the most powerful features are of limited use without the tools necessary to fully exploit them, so we are pleased to announce that our industry-leading PIX graphics optimization tool and our open-source HLSL compiler will give game developers the ability to squeeze every last drop of performance out of an entire ecosystem of DX12 Ultimate hardware.
You can learn more about DirectX 12 Ultimate in Microsoft's blog post, reproduced below. For more information about how Microsoft's hardware partners are supporting DX12 Ultimate, see the accompanying articles from AMD and NVIDIA.
-- Microsoft Blog on DirectX 12 Ultimate --
DirectX Raytracing 1.1
DirectX Raytracing (DXR) brings a new level of graphics realism to video games, previously only achievable in the movie industry. The effects achievable with DXR feel more real, because in a sense they are more real: DXR traces paths of light with true-to-life physics calculations, a far more accurate simulation than the heuristics-based calculations used previously. We've already seen an unprecedented level of visual quality from titles that use DXR 1.0 since we unveiled it, and we built DXR 1.1 in response to developer feedback, giving developers even more tools with which to utilize DXR. DXR 1.1 is an incremental addition on top of DXR 1.0, adding three major new capabilities:
- GPU Work Creation now allows raytracing. This enables shaders on the GPU to invoke raytracing without an intervening round-trip back to the CPU, which is useful for adaptive raytracing scenarios such as shader-based culling, sorting, classification, and refinement; in short, scenarios that prepare raytracing work on the GPU and then immediately spawn it.
- Streaming engines can more efficiently load new raytracing shaders as needed when the player moves around the world and new objects become visible.
- Inline raytracing is an alternative form of raytracing that gives developers the option to drive more of the raytracing process themselves, as opposed to handing work scheduling entirely over to the system (dynamic-shading). It is available in any shader stage, including compute shaders and pixel shaders. Both the dynamic-shading and inline forms of raytracing use the same opaque acceleration structures.
When to use inline raytracing
Inline raytracing can be useful for many reasons:
- Perhaps the developer knows their scenario is simple enough that the overhead of dynamic shader scheduling is not worthwhile, for example a well-constrained way of calculating shadows.
- It could be convenient or efficient to query an acceleration structure from a shader that doesn't support dynamic-shader-based rays, such as a compute shader or pixel shader.
- It might be helpful to combine dynamic-shader-based raytracing with the inline form. Some raytracing shader stages, such as intersection shaders and any-hit shaders, don't even support tracing rays via dynamic-shader-based raytracing, but the inline form is available everywhere.
- Another combination is to switch to the inline form for simple recursive rays. This lets the app declare that there is no recursion for the underlying raytracing pipeline, since inline raytracing handles the recursive rays, and the simpler dynamic scheduling burden on the system can yield better efficiency.
Scenarios with many complex shaders will run better with dynamic-shader-based raytracing than with massive inline raytracing uber-shaders, while scenarios with minimal shading complexity and/or very few shaders will run better with inline raytracing. If the above all seems quite complicated, well, it is! The high-level takeaway is that both the new inline raytracing and the original dynamic-shader-based raytracing are valuable for different purposes. As of DXR 1.1, developers not only have the choice of either approach, but can even combine them both within a single renderer. Hybrid approaches are aided by the fact that both flavors of DXR raytracing share the same acceleration structure format and are driven by the same underlying traversal state machine. Best of all, gamers with DX12 Ultimate hardware can be assured that no matter what kind of raytracing solution the developer chooses to use, they will have a great experience.
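To give a feel for the inline form, here is a minimal HLSL sketch (Shader Model 6.5) that casts one shadow ray per pixel from a compute shader using RayQuery. The resource bindings, G-buffer input, and ray parameters are illustrative assumptions, not part of any particular engine:

```hlsl
// Minimal inline raytracing sketch (DXR 1.1, Shader Model 6.5): one shadow ray
// per pixel from a compute shader. Resource names and inputs are illustrative.
RaytracingAccelerationStructure Scene         : register(t0);
Texture2D<float4>               WorldPosition : register(t1);  // e.g. from a G-buffer pass
RWTexture2D<float>              Shadow        : register(u0);

cbuffer Constants : register(b0)
{
    float3 LightDirection;   // normalized, pointing from the light toward the scene
};

[numthreads(8, 8, 1)]
void CastShadowRays(uint3 id : SV_DispatchThreadID)
{
    RayDesc ray;
    ray.Origin    = WorldPosition[id.xy].xyz;
    ray.Direction = -LightDirection;
    ray.TMin      = 0.001f;
    ray.TMax      = 10000.0f;

    // Compile-time ray flags: we only care whether *anything* opaque is hit,
    // so accept the first hit, cull non-opaque geometry, and skip procedural primitives.
    RayQuery<RAY_FLAG_CULL_NON_OPAQUE |
             RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> query;

    query.TraceRayInline(Scene, RAY_FLAG_NONE, 0xFF, ray);
    query.Proceed();   // with these flags, traversal needs no app-side candidate loop

    Shadow[id.xy] = (query.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 0.0f : 1.0f;
}
```

Because the ray flags accept the first hit, cull non-opaque triangles, and skip procedural primitives, a single call to Proceed() completes traversal; more general cases loop on Proceed() and evaluate candidate hits in the shader.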
Variable Rate Shading
Variable Rate Shading (VRS) allows developers to selectively vary a game's shading rate. This lets them 'dial up' the GPU power in more important parts of the game for better visuals and 'dial back' the GPU power in less important areas for better speed. Variable Rate Shading also has the advantage of being relatively low-cost for developers to implement.
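On hardware with image-based VRS support (Tier 2), one common approach is to build a small screen-space "shading-rate image" in which each texel selects the rate for one tile of pixels. The sketch below is a hypothetical compute shader that picks coarser rates for darker tiles, where reduced shading detail is harder to notice; the tile size, luminance thresholds, and resource names are all illustrative:

```hlsl
// Hypothetical compute shader that fills a VRS (Tier 2) shading-rate image.
// Each texel of RateImage selects the shading rate for one screen tile.
Texture2D<float4> SceneColor : register(t0);   // previous frame's color
RWTexture2D<uint> RateImage  : register(u0);   // R8_UINT shading-rate image

// Values mirror the D3D12_SHADING_RATE encoding: (log2(x) << 2) | log2(y).
static const uint SHADING_RATE_1X1 = 0x0;
static const uint SHADING_RATE_2X2 = 0x5;
static const uint SHADING_RATE_4X4 = 0xA;

// Illustrative tile size; the real value comes from the device's feature data.
static const uint TILE_SIZE = 16;

[numthreads(8, 8, 1)]
void BuildShadingRateImage(uint3 id : SV_DispatchThreadID)
{
    // Sample the center of the tile this texel governs.
    uint2 pixel = id.xy * TILE_SIZE + TILE_SIZE / 2;
    float luma  = dot(SceneColor[pixel].rgb, float3(0.299f, 0.587f, 0.114f));

    // Darker regions hide coarse shading better; shade them less often.
    uint rate = SHADING_RATE_1X1;
    if (luma < 0.10f)      rate = SHADING_RATE_4X4;
    else if (luma < 0.25f) rate = SHADING_RATE_2X2;

    RateImage[id.xy] = rate;
}
```

The application then binds this image as the screen-space shading-rate source for subsequent draws, where it can be combined with per-draw or per-primitive rates.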
Mesh Shaders
Mesh Shaders give developers more programmability than ever before. By bringing the full power of generalized GPU compute to the geometry pipeline, mesh shaders allow developers to build more detailed and dynamic worlds. Prior to mesh shaders, the GPU geometry pipeline hid the parallel nature of GPU hardware execution behind a simplified programming abstraction that only gave developers access to seemingly linear shader functions. For instance, the developer writes a vertex shader function that is called once for each vertex in a model, implying serial execution. Behind the scenes, however, the hardware packs adjacent vertices to fill a SIMD wave, then executes 32 or 64 vertex shader functions in parallel on a single shader core. This model has worked extremely well for many years, but it leaves performance and flexibility on the table by hiding the details of what the hardware is really doing from developers.
Mesh shaders change this by making geometry processing behave more like compute shaders. Rather than a single function that shades one vertex or one primitive, mesh shaders operate across an entire compute thread group, with access to group shared memory and advanced compute features such as cross-lane wave intrinsics that provide even more fine-grained control over actual hardware execution. All these threads work together to shade a small indexed triangle list, called a 'meshlet'. Typically there will be a phase of the mesh shader where each thread works on a separate vertex, then another phase where each thread works on a separate primitive, but this model is completely flexible, allowing data to be shared across threads, new vertices or primitives to be created as needed, and existing primitives to be clipped or culled.

Along with this new flexibility of thread allocation comes flexibility of input data formats. Mesh shaders no longer use the Input Assembler block, which was previously responsible for fetching index and vertex data from memory. Instead, shader code is free to read whatever data it needs from any format it likes. This enables novel techniques such as index buffer compression, or the use of multiple different index buffers for different channels of vertex data. Such approaches can reduce memory usage as well as the memory bandwidth used during rendering, thus boosting performance.

Mesh shaders are accompanied by a new amplification shader stage, which runs before them and decides how many mesh shader thread groups to launch. Amplification shaders are especially useful for culling, as they can determine which meshlets are visible, testing each set of roughly 32 to 256 triangles against a geometric bounding volume, normal cone, or more advanced techniques such as portal visibility planes, before deciding whether to launch a mesh shader thread group for that meshlet. Previously, culling was typically performed at a coarser per-mesh level to decide whether to draw an object at all, and also at a finer per-triangle level at the end of the geometry pipeline. This new intermediate level of culling improves performance when drawing models that are only partially occluded. For instance, if part of a character is on screen while just one arm is not, an amplification shader can cull that entire arm after much less computation than it would have taken to shade all the triangles within it.
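As a concrete illustration of the meshlet model described above, here is a minimal HLSL mesh shader (Shader Model 6.5) in which each thread group shades one meshlet, with one thread per output vertex and per output triangle. The meshlet layout, buffer names, and thread-group size are illustrative assumptions; a real renderer would typically pair this with an amplification shader that culls meshlets before launching these groups:

```hlsl
// Minimal mesh shader sketch (Shader Model 6.5). One thread group shades one
// meshlet; each thread emits at most one vertex and one triangle.
struct Meshlet
{
    uint VertexOffset;     // first entry in VertexIndices
    uint VertexCount;
    uint PrimitiveOffset;  // first entry in PrimitiveIndices
    uint PrimitiveCount;
};

struct VertexOut
{
    float4 Position : SV_Position;
    float3 Normal   : NORMAL;
};

StructuredBuffer<Meshlet> Meshlets         : register(t0);
StructuredBuffer<float3>  Positions        : register(t1);
StructuredBuffer<float3>  Normals          : register(t2);
StructuredBuffer<uint>    VertexIndices    : register(t3);  // meshlet-local -> mesh vertex
StructuredBuffer<uint3>   PrimitiveIndices : register(t4);  // meshlet-local triangles

cbuffer Constants : register(b0)
{
    float4x4 WorldViewProj;
};

[outputtopology("triangle")]
[numthreads(128, 1, 1)]
void MainMS(
    uint gtid : SV_GroupThreadID,
    uint gid  : SV_GroupID,
    out vertices VertexOut verts[128],
    out indices  uint3     tris[128])
{
    Meshlet m = Meshlets[gid];

    // Declare how many vertices and primitives this group will actually emit.
    SetMeshOutputCounts(m.VertexCount, m.PrimitiveCount);

    if (gtid < m.VertexCount)
    {
        uint v = VertexIndices[m.VertexOffset + gtid];
        verts[gtid].Position = mul(float4(Positions[v], 1.0f), WorldViewProj);
        verts[gtid].Normal   = Normals[v];
    }

    if (gtid < m.PrimitiveCount)
    {
        tris[gtid] = PrimitiveIndices[m.PrimitiveOffset + gtid];
    }
}
```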
Sampler Feedback
Sampler Feedback enables better visual quality, shorter load times, and less stuttering by providing detailed information that lets developers load textures only when they are needed. Suppose you are a game developer shading a complicated 3D scene. The camera moves swiftly through the scene, causing some objects to shift to different levels of detail (LODs). Since you need to aggressively optimize for memory, you bind resources to cope with the demand for different LODs. Perhaps you use a texture streaming system; perhaps it uses tiled resources to keep those gigantic 4K mip 0s non-resident if you don't need them. Either way, you have a shader that samples a mipped texture using A Very Complicated sampling pattern. Pick your favorite one, say anisotropic.
The sampling in this shader has you asking some questions. What mip level did it ultimately sample? That seems like a very basic question, but in a world before Sampler Feedback there's no easy way to know. You could cobble together a heuristic: think about the sampling pattern and make some educated guesses. But 1) you don't have time for that, and 2) there's no way it would be 100% reliable. Where exactly in the resource did it sample? More specifically, what you really need to know is: which tiles? It could be the top-left corner, or right in the middle of the texture. Your streaming system would really benefit from this information, so that you would know which mips to load next.
Sampler feedback solves this by allowing a shader to efficiently query what part of a texture would have been needed to satisfy a sampling request, without actually carrying out the sample operation. This information can then be fed back into the game's asset streaming system, allowing it to make more intelligent, precise decisions about what data to stream in next. In conjunction with the D3D12 tiled resources feature, this allows games to render larger, more detailed textures while using less video memory. Sampler feedback also enables Texture-space shading (TSS), a rendering technique that decouples the shading of an object in world space from the rasterization of the shape of that object to the final target.
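Before turning to TSS, here is what recording feedback looks like from the shader's point of view. In HLSL (Shader Model 6.5), the feedback map is a dedicated FeedbackTexture2D resource bound as a UAV; the sketch below is a hypothetical pixel shader that samples a texture normally and also records which mip/region the sample would have needed, with illustrative resource names:

```hlsl
// Sketch of a pixel shader that writes sampler feedback (Shader Model 6.5).
// The feedback map is later resolved and fed to the texture streaming system.
Texture2D<float4> DiffuseMap   : register(t0);
SamplerState      AnisoSampler : register(s0);

// MIN_MIP feedback: records the minimum mip level requested per texture region.
FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP> FeedbackMap : register(u0);

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Record what this sample *would* need, without changing the sample itself.
    FeedbackMap.WriteSamplerFeedback(DiffuseMap, AnisoSampler, uv);

    return DiffuseMap.Sample(AnisoSampler, uv);
}
```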
TSS is a technique that allows game developers to do expensive lighting computations in object space and write them to a texture, for example something that looks like a UVW unwrapping of the object. Since nothing is being rasterized, the shading can be done with compute shaders, without involving the graphics pipeline at all. Then, in a separate step, the game binds that texture and rasterizes to screen space, performing a dead simple sample. This approach reduces aliasing and allows lighting to be computed less often than rasterization. Decoupling these two rates allows the use of more sophisticated lighting techniques at higher framerates. One obstacle in getting TSS to work well is figuring out what in object space to shade for each object. Everything? That would hardly be efficient. What if only the left-hand side of an object is visible? With the power of sampler feedback, the rasterization step can simply record which texels are being requested, and the application can perform its expensive lighting computation only on those.
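To make that flow concrete, here is a hypothetical two-pass sketch: a compute pass that shades only the object-space texels a decoded feedback map marked as requested, and a pixel shader that rasterizes the object and performs the cheap sample. The lighting model, the decoded-feedback format, and all names are illustrative assumptions:

```hlsl
// Hypothetical texture-space shading (TSS) sketch.
// Pass 1 (compute): shade object-space texels that the decoded sampler
// feedback marked as requested. Pass 2 (pixel): rasterize and sample cheaply.

// ---- Pass 1: shade in object space (compute) ----
Texture2D<uint>     RequestedTexels   : register(t0);  // 1 = requested last frame
Texture2D<float4>   ObjectSpaceNormal : register(t1);
RWTexture2D<float4> LightingMap       : register(u0);

cbuffer LightConstants : register(b0)
{
    float3 LightDir;     // normalized
    float3 LightColor;
};

[numthreads(8, 8, 1)]
void ShadeTextureSpace(uint3 id : SV_DispatchThreadID)
{
    if (RequestedTexels[id.xy] == 0)
        return;   // nobody sampled this texel; skip the expensive work

    float3 n   = normalize(ObjectSpaceNormal[id.xy].xyz);
    float3 lit = LightColor * saturate(dot(n, -LightDir));  // stand-in for expensive lighting
    LightingMap[id.xy] = float4(lit, 1.0f);
}

// ---- Pass 2: rasterize the object and sample the result (pixel shader) ----
Texture2D<float4> LightingMapSRV : register(t2);
SamplerState      LinearClamp    : register(s0);

float4 PSResolve(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    return LightingMapSRV.Sample(LinearClamp, uv);
}
```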