GPU Architecture
The Nvidia GeForce RTX 4060 Ti Founders Edition will include 8GB of memory, with an option for a 16GB model. As for the GeForce RTX 4060, Nvidia has not released any comments yet. However, it's confirmed that the entry-level Ada Lovelace graphics card for gaming enthusiasts will debut at $299 USD, exclusive of VAT.
- GeForce RTX 4060 Ti 8GB: Will be available from May 24 for $399 USD
- GeForce RTX 4060 Ti 16GB: Will be released in July for $499 USD
- GeForce RTX 4060 8GB: Will be released in June for $299 USD
The theoretical computing power of the GeForce RTX 4060 Ti is said to be 22 TFLOPS, according to Nvidia. Both the 8GB and 16GB models will have the same specs, differing only in memory capacity. This figure is 24 percent less than that of the GeForce RTX 4070, indicating a performance level potentially similar to the GeForce RTX 3070 (8 GB).
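As a rough sanity check on the 22 TFLOPS figure, peak FP32 throughput is commonly estimated as two operations (one fused multiply-add) per shader core per clock. The sketch below applies that estimate. The shader count and boost clock are not stated in this article and are assumptions taken from Nvidia's published specifications; the function name is purely illustrative.

```python
def peak_fp32_tflops(shader_cores: int, boost_clock_mhz: float) -> float:
    """Estimate peak FP32 throughput: 2 ops (one FMA) per core per clock."""
    ops_per_second = 2 * shader_cores * boost_clock_mhz * 1e6
    return ops_per_second / 1e12


# ASSUMPTION: core count and boost clock are not given in the article.
ASSUMED_4060_TI_CORES = 4352
ASSUMED_4060_TI_BOOST_MHZ = 2535

rtx_4060_ti_tflops = peak_fp32_tflops(ASSUMED_4060_TI_CORES, ASSUMED_4060_TI_BOOST_MHZ)

# The article says 22 TFLOPS is 24 percent less than the RTX 4070,
# so the implied RTX 4070 figure is 22 / (1 - 0.24).
implied_rtx_4070_tflops = 22 / (1 - 0.24)

print(f"RTX 4060 Ti estimate: {rtx_4060_ti_tflops:.2f} TFLOPS")
print(f"Implied RTX 4070 figure: {implied_rtx_4070_tflops:.1f} TFLOPS")
```

Under these assumptions the estimate lands close to the quoted 22 TFLOPS, and the implied RTX 4070 figure is roughly 29 TFLOPS. Peak numbers like these say little about real game performance, which also depends on memory bandwidth and cache behaviour.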
Nvidia announced a 128-bit memory interface for the GeForce RTX 4060. A significantly larger L2 cache of 24 MB offsets the lower memory bandwidth when compared to the GeForce RTX 3060 Ti. The memory operates at 17 Gbps. The total power draw of the GeForce RTX 4060, independent of memory size, is 115 watts. Like other RTX 4000 series cards, the GeForce RTX 4060 Ti includes features such as an AV1 encoder and Tensor cores that support DLSS 3, encompassing AI-generated frames.
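To put the 128-bit interface and 17 Gbps memory speed into perspective, peak memory bandwidth can be estimated as bus width in bytes multiplied by the per-pin data rate. The sketch below uses the two figures quoted above. The GeForce RTX 3060 Ti values (256-bit, 14 Gbps) are not from this article and are included only as an assumed comparison point.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bytes) * (per-pin rate in Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps


# Figures quoted in the article: 128-bit interface, 17 Gbps memory.
rtx_4060_bandwidth = memory_bandwidth_gb_s(128, 17)

# ASSUMPTION: RTX 3060 Ti figures (256-bit, 14 Gbps) are not from the article.
rtx_3060_ti_bandwidth = memory_bandwidth_gb_s(256, 14)

print(f"128-bit @ 17 Gbps: {rtx_4060_bandwidth:.0f} GB/s")
print(f"256-bit @ 14 Gbps: {rtx_3060_ti_bandwidth:.0f} GB/s")
print(f"Ratio: {rtx_4060_bandwidth / rtx_3060_ti_bandwidth:.2f}")
```

The narrower bus yields 272 GB/s of raw bandwidth, which is the gap the larger L2 cache is meant to offset by keeping more traffic on-die.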
ADA (Lovelace) in general
All Ada GPUs are based on the same base design. Built on a custom TSMC 4N process, Ada provides more raster, ray tracing, and AI-accelerated compute performance than the previous-generation Ampere. The biggest GPU, AD102, has 76.3 billion transistors and a die area of 608.4 mm². The resulting transistor density of roughly 125.5 million per mm² is 2.78x higher than that of the Samsung-fabbed GA102 Ampere GPU built on the 8N node. NVIDIA Ada (named after the mathematician Ada Lovelace) introduces Shader Execution Reordering (SER), which is said to speed up shading work and provide up to 25% improved gaming performance. Ada is also fitted with next-generation RT cores (Gen3) and faster Tensor cores (Gen4). The latter can achieve up to 1400 Tensor TFLOPS, which is 4.375 times more than Ampere's third-generation cores.
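The density and Tensor figures above can be checked with simple arithmetic. The sketch below uses only numbers quoted in the paragraph and reads the Tensor ratio as 4.375 (a decimal value, not four thousand). The "implied" Ampere values are derived from the quoted ratios rather than taken from an official specification.

```python
# Figures quoted in the article for AD102.
AD102_TRANSISTORS = 76.3e9
AD102_DIE_AREA_MM2 = 608.4

# Transistor density in millions of transistors per mm^2.
ad102_density = AD102_TRANSISTORS / AD102_DIE_AREA_MM2 / 1e6

# The article quotes 125.5 M/mm^2 as 2.78x the GA102 density,
# so the implied GA102 density is 125.5 / 2.78.
implied_ga102_density = 125.5 / 2.78

# 1400 Tensor TFLOPS at a 4.375x ratio implies this Ampere figure.
implied_ampere_tensor_tflops = 1400 / 4.375

print(f"AD102 density: {ad102_density:.1f} M transistors/mm^2")
print(f"Implied GA102 density: {implied_ga102_density:.1f} M transistors/mm^2")
print(f"Implied Ampere Tensor throughput: {implied_ampere_tensor_tflops:.0f} TFLOPS")
```

The computed AD102 density comes out at about 125.4 million per mm², in line with the quoted figure, and the 4.375x ratio corresponds to an Ampere baseline of 320 Tensor TFLOPS.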
The ADA GPU base design
Team Green has shown its most powerful Lovelace GPU: AD102 has up to 76 billion transistors and, like Hopper, is built on TSMC's 4N node. Regular shaders, as well as the ray tracing and Tensor cores, have all been improved.

At its initial price of USD 1,499, the GeForce RTX 3090 was $1,000 less than the Nvidia Titan RTX. Unfortunately, we don't see this trend continuing; the RTX 4090 will likely be priced between $1,499 and $1,999 depending on AIB designs, making it competitive with the RTX 3090 Ti, the current king of the RTX hill. Turning to the RTX 4080 and the RTX 4070, which may be announced later, we had hoped that the initial retail pricing of $699 and $499, respectively, would be maintained. However, the recent increase in the cost of silicon wafers may cause a 10% increase in the MSRP of RTX 4000 GPUs.

The CUDA core (shader) count is going to rise on all Nvidia hardware; the RTX 4090 graphics card will contain 16,384 shading cores. Below is an overview of what we think are the specs; these will be updated once more official information arrives. Nvidia's RTX 4000 Series graphics cards are built on TSMC's 4/5nm production node, promising improved performance over the RTX 3000 Series' 8nm GPUs. A more compact process node lets Nvidia pack more transistors onto the GPU, increasing its processing speed. Since ray tracing and DLSS are still crucial technologies for GeForce graphics cards, Nvidia will undoubtedly work to improve their efficiency.

The Ada Lovelace architecture also updates the streaming multiprocessors, each of which is said to deliver up to twice the performance. Nvidia is also adding a new shader execution reordering feature. It should speed up shading in the GPU pipeline by rescheduling work on the fly so that it completes as efficiently as possible. According to Nvidia, this improves overall gaming performance by up to 25% and makes ray tracing two to three times faster.
Raytracing and Tensor Cores
The third generation of ray tracing cores is also introduced in Lovelace, increasing the throughput of ray-triangle intersections. The fourth generation of Tensor cores is also said to offer up to four times the throughput of its predecessor. Additionally, AV1 encoding will be supported by RTX 40 series GPUs. For the first time, the DLSS upscaling technique is getting a new 3.0 version that can generate its own frames for higher frame rates. DLSS 3.0 is only available on RTX 40 cards and does not work on GPUs from previous generations.

NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:
- First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail into their virtual worlds.
- Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
- Third, the new Ada RT Core supports 10x Faster BVH Build in 20x Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, reducing the storage and processing normally required when complex geometry is described using only basic triangles.
Tensor Cores are specialized high-performance compute cores designed for the matrix multiply-and-accumulate operations used in AI and HPC applications. Tensor Cores deliver unprecedented performance for matrix calculations, which are crucial for deep learning neural network training and inference functions at the edge. Ada outperforms Ampere in terms of FP16, BF16, TF32, INT8, and INT4 Tensor TFLOPS, and also incorporates the Hopper FP8 Transformer Engine, which yields over 1.3 PetaFLOPS of tensor processing in the RTX 4090. Of course, for us consumers this means they will be put to use for DLSS.
Video engine
Ada GPUs advance streaming and video content by adding AV1 video encoding support to the Ada eighth-generation dedicated hardware encoder (NVENC). Ampere GPUs of the previous generation supported AV1 decoding but not encoding. Ada's AV1 encoder is 40% more efficient than the GeForce RTX 30 Series GPUs' H.264 encoder. AV1 will allow users who are already broadcasting at 1080p to boost their resolution to 1440p while maintaining the same bitrate and quality. For viewers with 1080p displays, such streams will look closer to 1440p quality.

Dual NVENC encoders are included on Ada GeForce RTX 40 Series GPUs with at least 12 GB of memory to improve encoding performance. This enables video encoding at 8K/60, or four 4K/60 streams at once, for professional video editing. (Game streaming services can also utilise this to enable more concurrent sessions, for example.) DaVinci Resolve by Blackmagic Design, the Voukoder plugin for Adobe Premiere Pro, and Jianying, the leading video editing tool in China, are all integrating AV1 support and dual-encoder support via encode presets; both will be available for these applications in October. NVIDIA is also collaborating with the popular video effects application Notch to enable AV1 and with Topaz to offer support for AV1 and dual encoders.

In addition to NVENC, Ada GPUs feature the fifth-generation hardware decoder (NVDEC), which was introduced with Ampere. NVDEC supports hardware-accelerated MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1 video decoding. 8K/60 decoding is also fully supported.
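The "8K/60 or four 4K/60" pairing makes sense once the raw pixel throughput is compared, as the short sketch below illustrates. It assumes the common UHD definitions of 8K (7680x4320), 4K (3840x2160), 1440p (2560x1440) and 1080p (1920x1080); the article does not spell these out, and the helper name is illustrative.

```python
def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Pixels per second that an encoder has to process."""
    return width * height * fps * streams


# ASSUMPTION: UHD resolutions, which the article does not state explicitly.
one_8k60 = pixel_rate(7680, 4320, 60)
four_4k60 = pixel_rate(3840, 2160, 60, streams=4)

# Pixel-count ratio behind the 1080p -> 1440p streaming example.
ratio_1440p_to_1080p = (2560 * 1440) / (1920 * 1080)

print(f"One 8K/60 stream:   {one_8k60:,} pixels/s")
print(f"Four 4K/60 streams: {four_4k60:,} pixels/s")
print(f"1440p has {ratio_1440p_to_1080p:.2f}x the pixels of 1080p")
```

One 8K/60 stream and four 4K/60 streams work out to the same pixel rate, which is why the two appear as alternatives. Pixel rate is only a first-order proxy for encoder load; actual throughput also depends on codec, preset, and bitrate.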