Nvidia Talks About Higher OC Clocks on the Founders 2080 Cards - Also PCB Photo

Turing has double the L1/L2 cache of Pascal, which is going to alleviate hits to memory, along with Variable Rate Shading and Texture-Space Shading, both of which should make the overall process more efficient for memory bandwidth. And then yeah, whatever changes they made to delta compression.
sdamaged99:

If it is indeed faster though, why not show some benchmarks? It seems like this has been a deliberate move, which doesn't bode well.
Because I don't think it's that much faster. The only comparison they showed in regular workloads was DLAA vs TAA in the Infiltrator demo, where the framerate was doubled. You'll probably get some other titles that utilize DLAA - from what I understand it isn't that hard to implement - but aside from that I only expect a 25-30% performance uplift over the 1080 Ti at default clocks. Perhaps with overclocking to 2GHz etc. it will start to shine, but idk. Like I said in the other thread, this entire series is going to be about value-add features. The NGX stuff looks like an avenue for Nvidia to add a bunch of features to the card over its lifespan.
Denial:

Because I don't think it's that much faster. The only comparison they showed in regular workloads was DLAA vs TAA in the Infiltrator demo, where the framerate was doubled. You'll probably get some other titles that utilize DLAA - from what I understand it isn't that hard to implement - but aside from that I only expect a 25-30% performance uplift over the 1080 Ti at default clocks. Perhaps with overclocking to 2GHz etc. it will start to shine, but idk. Like I said in the other thread, this entire series is going to be about value-add features. The NGX stuff looks like an avenue for Nvidia to add a bunch of features to the card over its lifespan.
I agree. I think we will only see what Turing is really worth to us when we look at its performance compared to Pascal in DX11 (which won't do miracles, I'm afraid). DLAA / DLSS is just a method of reducing workload on the GPU by using an approximate algorithm instead of brute-force calculations, hence how they got such a boost in that scenario. They compared it to TAA, though, which because of its performance hit isn't the usual "gamer's choice". I also think we need to see overclocking performance, since that's what "we" gamers will run, and that's where the real worth of an upgrade will be determined - not by whether you need RTX / DXR or not.
metagamer:

Yes, you're right. You never know, Turing might just have some wizardry up its sleeve too. But just looking at numbers and determining performance solely like that is silly, so I don't know why people do it.
Pure bandwidth alone should be a hint at performance:
2070 / 1070 Ti = +75%
2070 / 1080 = +40%
2070 / 1080 (14Gbps) = +27%
2080 Ti / 1080 Ti = +27%
We know all too well that Nvidia does not throw in additional bandwidth unless it's needed (omg, only 192-bit; remember?). It hurts power, it adds complexity and it's wasted. So I am not worried one bit about the mandatory performance uplift compared to Pascal.
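Those ratios fall straight out of bandwidth = effective data rate × bus width / 8. Here's a quick host-side sketch (a .cu file built with nvcc, though any C++ compiler works) using the commonly quoted reference specs - treat the exact data-rate and bus-width figures as assumptions:

```
// bandwidth_ratio.cu - host-only sketch; compile with nvcc (or any C++ compiler).
// GB/s = effective data rate (Gbps per pin) * bus width (bits) / 8.
// Data rates and bus widths below are the commonly quoted reference specs.
#include <cstdio>

struct Card { const char* name; double gbps; int bus_bits; };

static double bandwidth_gbs(const Card& c) { return c.gbps * c.bus_bits / 8.0; }

static double uplift_pct(const Card& a, const Card& b) {
    return (bandwidth_gbs(a) / bandwidth_gbs(b) - 1.0) * 100.0;
}

int main() {
    Card rtx2070  {"RTX 2070",    14.0, 256};   // 448 GB/s
    Card gtx1070ti{"GTX 1070 Ti",  8.0, 256};   // 256 GB/s
    Card gtx1080  {"GTX 1080",    10.0, 256};   // 320 GB/s
    Card rtx2080ti{"RTX 2080 Ti", 14.0, 352};   // 616 GB/s
    Card gtx1080ti{"GTX 1080 Ti", 11.0, 352};   // 484 GB/s

    printf("2070 vs 1070 Ti:    %+.0f%%\n", uplift_pct(rtx2070,   gtx1070ti)); // +75%
    printf("2070 vs 1080:       %+.0f%%\n", uplift_pct(rtx2070,   gtx1080));   // +40%
    printf("2080 Ti vs 1080 Ti: %+.0f%%\n", uplift_pct(rtx2080ti, gtx1080ti)); // +27%
    return 0;
}
```

Running it reproduces the +75% / +40% / +27% figures above.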
fantaskarsef:

I agree. I think we will only see what Turing is really worth to us when we look at its performance compared to Pascal in DX11 (which won't do miracles, I'm afraid). DLAA / DLSS is just a method of reducing workload on the GPU by using an approximate algorithm instead of brute-force calculations, hence how they got such a boost in that scenario. They compared it to TAA, though, which because of its performance hit isn't the usual "gamer's choice". I also think we need to see overclocking performance, since that's what "we" gamers will run, and that's where the real worth of an upgrade will be determined - not by whether you need RTX / DXR or not.
There are other things in the architecture that can be leveraged for more performance. For example, Vega shipped with RPM for FP16 calcs, which AFAIK was only utilized in one game (Far Cry 5), but Nvidia has a similar function now (they actually had it in GP100's SM, but not in consumer Pascal)... so hopefully more games will utilize it now that both vendors support it.
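For a sense of what that looks like from the programming side: packed FP16 math operates on two half-precision values per 32-bit register, so one instruction does two ops. This is the generic CUDA intrinsic view rather than any vendor-specific RPM API; a minimal, purely illustrative sketch (needs a GPU with native FP16 support):

```
// fp16_packed.cu - illustrative only; compile with: nvcc -arch=sm_53 fp16_packed.cu
// Two FP16 values sit packed in one 32-bit register, so a single __hadd2
// performs two half-precision additions at once.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void packed_add_demo(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 a = __floats2half2_rn(1.0f, 2.0f);   // pack (1, 2) into 32 bits
        __half2 b = __floats2half2_rn(3.0f, 4.0f);   // pack (3, 4)
        __half2 c = __hadd2(a, b);                   // two FP16 adds, one instruction
        out[i] = __low2float(c) + __high2float(c);   // (1+3) + (2+4) = 10
    }
}

int main() {
    const int n = 256;
    float* out = nullptr;
    cudaMallocManaged(&out, n * sizeof(float));
    packed_add_demo<<<1, n>>>(out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %.1f (expect 10.0)\n", out[0]);
    cudaFree(out);
    return 0;
}
```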
Fox2232:

But take it to your real world. If there is TAA with a huge time per frame and then DLAA with just 1/2 the frame time (doubling fps), what's the performance without AA? I would easily disable AA entirely in an FPS game if it meant more than double the fps. And I would likely go for downsampling, as higher resolution means more detail in the distance and higher per-pixel precision for textures. Plus, I do not like TAA much anyway; it blurs things a bit more than I am comfortable with in almost every game. Its only real benefit is the complete removal of shimmering.
Well, didn't the RTX Quadro slides say 1.5x from the architecture itself? I can't find the slide now... I know someone posted it here showing ATAA vs whatever for UE, but I'm pretty sure it showed a base 1.5x increase in performance with AA disabled entirely.
Noisiv:

Pure bandwidth alone should be a hint at performance:
2070 / 1070 Ti = +75%
2070 / 1080 = +40%
2070 / 1080 (14Gbps) = +27%
2080 Ti / 1080 Ti = +27%
We know all too well that Nvidia does not throw in additional bandwidth unless it's needed (omg, only 192-bit; remember?). It hurts power, it adds complexity and it's wasted. So I am not worried one bit about the mandatory performance uplift compared to Pascal.
Well, this release may be different due to the RTX stuff, which AFAIK is extremely bandwidth-dependent. For example, this slide: https://pbs.twimg.com/media/DkqLJVjUcAAjJOj.jpg Titan V's numbers are 9.1 / 9.7 / 18.8 - Turing's theoretical speed should be faster than it is here, but according to Morgan McGuire (an engineer at Nvidia) some shaders see less of a speedup from RT cores compared to Volta due to HBM vs GDDR6. So it seems like raytracing is bottlenecked by memory bandwidth more so than traditional workloads. Also, I found the other slide I was talking about: https://pbs.twimg.com/media/DkqK-93UYAEGq8B.jpg:large So out of the box, Nvidia is claiming the RTX 6000 is 1.5x faster than a Titan V in raster workloads. So you figure the 2080 Ti is slightly cut down but will have faster clocks - I guess 50% is what we should expect for regular workloads. Idk, I expect less, but we'll see.
fantaskarsef:

DLAA / DLSS is just a method of reducing workload on the GPU by using an approximate algorithm instead of brute-force calculations, hence how they got such a boost in that scenario.
It's not even reducing the workload; it's merely shifting the workload to the Tensor cores, as far as I can tell, and it could actually be four times as intensive as TAA for all we know, except that it frees up CUDA core time for rendering. So that should work well - unless you're also trying to run ray tracing, if the performance hit in Tomb Raider is as good as it gets. I have no idea how many G-rays, Tensor cores and RT engines, let alone JHH's f@cking axis-fluid RT units, you need for decent RTX... I'm just a consumer, I'll wait for some reviews.
Texter:

It's not even reducing the workload; it's merely shifting the workload to the Tensor cores, as far as I can tell, and it could actually be four times as intensive as TAA for all we know, except that it frees up CUDA core time for rendering. So that should work well - unless you're also trying to run ray tracing, if the performance hit in Tomb Raider is as good as it gets. I have no idea how many G-rays, Tensor cores and RT engines, let alone JHH's f@cking axis-fluid RT units, you need for decent RTX... I'm just a consumer, I'll wait for some reviews.
I could be wrong about this, and I know people keep posting block diagrams, but I'm 85% sure that the Tensor/RT cores are not discrete cores. Basically, the SMs get partitioned and those portions "become" Tensor/RT cores. So I don't think you actually free anything up when running stuff on Tensor - on the flip side, you also don't lose anything when not running Tensor. I think this for two reasons: 1. the RT/Tensor changes are linked to the INT/FP separation in the SM itself plus the ALU changes, and 2. there is no way they managed to get 4300 CUDA cores into a 250W TDP but then also turn on some magical other cores and keep the TDP similar. 85% sure, but yeah, could be wrong.
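Either way, the tensor-core path is driven from ordinary warps inside the SM - in CUDA it is exposed through the WMMA intrinsics, so it shares the SM's scheduling, registers and memory path regardless of how physically discrete the units are. A minimal sketch of one warp issuing a 16x16x16 half-precision matrix multiply-accumulate (values purely illustrative):

```
// wmma_sketch.cu - compile with: nvcc -arch=sm_70 wmma_sketch.cu
// One ordinary warp loads, multiplies and stores a 16x16x16 tile through the
// WMMA intrinsics; the tensor-core math is issued from inside the SM like any
// other warp instruction.
#include <mma.h>
#include <cuda_fp16.h>
#include <cstdio>
using namespace nvcuda;

__global__ void mma_16x16x16(const __half* A, const __half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);                    // start the C tile at zero
    wmma::load_matrix_sync(a, A, 16);                  // whole warp cooperates
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);                    // D = A*B + C on tensor cores
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}

int main() {
    __half *A, *B; float *C;
    cudaMallocManaged(&A, 256 * sizeof(__half));
    cudaMallocManaged(&B, 256 * sizeof(__half));
    cudaMallocManaged(&C, 256 * sizeof(float));
    for (int i = 0; i < 256; ++i) { A[i] = __float2half(1.0f); B[i] = __float2half(1.0f); }
    mma_16x16x16<<<1, 32>>>(A, B, C);                  // exactly one warp
    cudaDeviceSynchronize();
    printf("C[0] = %.1f (expect 16.0)\n", C[0]);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```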
Fox2232:

It is understandable. When you run shader code on part of a texture, you load the data for all that stuff in a few blocks. But raytracing with 6 samples per pixel can use data from 6 completely different places in the scene, and if it goes for multiple bounces... Basically, I think it would be more latency-sensitive than bandwidth-sensitive if the data store were optimized for raytracing delivery. But can you really deliver to the GPU cache just the information that one pixel needs from each ray, or are you pulling bigger data blocks from memory?
You're right - could be latency. I just assumed bandwidth but HBM does have a massive latency advantage as well. He didn't clarify in his tweet, just said the difference was HBM vs GDDR6.
metagamer:

You said you won't be buying one without seeing a proper gaming bench. I said you'd be pretty much brainless if you did. Basically saying that you're not brainless because you're not preordering one. Now... what is it with your attitude?
How about you tone it down some, you come across as an asshat in all your posts.
Denial:

Also, I found the other slide I was talking about: https://pbs.twimg.com/media/DkqK-93UYAEGq8B.jpg:large So out of the box, Nvidia is claiming the RTX 6000 is 1.5x faster than a Titan V in raster workloads. So you figure the 2080 Ti is slightly cut down but will have faster clocks - I guess 50% is what we should expect for regular workloads. Idk, I expect less, but we'll see.
I expect the difference in normal games to come down to the single-precision difference: 11.4 TFLOPS vs 16 TFLOPS. If the 2080 Ti clocks high like Pascal, you can expect >20 TFLOPS with an OC.
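Those TFLOPS figures are just 2 × CUDA cores × clock (an FMA counting as two FLOPs). A back-of-the-envelope sketch - the core counts are the announced specs, while the clocks are illustrative assumptions, not measured boost figures:

```
// tflops_estimate.cu - back-of-envelope FP32 throughput: 2 ops (one FMA) per core per clock.
#include <cstdio>

static double fp32_tflops(int cuda_cores, double clock_ghz) {
    return 2.0 * cuda_cores * clock_ghz / 1000.0;
}

int main() {
    printf("1080 Ti, 3584 cores @ 1.58 GHz: %.1f TFLOPS\n", fp32_tflops(3584, 1.58)); // ~11.3
    printf("2080 Ti, 4352 cores @ 1.80 GHz: %.1f TFLOPS\n", fp32_tflops(4352, 1.80)); // ~15.7
    printf("2080 Ti, 4352 cores @ 2.30 GHz: %.1f TFLOPS\n", fp32_tflops(4352, 2.30)); // ~20.0
    return 0;
}
```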
Denial:

There is no way they managed to get 4300 CUDA cores into a 250W TDP but then also turn on some magical other cores and keep the TDP similar.
Fixed-function binary electronics require very few transistors to implement, compared to general-purpose, logic-running electronics, which need to adapt to whatever code is being pushed through them. Those "tensor cores" are array addition+multiplication circuits, basically running the same operation over and over again, ad infinitum. They are, in a sense, very similar to the days of old when the first "hardware accelerated graphics" were implemented, in which the 3D chip was basically just doing lots of identical calculations very fast (for that time), leaving the CPU to push the stream of numbers into it and interpret the results. If you want a bit of brain explosion, look at this: http://www.felixcloutier.com/x86/FMUL:FMULP:FIMUL.html - that's not exactly what they are doing, but pretty close.

The CUDA cores, on the other hand, are quite close to a CPU's floating-point unit, running all kinds of operations: addition, multiplication, inverse (1/x), square root, trigonometry, and of course memory access, decisions (IF, CASE), jumps... and so on, not getting too much into detail here. It is also the reason why bitcoin mining has moved from CPUs to GPUs to specialized ASICs, as simpler electronics can do the same few operations MUCH faster than more complex electronics which need to adapt to the incoming code.

Nvidia could probably increase the number of "Tensor Cores" ten times while adding only 10% to the total transistor budget, but that's not very useful if the other parts of the chip can't feed those cores. It's all about balance (which is what makes this advanced micro-engineering so hard).
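To make "array addition+multiplication circuits" concrete: per Nvidia's Volta description, each tensor core performs a small matrix fused multiply-add, D = A×B + C, on a 4x4x4 tile per clock. Written out in plain software it is just this triple loop - the point being that the fixed circuit evaluates the whole block at once instead of issuing each multiply and add separately:

```
// tensor_op.cu - the operation a tensor core hard-wires, written out in software:
// a small matrix fused multiply-add, D = A*B + C, shown at 4x4x4 granularity.
#include <cstdio>

int main() {
    const int N = 4;
    float A[N][N], B[N][N], C[N][N], D[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) { A[i][j] = 1.0f; B[i][j] = 2.0f; C[i][j] = 0.5f; }

    // A fixed-function unit evaluates this whole block per instruction;
    // general-purpose cores would issue each multiply and add individually.
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = C[i][j];
            for (int k = 0; k < N; ++k)
                acc += A[i][k] * B[k][j];
            D[i][j] = acc;
        }

    printf("D[0][0] = %.1f (expect 8.5)\n", D[0][0]);  // 4*(1*2) + 0.5
    return 0;
}
```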
First the 10 series price bumping and now this... I liked Nvidia you know.
wavetrex:

Fixed-function binary electronics require very few transistors to implement, compared to general-purpose, logic-running electronics, which need to adapt to whatever code is being pushed through them. Those "tensor cores" are array addition+multiplication circuits, basically running the same operation over and over again, ad infinitum. They are, in a sense, very similar to the days of old when the first "hardware accelerated graphics" were implemented, in which the 3D chip was basically just doing lots of identical calculations very fast (for that time), leaving the CPU to push the stream of numbers into it and interpret the results. If you want a bit of brain explosion, look at this: http://www.felixcloutier.com/x86/FMUL:FMULP:FIMUL.html - that's not exactly what they are doing, but pretty close. The CUDA cores, on the other hand, are quite close to a CPU's floating-point unit, running all kinds of operations: addition, multiplication, inverse (1/x), square root, trigonometry, and of course memory access, decisions (IF, CASE), jumps... and so on. It is also the reason why bitcoin mining has moved from CPUs to GPUs to specialized ASICs, as simpler electronics can do the same few operations MUCH faster than more complex electronics which need to adapt to the incoming code. Nvidia could probably increase the number of "Tensor Cores" ten times while adding only 10% to the total transistor budget, but that's not very useful if the other parts of the chip can't feed those cores. It's all about balance (which is what makes this advanced micro-engineering so hard).
Good post. All true. Tensor cores are still not free, especially in terms of bandwidth. And GPU area did explode with Volta, and especially with Turing (low FP64), compared to Pascal. So tensor-core functionality does not seem to be THAT free.
Agent-A01:

How about you tone it down some, you come across as an asshat in all your posts.
Hey, in that particular post I was actually complimenting the guy. Chill.
metagamer:

Hey, in that particular post I was actually complimenting the guy. Chill.
You chill. I'm referring to your entire demeanor not one single, specific post.
Agent-A01:

You chill. I'm referring to your entire demeanor not one single, specific post.
Hey, you were quoting one single post, so excuse my confusion. Have a good day, young man.
Noisiv:

Good post. All true. Tensor cores are still not free, especially in terms of bandwidth. And GPU area did explode with Volta, and especially with Turing (low FP64), compared to Pascal. So tensor-core functionality does not seem to be THAT free.
Definitely not free, but Nvidia clearly "gambled" and went with using them for a specific purpose. Genius or not? Time will tell. The number of CUDA cores on the 2080 Ti is still impressive, though; that thing will be a beast.
Whoa, wait, a reference card that isn't blower-style??
TheDeeGee:

First the 10 series price bumping and now this... I liked Nvidia you know.
I'm sure they will say that it is due to the tax on things coming from China now. Actually, I think it's more than that, because prices went up way more than $50 - more like $100, easy.
Fox2232:

Did you buy your terascale RX 570 from China for $100? No wonder you write what you did. You should have bought a 14nm Polaris-based card, not 28nm Tonga. Well done, trash post. Your Athlon X4 CPU combined with a GTX 1050 Ti speaks volumes for your understanding.
My $40 860K hit 4.7GHz, 3DMark verified - it is basically an HT dual core and runs 4.5 stable. Is that more CPU than an EVGA SSC 1050 Ti can push? Even switching to a true HT quad core i7-2600 did not improve scores, and my system was a lot smoother with new features like MS NVMe, USB-C 3.1, 2400 DDR3. Even the RX 570 is trash with the 2600 in the titles I play. I also have like 6 APUs, and my 570 isn't much of an improvement vs a 1050 Ti or the APUs - the heat and power, lmao... buggy Wattman grey-screening, not even applying stable voltages. Next I will try it with a Coffee Lake 8400 and probably another AM4 APU; what I'll get will probably be better frame times and only like 5-10 more min fps. And yes, the RX was a rebrand, lol - the RX 570 4GB Polaris is nothing but a die shrink of the R9 with some newer API gimmicks https://www.amd.com/en-us/products/graphics/desktop/r9 - both just refreshed. Garbage. Vega 56 should have been the 14nm 570, but I got scammed by AMD marketing again. Now this 2080 is out and some 1080 Tis are gonna hit that used market. And AMD has no answer to their entire lineup being trash compared to legacy second-hand used China products, nor can they release a proper mid-range card that would be worth buying vs the lowest-end 1050 Ti toaster with CUDA cores and PhysX...