Beta BIOS Enables Prioritization of CCDs on AMD 7000X3D Processors

This is quite interesting, to say the least. I wonder how those algorithms will communicate with Windows to tell a gaming app from a non-gaming app. Only the benchmarks will tell whether it works as intended.
data/avatar/default/avatar20.webp
Wait, I didn't follow all this Zen 4 3D matter... isn't the "3D" L3 cache shared between all modules? Or at least, shouldn't it be split between the modules and linked with a bus? Edit: I see now, only one block will get the additional cache... I still don't understand whether it can be accessed by the other core block or not.
Hmm I wasn't aware the cache only worked for a single core cluster. Seems to me the multi-CCD models will be proportionately worse as a result of that. It also explains why perhaps there's no sign of something like a 7600G3D, since the iGPU might not get access to the cache.
schmidtbag:

Hmm I wasn't aware the cache only worked for a single core cluster. Seems to me the multi-CCD models will be proportionately worse as a result of that. It also explains why perhaps there's no sign of something like a 7600G3D, since the iGPU might not get access to the cache.
I think AMD revealed this detail right from the start, but I could be wrong. Anyway, this fact automatically makes the 7800X3D the pick of the bunch unless we really need the extra cores.
H83:

I think AMD revealed this detail right from the start, but I could be wrong. Anyway, this fact automatically makes the 7800X3D the pick of the bunch unless we really need the extra cores.
I pay very little attention to press material from manufacturers. Embellished graphs, cherry-picked results, stupid buzzwords, too many asterisks, etc. But yeah, the 7800X3D really seems to be the only obvious choice. The 7600X3D I'm sure will perform well too but I'm not so sure the price of the V-cache is justified on 6c/12t.
I miss the days of the FSB... so many settings today...
Alessio1989:

Wait, I didn't follow all this Zen 4 3D matter... isn't the "3D" L3 cache shared between all modules? Or at least, shouldn't it be split between the modules and linked with a bus? Edit: I see now, only one block will get the additional cache... I still don't understand whether it can be accessed by the other core block or not.
3D cache isn't linked by a bus or any other traditional interconnect. It relies on the physics between two shaved (thinned) die chiplets (look ma, no wires!), but this is why it only works on the CCD it's attached to.
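To make "only works on the CCD it's attached to" concrete, here is a minimal Python sketch of a 7950X3D-style layout. The 32 MB base L3 per CCD, the 64 MB stacked on one CCD, and the 128 MB total match AMD's published specs; the core numbering (0-7 on the V-Cache die) and the `CCD`/`local_l3_for_core` names are assumptions made up for illustration. It only records which cores sit next to which L3; it does not model how the dies are bonded or what happens on a remote access.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CCD:
    """One core complex die: its cores and the L3 that only those cores own."""
    index: int
    cores: range        # physical core IDs on this die (numbering is an assumption)
    base_l3_mb: int     # L3 built into the CCD itself
    stacked_l3_mb: int  # 3D V-Cache bonded on top of this CCD (0 if none)

    @property
    def local_l3_mb(self) -> int:
        return self.base_l3_mb + self.stacked_l3_mb

# 7950X3D-style layout: two 8-core CCDs, V-Cache stacked on only one of them.
CCDS_7950X3D = (
    CCD(index=0, cores=range(0, 8), base_l3_mb=32, stacked_l3_mb=64),
    CCD(index=1, cores=range(8, 16), base_l3_mb=32, stacked_l3_mb=0),
)

def local_l3_for_core(core: int, ccds=CCDS_7950X3D) -> int:
    """L3 capacity a core can hit without leaving its own die."""
    for ccd in ccds:
        if core in ccd.cores:
            return ccd.local_l3_mb
    raise ValueError(f"unknown core {core}")

print(local_l3_for_core(3))   # 96: core on the V-Cache die
print(local_l3_for_core(12))  # 32: core on the other die
print(sum(c.local_l3_mb for c in CCDS_7950X3D))  # 128
```

A core on CCD1 sees only its own 32 MB locally; the extra 64 MB exists, but only for the eight cores underneath it.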
Well, based on rumors, leaks, etc., and for the sake of speculation, I think the 7950X3D is a mistake (another one) from AMD. The performance increase over the 7800X3D seems too small (and limited to some specific games) to justify buying it... Also, for productivity workloads the vanilla 7950X is better, so I really don't see the point of it as the "best of both worlds" AMD is advertising. But let's wait for official benchmarks... Also, these CCD settings and algorithms will further confuse users; I see what they are trying to do here, but it is just another complication.
tunejunky:

3D cache isn't linked by a bus or any other traditional interconnect. It relies on the physics between two shaved (thinned) die chiplets (look ma, no wires!), but this is why it only works on the CCD it's attached to.
So there is no way for the other core module to access the 3D cache? That would lead to a gazillion cache misses. I can't find any info beyond the fact that the 3D V-Cache block sits on just one core module, nor anything about changes to the module bus connection.
Alessio1989:

So there is no way for the other core module to access the 3D cache? That would lead to a gazillion cache misses. I can't find any info beyond the fact that the 3D V-Cache block sits on just one core module, nor anything about changes to the module bus connection.
The L3 cache is not shared between cores across the bus; doing so would completely negate the speed boost of the cache due to the latency. This has been a limitation of the Zen architecture since day 1, and it's why some of the Ryzen 1000-3000 chips were faster for gaming despite having fewer cores and lower clocks: they only had one CCD (e.g. the 3800 being faster than the 3900). You don't actually get as many cache misses as you would think, because the OS scheduler can keep track of CCDs individually and knows what's in which cache, but it still adds enough inefficiency to be noticeably slower, especially on the SMT side of things, which basically uses cache as a way to trick the system into thinking there's a second core on each physical one. This new BIOS update will also let it actively prioritize cores based on process type, though I don't understand what voodoo they are using to determine which threads benefit more from cache and which benefit more from clock speed.
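Keeping a workload's threads on one CCD can also be done by hand. Below is a minimal, Linux-only sketch using Python's `os.sched_setaffinity`; it is manual pinning, not what AMD's BIOS or driver does. The logical-CPU numbering (CCD0 = CPUs 0-7, CCD1 = 8-15, SMT siblings at +16) is an assumption for a 16-core/32-thread part and varies by OS and firmware, and `ccd_cpus`/`pin_to_ccd` are hypothetical helper names.

```python
import os

# Assumed logical-CPU numbering for a 16-core/32-thread, two-CCD part:
# CCD0's cores are CPUs 0-7 and CCD1's are 8-15, with SMT siblings at +16.
# Real numbering varies by OS and firmware; on Linux, compare with
# /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list first.
CORES_PER_CCD = 8
TOTAL_CORES = 16

def ccd_cpus(ccd: int, smt: bool = True) -> set[int]:
    """Logical CPUs belonging to one CCD under the numbering assumed above."""
    first = ccd * CORES_PER_CCD
    cpus = set(range(first, first + CORES_PER_CCD))
    if smt:
        # Add each core's SMT sibling (the comprehension reads the original set).
        cpus |= {cpu + TOTAL_CORES for cpu in cpus}
    return cpus

def pin_to_ccd(pid: int, ccd: int) -> None:
    """Restrict `pid` (0 = the caller) to one CCD's logical CPUs.

    On Linux the mask is per thread: threads created afterwards inherit it,
    threads that already exist keep their own.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is only available on some Unix platforms")
    os.sched_setaffinity(pid, ccd_cpus(ccd))
```

For example, calling `pin_to_ccd(0, 0)` early in a launcher, before it spawns worker threads, keeps those threads on CCD0 so their L3 working set stays on one die.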
Alessio1989:

So there is no way for the other core module to access the 3D cache? That would lead to a gazillion cache misses. I can't find any info beyond the fact that the 3D V-Cache block sits on just one core module, nor anything about changes to the module bus connection.
Nope, there is no (direct) module bus connection from the 3D cache, only from the CCD, so no change whatsoever. Again, this process relies on pure physics and ultra-precision machining; there is a wealth of information online about it. The 5800X3D came from leveraging Epyc processors, which is where the advancements came from, including eliminating the 5800X3D's heat issues by adding a separate voltage regulator for the cache (from Epyc Genoa), which is why the 7000X3D chips are easier to cool and have (limited) OC ability. Plainly and simply put, without any cheerleading whatsoever (just the literal fact), the 7000X3D chips and the Epyc 3D variants are the most advanced precision-built devices ever made in the history of man.
We will see how well all this works in a few days.
Neato! It seems they will actually be using heuristics to do it instead of a whitelist.
Yosif Videlov:

This is quite interesting, to say the least. I wonder how those algorithms will communicate with Windows to tell a gaming app from a non-gaming app. Only the benchmarks will tell whether it works as intended.
Given the options in the beta BIOS, it is likely they are using cache occupancy / memory pressure to change their CPPC preferred-cores data on the fly; a small change on the Windows side is probably also required. It's pretty simple, so it should work pretty well. Normally, CPPC preferred cores is intended for telling the OS which cores are fastest/clock the highest; instead, AMD (already) uses it to steer the Windows scheduler so it doesn't move threads between CCDs/CCXs on multi-CCX Ryzen chips (as seen on the Ryzen 3000 chips). They could be using some other interface, but this is the most likely IMO.
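The guess above can be sketched as a small control loop: sample a pressure signal, then reorder the preferred-core list so the OS scheduler fills one CCD first. Everything here is hypothetical: the `PressureSample` fields, the 0.6/0.4 thresholds, the `PreferredCoreSwitcher` name, and the assumption that CCD0 carries the V-Cache are made up for illustration. AMD's actual algorithm isn't public, and this is not it.

```python
from dataclasses import dataclass

@dataclass
class PressureSample:
    l3_occupancy: float      # fraction of L3 in use, 0.0-1.0 (hypothetical signal)
    memory_bandwidth: float  # fraction of DRAM bandwidth in use, 0.0-1.0 (hypothetical)

class PreferredCoreSwitcher:
    """Flip which CCD is ranked first based on cache/memory pressure.

    Separate high/low thresholds (hysteresis) keep it from bouncing threads
    between CCDs on every sample, i.e. a deliberately conservative policy.
    """

    def __init__(self, cores_per_ccd: int = 8, high: float = 0.6, low: float = 0.4):
        self.cores_per_ccd = cores_per_ccd
        self.high = high
        self.low = low
        self.prefer_cache = False  # start out favoring the higher-clocking CCD

    def update(self, sample: PressureSample) -> list[int]:
        pressure = max(sample.l3_occupancy, sample.memory_bandwidth)
        if not self.prefer_cache and pressure > self.high:
            self.prefer_cache = True
        elif self.prefer_cache and pressure < self.low:
            self.prefer_cache = False
        return self.ranking()

    def ranking(self) -> list[int]:
        """Core IDs, most preferred first (CCD0 = V-Cache die, an assumption)."""
        cache_ccd = list(range(0, self.cores_per_ccd))
        freq_ccd = list(range(self.cores_per_ccd, 2 * self.cores_per_ccd))
        return cache_ccd + freq_ccd if self.prefer_cache else freq_ccd + cache_ccd
```

Feeding it samples with pressure 0.2, 0.7, 0.5, then 0.3 puts core 8, 0, 0, then 8 at the head of the list: the 0.5 sample falls inside the hysteresis band, so the cache CCD stays preferred until pressure clearly drops.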
For strictly gaming, I think 8 cores/16 threads is plenty for now; if you need more cores for some workload, then the 7950X is most likely better for you. 7800X3D vs. my 5800X3D is what I'll be most interested in: how big of a leap will it be? I doubt the 7900X3D/7950X3D will beat the 7800X3D by much in games. Yes, having more than one CCX and locking the 3D cache to a single one will take a complex bit of software to sort out. It probably needs both a BIOS update and a Windows update to get them running 100%, but we'll soon find out.
illrigger:

The L3 cache is not shared between cores across the bus; doing so would completely negate the speed boost of the cache due to the latency. This has been a limitation of the Zen architecture since day 1, and it's why some of the Ryzen 1000-3000 chips were faster for gaming despite having fewer cores and lower clocks: they only had one CCD (e.g. the 3800 being faster than the 3900). You don't actually get as many cache misses as you would think, because the OS scheduler can keep track of CCDs individually and knows what's in which cache, but it still adds enough inefficiency to be noticeably slower, especially on the SMT side of things, which basically uses cache as a way to trick the system into thinking there's a second core on each physical one. This new BIOS update will also let it actively prioritize cores based on process type, though I don't understand what voodoo they are using to determine which threads benefit more from cache and which benefit more from clock speed.
Maybe I phrased the question the wrong way: what if the second core module needs to read/write a word that is currently in the 3D V-Cache of the other module? Is there a huge cache flush to memory (DRAM), or is there still an Infinity Fabric path that shortcuts the load/store operations? E.g.: CCX0 just updated something (let's call it word0) in the 3D V-Cache. CCX1 needs to access word0 (e.g. some kind of sync primitive). What happens? Must word0 be pushed back to DRAM and then pulled in by CCX1, OR can CCX1 pull word0 from the 3D V-Cache over an Infinity Fabric path?
tunejunky:

Nope, there is no (direct) module bus connection from the 3D cache, only from the CCD, so no change whatsoever. Again, this process relies on pure physics and ultra-precision machining; there is a wealth of information online about it. The 5800X3D came from leveraging Epyc processors, which is where the advancements came from, including eliminating the 5800X3D's heat issues by adding a separate voltage regulator for the cache (from Epyc Genoa), which is why the 7000X3D chips are easier to cool and have (limited) OC ability. Plainly and simply put, without any cheerleading whatsoever (just the literal fact), the 7000X3D chips and the Epyc 3D variants are the most advanced precision-built devices ever made in the history of man.
You should get hired by AMD marketing XD. Actually, there is nothing accurate online except badly MS Paint-edited images and speculation.
Alessio1989:

You should get hired by AMD marketing XD. Actually, there is nothing accurate online except badly MS Paint-edited images and speculation.
Been there, done that, though not for AMD. Nothing I said is inaccurate in any way, however. Look up precision machining: this is on a whole different level than Swiss engineering. And AMD doesn't get all the credit; it's shared with TSMC.
Alessio1989:

Maybe I phrased the question the wrong way: what if the second core module needs to read/write a word that is currently in the 3D V-Cache of the other module? Is there a huge cache flush to memory (DRAM), or is there still an Infinity Fabric path that shortcuts the load/store operations? E.g.: CCX0 just updated something (let's call it word0) in the 3D V-Cache. CCX1 needs to access word0 (e.g. some kind of sync primitive). What happens? Must word0 be pushed back to DRAM and then pulled in by CCX1, OR can CCX1 pull word0 from the 3D V-Cache over an Infinity Fabric path?
AFAIK there is coherency across the Infinity Fabric (i.e., cores on one CCX can access the L3 on another), but there is a considerable latency penalty; that was one of the major problems with Zen 1 and Zen 2.
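A toy model of the answer above: a read that misses in a core's own L3 can be served by the other CCD's L3 over the fabric before falling back to DRAM. This is a deliberately simplified sketch, with `Source` and `serve_read` as hypothetical names; it ignores L1/L2, coherence states (dirty/shared), write-backs, and latencies, since no latency figures are given in the thread.

```python
from enum import Enum

class Source(Enum):
    LOCAL_L3 = "local L3"
    REMOTE_L3 = "other CCD's L3 over Infinity Fabric"
    DRAM = "DRAM"

def serve_read(addr: int, requesting_ccd: int, l3_contents: dict[int, set[int]]) -> Source:
    """Where a read that reaches the L3 level gets its data in this toy model.

    l3_contents maps CCD index -> set of cache-line addresses held in that CCD's L3.
    """
    if addr in l3_contents[requesting_ccd]:
        return Source.LOCAL_L3          # fastest: stays on the die
    for ccd, lines in l3_contents.items():
        if ccd != requesting_ccd and addr in lines:
            return Source.REMOTE_L3     # coherent, but pays the cross-die penalty
    return Source.DRAM                  # nobody has it cached

# Alessio's example: CCX0 wrote word0 (address 0x1000) into its L3, then CCX1 reads it.
l3 = {0: {0x1000}, 1: set()}
print(serve_read(0x1000, 1, l3).value)  # other CCD's L3 over Infinity Fabric
```

In this model word0 comes across the fabric rather than making a round trip through DRAM, which is the behavior described above; the real cost is the extra cross-die latency, not a full flush.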
user1:

AFAIK there is coherency across the Infinity Fabric (i.e., cores on one CCX can access the L3 on another), but there is a considerable latency penalty; that was one of the major problems with Zen 1 and Zen 2.
Yes, and with 3D V-Cache the latency could be even higher in both cases (Infinity Fabric-only path or DRAM path). It will be interesting to see how they solve this clusterfuck.
Alessio1989:

Yes, and with 3D V-Cache the latency could be even higher in both cases (Infinity Fabric-only path or DRAM path). It will be interesting to see how they solve this clusterfuck.
If there were going to be a significant problem, we probably would've already seen it on the 7950X; as long as AMD's algorithm is conservative, it shouldn't result in any serious regressions IMO.