Intel Royal Core x86 Microarchitecture: Origins, Features, and Future Plans

I'd rather they ditch the "Lake" nomenclature. Seems like at this point they just throw a dart at a dictionary of common nouns and add "Lake" to the end of it. I liked it better when Intel's naming scheme corresponded to their tick-tock approach.
I hope these future cores have the extended 32 GPRs from APX, and an overhaul of the x86 legacy extensions.
"Hyper-Threading with rentable units" What's next? Hyper-threading with auctioned units? The highest bidding applications will win processor time, whereas the destitute programs will be left wanting.
Kaarme:

"Hyper-Threading with rentable units" What's next? Hyper-threading with auctioned units? The highest bidding applications will win processor time, whereas the destitute programs will be left wanting.
Please add your credit card if you want your video render to get priority. Do not give 'em ideas, Kaarme!
The sad part is, this could happen right now, too. With millisecond auctions for ad spaces on the internet, as long as your rig is connected to the internet, this could very well happen on your PC too. Or with any one of those hundreds of crypto mining apps that are still around.
Show me a CPU that doesn't eat 150+ W TDP and I might consider Intel again. Until then, the AMD 9700X still looks amazing to me; even the 7800X3D looks amazing in comparison.
Meanwhile, ARM grows and grows, and reinforces and consolidates its position in the data centers. I'd be happy to learn that Intel has some secret projects to unveil in order to fight ARM, but somehow I doubt it.
I find it weird that they remove HT to get more performance, but then add HT X4 in the future for more performance.
TLD LARS:

I find it weird that they remove HT to get more performance, but then add HT X4 in the future for more performance.
Maybe they designed it from the ground up rather than iterating on the old HT, which seems to be more of a security vulnerability these days. Maybe they deemed it not worth trying to fix for just one gen. Or maybe it's just Intel doing Intel stuff.
TLD LARS:

I find it weird that they remove HT to get more performance, but then add HT X4 in the future for more performance.
I think it has to do with power consumption and the presence of E-cores. It's possible they intend to abandon the hybrid architecture in the future.
Venix:

Maybe they designed it from the ground up rather than iterating on the old HT, which seems to be more of a security vulnerability these days. Maybe they deemed it not worth trying to fix for just one gen. Or maybe it's just Intel doing Intel stuff.
user1:

I think it has to do with power consumption and the presence of E-cores. It's possible they intend to abandon the hybrid architecture in the future.
How does HT work? Can the CPU see a whole pipe of workloads and pick whatever it likes to fill the microscopic pauses in work, or does it only see the first workload coming out of the pipe and judge whether that fits in a gap? 4x "HT" would only make sense to me if the CPU looks at 4 pipes and can only choose the first job from each pipe, instead of being able to choose freely from the entire CPU job schedule. On a second note, most Intel configurations are power or thermal limited, so they would need to fix that first to get any real benefit from an improved HT system. HT makes more heat because more work is crammed into the chip, so the chip might clock down a little; without HT the CPU does less work per clock and is able to clock higher, so in the end it does close to the same work as with HT on.
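The clock-versus-HT trade-off in the last point can be put into a rough formula. This is only a back-of-the-envelope sketch: it assumes work per second is simply (work per cycle) x (clock speed), and the gain figures in the loop are hypothetical placeholders, not measurements of any real CPU. The function name `break_even_clock_drop` is made up for this illustration.

```python
def break_even_clock_drop(smt_gain):
    """Fractional clock reduction that exactly cancels a given SMT gain.

    Assumed model: work per second = (work per cycle) * (clock).
    A gain g in work per cycle is cancelled once the clock falls to
    1 / (1 + g) of its original value.
    """
    return 1.0 - 1.0 / (1.0 + smt_gain)


if __name__ == "__main__":
    # Hypothetical SMT gains in work per cycle, chosen only for illustration.
    for gain in (0.10, 0.25, 0.50):
        drop = break_even_clock_drop(gain)
        print(f"SMT gain {gain:.0%}: break-even clock drop {drop:.1%}")
```

Under this simple model, HT only ends up as "close to the same work" if the clock loss is nearly as large as the break-even value; a smaller clock drop still leaves a net gain.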
@TLD LARS Supposedly, when a thread is in a "holiday" condition (waiting for another task to deliver results it needs in order to continue), the core sneaks in something else to calculate, if anything is available. Now, this is as surface-level as it gets: branch prediction and more and more variables come into play, and I am not even remotely qualified to attempt to guess how exactly it works.
How many unreleased architectures are they currently designing at once? Bartlett Lake, Arrow Lake, (Royal Core, Cobra Core), Beast Lake.
TLD LARS:

How does HT work? Can the CPU see a whole pipe of workloads and pick whatever it likes to fill the microscopic pauses in work, or does it only see the first workload coming out of the pipe and judge whether that fits in a gap? 4x "HT" would only make sense to me if the CPU looks at 4 pipes and can only choose the first job from each pipe, instead of being able to choose freely from the entire CPU job schedule. On a second note, most Intel configurations are power or thermal limited, so they would need to fix that first to get any real benefit from an improved HT system. HT makes more heat because more work is crammed into the chip, so the chip might clock down a little; without HT the CPU does less work per clock and is able to clock higher, so in the end it does close to the same work as with HT on.
SMT, or Hyper-Threading, is pretty simple. First, modern CPUs have long pipelines (~20 stages), where each stage performs a certain operation. The more stages you have, the more work you can have in flight at any given time. However, it comes at a price: if you don't schedule operations correctly, you can be left with "bubbles" where stages sit idle, and if you make a mistake such as a branch mispredict, the pipeline takes longer to flush before new work can begin. SMT works by filling those bubbles. Having more than one thread feeding a core means there is greater opportunity to schedule instructions efficiently, and if one thread stalls, you still have work to do. In terms of silicon cost it's also much cheaper than prediction or scheduling hardware; everything is done "in flight", so to speak. You can take this quite far: IBM, for instance, supports 8-way SMT on some of their POWER architecture CPUs.
https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Pipeline%2C_4_stage.svg/563px-Pipeline%2C_4_stage.svg.png
https://en.wikipedia.org/wiki/Superscalar_processor
https://en.wikipedia.org/wiki/Simultaneous_multithreading
https://en.wikipedia.org/wiki/Instruction_pipelining
SPECULATION: It's possible that Intel is moving to a longer pipeline; 4-way SMT probably makes more sense if they intend to do that. The other side effect of a longer pipeline is that you can clock the processor higher, because each stage is generally simpler. As you might have noticed, this sounds similar to NetBurst, which is exactly what happened, and where SMT originates on Intel CPUs. Moreover, Pat Gelsinger was the CTO during that era... o_O gets the noggin joggin...
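To make the bubble-filling idea a bit more concrete, here is a toy sketch in Python. It is not how any real core works: it assumes a single issue slot per cycle, threads that stall for a fixed number of cycles after a fixed burst of instructions, and a fixed-priority pick among ready threads. The function name `utilization` and the parameters `burst`, `stall` and `cycles` are all invented for this illustration.

```python
def utilization(num_threads, burst=2, stall=4, cycles=60):
    """Fraction of cycles in which a toy single-issue core issues an instruction.

    Assumed model: each thread issues `burst` instructions back to back and
    then stalls (for example, waiting on memory) for `stall` cycles.
    """
    ready_at = [0] * num_threads    # first cycle each thread may issue again
    run_length = [0] * num_threads  # instructions issued since the last stall
    issued = 0
    for cycle in range(cycles):
        # Fixed priority: the lowest-numbered ready thread gets the slot.
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                issued += 1
                run_length[t] += 1
                if run_length[t] == burst:
                    # This thread now stalls; another thread may fill the gap.
                    ready_at[t] = cycle + 1 + stall
                    run_length[t] = 0
                break
        # If no thread was ready, this cycle is a bubble and nothing issues.
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "thread(s):", round(utilization(n), 2))
```

With these made-up parameters, one thread leaves most cycles empty, a second thread fills part of the gaps, and once the slot is always busy an extra thread adds nothing, which is the diminishing-returns effect in miniature.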
I think I understand 10% of it, thanks. There must be a sweet spot beyond which more HT threads generate more problems with thread directing and out-of-order execution than they produce in extra performance.
TLD LARS:

I think I understand 10% of it, thanks. There must be a sweet spot beyond which more HT threads generate more problems with thread directing and out-of-order execution than they produce in extra performance.
There are definitely diminishing returns, but generally SMT just keeps the core "full" more of the time. There used to be a bigger penalty on older CPUs, but these days it's practically non-existent (at least on CPUs with vulnerabilities patched), so in this sense it's almost "free" performance. The increased heat and power draw is probably the biggest downside.
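The diminishing returns can be sketched with a very rough probability model. Assume, purely for illustration, that in any given cycle each thread independently has an instruction ready with probability `ready_prob`; the core is then busy whenever at least one thread is ready. The names `busy_fraction` and `marginal_gains` and the value 0.6 are hypothetical, and real threads are neither independent nor this simple.

```python
def busy_fraction(num_threads, ready_prob):
    """Chance that at least one of `num_threads` independent threads is ready."""
    return 1.0 - (1.0 - ready_prob) ** num_threads


def marginal_gains(max_threads, ready_prob):
    """Extra busy fraction contributed by the 1st, 2nd, ... thread."""
    return [
        busy_fraction(n, ready_prob) - busy_fraction(n - 1, ready_prob)
        for n in range(1, max_threads + 1)
    ]


if __name__ == "__main__":
    ready_prob = 0.6  # hypothetical value, not a measurement
    for n, gain in enumerate(marginal_gains(8, ready_prob), start=1):
        total = busy_fraction(n, ready_prob)
        print(f"{n} thread(s): busy {total:.3f} (+{gain:.3f})")
```

In this model each added thread contributes a smaller slice than the one before it, because it can only fill the cycles that all the earlier threads left empty.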