Rumor: AMD Epyc2 processors could get 64 cores over 8+1 dies

With every CCX being produced individually, it should keep costs down... Wondering if they intend to do the same with GPUs.
sverek:

Can you glue GPUs together? Wouldn't it be crossfire?
AMD has developed Infinity Fabric for GPUs, announced back in August, with all the 7nm GPUs supporting it, in addition to PCIe 4.0 (the next round of Ryzen boards supports it). So the bandwidth is there; after all, CrossFire hasn't needed a separate bridge since Hawaii in 2013.
sverek:

Can you glue GPUs together? Wouldn't it be crossfire?
Fediuld:

AMD has developed Infinity Fabric for GPUs, announced back in August, with all the 7nm GPUs supporting it, in addition to PCIe 4.0 (the next round of Ryzen boards supports it). So the bandwidth is there; after all, CrossFire hasn't needed a separate bridge since Hawaii in 2013.
Yup, this is going to be the new normal. This way you get the system controller with everything you need on a slightly older process, since it doesn't need to run at break-neck speed; in fact that's better, because the PCIe and SATA controllers wouldn't be bothered by the CPU clock speeds anymore. The CPUs could be made much more cheaply on the latest fab processes, with much faster chips and better yields. GPUs would be the exact same idea, just with more "cores" per "GPU-CCX". Four cores per CPU-CCX now, maybe ~2048 cores per GPU-CCX? (roughly Polaris 30 scale, but with Navi)
8 cores in 64 mm²??? Isn't that waaaay too low?
sverek:

Can you glue GPUs together? Wouldn't it be crossfire?
Yes you can; it is roadmapped. No, it wouldn't be CrossFire; it would act as a single GPU, in exactly the same way as Threadripper and Ryzen. This is the high-end 7nm GPU from AMD that will be announced at the end of next year. Scalability is the single highest goal at AMD: there will not be any performance goals that cannot be met because of Moore's Law, which was the driving force behind the design. They can just add graphics cores, so for the first time ever in a GPU there will be "moar cores". And dual GPUs do not count, because those are CrossFired or SLI'd.
Fox2232:

It would be crazy if they decoupled the CPU cores from the uncore this way. Because from what I have seen, the cores themselves can clock higher; it is all the stuff around them that does not like it on the 2700X.
Very true at 12nm. If they use a "hub" design for the controller à la "Epyc 2" as shown, that would be a major difference. Also remaining to be seen is whether HBM2 is going to be used and/or whether the related costs have come down. There may very well be an "HBM3" made at a smaller node as well, which would lower the cost (after production costs are earned back).
I was thinking re: GPUs in the above comments, Fox.
Fox2232:

It would be crazy if they decoupled the CPU cores from the uncore this way. Because from what I have seen, the cores themselves can clock higher; it is all the stuff around them that does not like it on the 2700X.
It's not exactly news. The uncore and the actual CPU cores have been using separate clock domains in various CPUs for a long time.
My first thought was to wonder why the system controller wasn't also 7nm, as it is surely a much simpler circuit design than a single CPU. But according to the diagram, if they did that, where would they physically position the cores relative to the controller? ...Just an errant thought...
The controller is rumored to be 14nm because it's bigger than the other chips, and the process is not mature enough to get decent yields on such a large chip at 7nm.
So they are maybe going to ditch the '2 NUMA node' setup? Better late than never 🙂
128 threads would be Epyc (pun intended) 😀
I kinda hope this is wrong. I was hoping that 7nm would bring a core count change to the CCXs, making it capable of 64 cores on 4 CCXs (i.e. 16 cores per CCX). Even 14, 12, or 10 cores per CCX would be nice.
Aura89:

I kinda hope this is wrong. I was hoping that 7nm would bring a core count change to the CCXs, making it capable of 64 cores on 4 CCXs (i.e. 16 cores per CCX). Even 14, 12, or 10 cores per CCX would be nice.
Thing is, cost per transistor at 7nm is probably higher than at 14/12nm. Sticking with the same core count and dropping the die size would maximize yields and keep costs lower, at least until EUV matures.
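The yield argument here can be put in rough numbers with the classic Poisson defect-density yield model. All figures below (defect density, die areas) are hypothetical, chosen purely for illustration; they are not AMD or TSMC data:

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Classic Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

# Hypothetical defect density for an early 7nm process.
d0 = 0.005  # defects per mm^2

monolithic = poisson_yield(600.0, d0)  # one big 600 mm^2 die
chiplet = poisson_yield(75.0, d0)      # one small 75 mm^2 chiplet

print(f"600 mm^2 monolithic yield: {monolithic:.1%}")  # ~5%
print(f"75 mm^2 chiplet yield: {chiplet:.1%}")         # ~69%
```

The exponential dependence on area is the whole story: eight small chiplets plus a 14nm controller throw away far less silicon than one huge 7nm die, which is why smaller dies make an immature process economical.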
azraei97:

For now, AMD still can't make a multi-chip GPU work the way it does in CPUs. AMD has already talked about this. Navi is also unlikely to get a multi-chip design: https://www.pcgamesn.com/amd-navi-monolithic-gpu-design
Lol, not accurate at this time. Right now there are engineering samples in Sunnyvale. Navi isn't this, Vega isn't this; we're talking Arcturus, and while it's nowhere near ready for the market, it's getting the bugs worked out (i.e. module finalization, controller ops, and memory matching). And btw, there's going to be a super sweet Ryzen/Navi SoC for laptops that will bring serious gaming down in price by several hundred dollars versus the i9/2070 (Alienware) laptops about to drop.
Fox2232:

I have to agree. AMD had MCM GPUs a long time ago, but they are not feasible for the whole market. It works for compute, but there is not sufficient benefit outside of compute, and compute itself does not need MCM, as it does not overcome any particular problem. If AMD had very power-efficient GPUs, and even the biggest chip they made could not reach the 300W PCIe limit, then MCM would allow saturating performance and power per card.
Power delivery and regulation at 7nm are more difficult than you would normally think; nothing insurmountable, but the initial run will have a higher cost associated more with beefier boards than with the 7nm process itself. The expense of MCM is negligible at 7nm given the higher associated yields and lower costs. The freight of the new process is being carried by the iPhone/iPad A12 Bionic processor. Right now TSMC is hitting economies of scale six months early, which is why there are already engineering samples of Navi, early ES of Arcturus, and Ryzen 2 today in Sunnyvale.
Picolete:

The controller is rumored to be 14nm because it's bigger than the other chips, and the process is not mature enough to get decent yields on such a large chip at 7nm.
Well, my point was only that the controller at 7nm would be nowhere near as large as it is at 14nm, and at that point physical bus connections to the smaller 7nm CPU cores would be problematic, per the diagram. As far as yields go, I'm fairly certain that each 7nm CPU core (8/16) as pictured is more complex circuit-wise than the system controller, so if the CPUs come in at a good yield at 7nm, then the system controller, being less complex, would also yield well at 7nm, maybe even better than the CPUs. However, as I mentioned, using the 14nm system controller would be less expensive than one at 7nm, and if the diagram is true to scale, a 14nm system controller would better facilitate physical connection of the CPU cores, again per the pictured diagram. Just some idle speculation about the physical bus layout of the CPU cores connecting to the system controller, meh... ;) I'm probably all wet...!
One thing is that you would probably have to use edge interposers, as a full interposer would probably be too big. Then again, per the diagrams, it would be possible to do it on an organic substrate. Not sure what the interconnect speed impact would be, even if only small (compared to an interposer). Edit: one thing I just thought of is that interposers allow very large buses, i.e. 1024-4096 bits wide, usually HBM, whereas an organic substrate would allow a lot less? Like the 512-bit bus of DDR-type cards?
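The bus-width comparison can be made concrete with the usual peak-bandwidth arithmetic. The per-pin data rates below are typical ballpark figures, not exact product specs:

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate / 8."""
    return bus_width_bits * data_rate_gbps / 8

# Illustrative configurations:
# two 1024-bit HBM2 stacks at ~2 Gbps per pin (wide and slow, needs an interposer)
hbm2_two_stacks = peak_bandwidth_gbs(2048, 2.0)
# a 512-bit GDDR5 bus at ~8 Gbps per pin (narrow and fast, fits an organic substrate)
gddr_512bit = peak_bandwidth_gbs(512, 8.0)

print(f"2048-bit HBM2 : {hbm2_two_stacks:.0f} GB/s")  # 512 GB/s
print(f" 512-bit GDDR5: {gddr_512bit:.0f} GB/s")      # 512 GB/s
```

So a wide, slow interposer bus and a narrow, fast substrate bus can land at the same peak bandwidth; the interposer's draw is that the slow, wide signaling over very short traces costs less power per bit.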
waltc3:

Well, my point was only that the controller at 7nm would be nowhere near as large as it is at 14nm, and at that point physical bus connections to the smaller 7nm CPU cores would be problematic, per the diagram. As far as yields go, I'm fairly certain that each 7nm CPU core (8/16) as pictured is more complex circuit-wise than the system controller, so if the CPUs come in at a good yield at 7nm, then the system controller, being less complex, would also yield well at 7nm, maybe even better than the CPUs. However, as I mentioned, using the 14nm system controller would be less expensive than one at 7nm, and if the diagram is true to scale, a 14nm system controller would better facilitate physical connection of the CPU cores, again per the pictured diagram. Just some idle speculation about the physical bus layout of the CPU cores connecting to the system controller, meh... ;) I'm probably all wet...!
You correctly mention that the size could be too small to accommodate all external connections; however, there is more to it. Not all chip parts scale the same: compute cores and caches are among those that scale very well, while memory controllers and buses are known to scale badly and are also more sensitive to defects. Getting a defect in a compute core or its local cache is actually the best case here, as you can simply turn that part off and sell it as a lower SKU; but if you get a defect anywhere in the uncore (controller), it's pretty much game over for that chip, unless it hits a part that is doubled (doubling certain parts, like long pathways, is used to increase yields).
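The salvage argument can be sketched with a simple Poisson yield model split by die region. The floorplan fractions and defect density below are hypothetical, for illustration only, not a real AMD layout:

```python
import math

# Hypothetical die floorplan and process figures:
die_area = 200.0         # mm^2
d0 = 0.005               # defects per mm^2
uncore_fraction = 0.30   # memory controller, fabric: a defect here kills the die
                         # (remaining 70% is cores + caches, which can be fused off)

# Poisson model: probability of zero defects in a given area is exp(-area * d0).
perfect = math.exp(-die_area * d0)                    # whole die defect-free
uncore_clean = math.exp(-die_area * uncore_fraction * d0)  # only uncore defect-free

# A die is sellable if the uncore is clean: core defects just mean a lower SKU.
print(f"fully perfect dies: {perfect:.1%}")       # ~37%
print(f"sellable dies     : {uncore_clean:.1%}")  # ~74%
```

Under these toy numbers, salvage binning roughly doubles the number of sellable dies, which is exactly why defects in cores are the "best case" and uncore defects are not.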
xrodney:

You correctly mention that the size could be too small to accommodate all external connections; however, there is more to it. Not all chip parts scale the same: compute cores and caches are among those that scale very well, while memory controllers and buses are known to scale badly and are also more sensitive to defects. Getting a defect in a compute core or its local cache is actually the best case here, as you can simply turn that part off and sell it as a lower SKU; but if you get a defect anywhere in the uncore (controller), it's pretty much game over for that chip, unless it hits a part that is doubled (doubling certain parts, like long pathways, is used to increase yields).
Maybe the solution will be an uncore controller with doubled parts for Epyc, where the doubles can be disabled (or faulty) and the chip reused for Threadripper and Ryzen to save costs. Anyway, the interposer is still expensive on the final Vega product; they need to make it cheaper first. Maybe connecting one CCX to a controller chiplet is easy enough, leaving the more complex substrate for servers that will pay for it. But Zen 2 is already done, and we won't see these 8+1 solutions until Zen 3 cores and the new AM5 socket in 2020... I don't know, but it's fun to speculate 😛 Edit: Intel has already talked about chiplets with different lithographies bound together.