Intel LGA 7529 Processors are Nearly 10cm in Length

I feel like there must be an upper limit to how many pins are actually needed. You need a finite amount to handle the PCIe lanes and memory channels; there's not really a point in having more than 128 PCIe lanes (unless you're doing something like AMD where you're linking the CPUs via PCIe), and there gets to be a point where you simply can't fit all the traces on a motherboard for more memory channels. With how large these packages are, it makes sense to integrate more onto the SoC, thereby reducing the need for more pins leading elsewhere on the motherboard. So unless I'm missing something here, that just leaves pins for power delivery. If LGA 7529 becomes that huge just so all the cores can be fed more power, that's a rather bleak future. Perhaps Intel is intending to compete with AMD by having one gargantuan socket rather than multi-socket designs, which overall makes sense.
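A rough back-of-the-envelope pin budget illustrates the power-delivery point. Every count below is an assumption for illustration only (including the idea that the socket name reflects the pin count), not Intel's actual LGA 7529 allocation:

[CODE]
# Back-of-envelope pin budget for a hypothetical ~7500-pin server socket.
# Every number here is an illustrative assumption, not a real allocation.

pcie_lanes = 128            # assumed lane count
pins_per_lane = 4           # TX pair + RX pair, ignoring per-lane grounds
pcie_pins = pcie_lanes * pins_per_lane

memory_channels = 12        # assumed DDR5 channel count
pins_per_channel = 150      # rough guess: data, strobes, command/address, control
memory_pins = memory_channels * pins_per_channel

misc_io_pins = 500          # guess: inter-socket links, clocks, debug, straps

signal_pins = pcie_pins + memory_pins + misc_io_pins
total_pins = 7529           # assuming the socket name matches the pin count
power_ground_pins = total_pins - signal_pins

print(f"PCIe signal pins:   {pcie_pins}")
print(f"Memory signal pins: {memory_pins}")
print(f"Misc I/O pins:      {misc_io_pins}")
print(f"Power/ground pins:  {power_ground_pins} "
      f"({power_ground_pins / total_pins:.0%} of the socket)")
[/CODE]

With these made-up numbers, roughly two thirds of the pins end up as power and ground, which is consistent with the worry above that most of the growth goes to power delivery.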
schmidtbag:

I feel like there must be an upper limit to how many pins are actually needed.
Considering the large CPUs are MCM-based, the chips themselves are already sitting on a substrate. It seems to me the substrate wouldn't need to be that much thicker to handle power distribution from far fewer outside contact points. At the end of the day, though, that would only make the outside of the CPU and the socket on the mobo safer from external damage; with current tech, the CPU chiplets would internally still need that many points of power delivery via the substrate. Still, it would be nicer for anyone assembling a system.
schmidtbag:

I feel like there must be an upper limit to how many pins are actually needed. You need a finite amount to handle the PCIe lanes and memory channels; there's not really a point in having more than 128 PCIe lanes (unless you're doing something like AMD where you're linking the CPUs via PCIe), and there gets to be a point where you simply can't fit all the traces on a motherboard for more memory channels. With how large these packages are, it makes sense to integrate more onto the SoC, thereby reducing the need for more pins leading elsewhere on the motherboard. So unless I'm missing something here, that just leaves pins for power delivery. If LGA 7529 becomes that huge just so all the cores can be fed more power, that's a rather bleak future. Perhaps Intel is intending to compete with AMD by having one gargantuan socket rather than multi-socket designs, which overall makes sense.
There are chips that are almost an entire wafer in size, and 128 lanes of PCIe is really nowhere near the maximum useful amount. Bigger CPUs mean more cores per rack and more cores per data center; bigger packages are the logical thing to do when die shrinks do not offer the cost reductions and density increases you need to meet demand. You can do more PCB layers, so there really isn't any limit there as long as you have the cash to pay for it. It is completely possible to have packages the size of a dinner plate or larger; it's just a matter of cost and application.
user1:

There are chips that are almost an entire wafer in size, and 128 lanes of PCIe is really nowhere near the maximum useful amount. Bigger CPUs mean more cores per rack and more cores per data center; bigger packages are the logical thing to do when die shrinks do not offer the cost reductions and density increases you need to meet demand. You can do more PCB layers, so there really isn't any limit there as long as you have the cash to pay for it. It is completely possible to have packages the size of a dinner plate or larger; it's just a matter of cost and application.
Do you have an example of one such chip? I can't imagine how that's done beyond some very niche applications, and likely using some rather large node. Intel, I think, still uses UPI links for inter-socket communication rather than PCIe, which means they have a much lower dependency on PCIe lanes than AMD. Regardless, most server motherboards from what I've seen barely have enough room for 64 lanes of PCIe expansion cards. Granted, a lot of such boards use a lot of their available lanes for things like integrated networking, but my point is: I haven't got the impression there's demand for more lanes, but rather for faster lanes. So, I still don't see having more than 128 lanes being necessary any time soon, if ever. There are really only two outcomes when it comes to scaling up a processor to such extreme levels: A. The SoC is so powerful/capable that it reduces how much bandwidth it needs. B. The demand is so immense that it completely dwarfs the SoC's potential, where it'd probably be more cost-effective to buy more, less powerful servers. It seems to me there's an upper limit to how much you can cram on a motherboard before it doesn't make economic sense. Since chiplets are basically just more tightly integrated multi-socket designs, I don't really see the purpose in having one gargantuan socket the size of an ITX motherboard with a PCB so thick you would need different chassis standoffs to mount it. By having multiple sockets, you reduce the cost of the package and you can cram more features onto the motherboard, since you don't have thousands of traces going to one spot.
schmidtbag:

Do you have an example of one such chip? I can't imagine how that's done beyond some very niche applications, and likely using some rather large node. Intel, I think, still uses UPI links for inter-socket communication rather than PCIe, which means they have a much lower dependency on PCIe lanes than AMD. Regardless, most server motherboards from what I've seen barely have enough room for 64 lanes of PCIe expansion cards. Granted, a lot of such boards use a lot of their available lanes for things like integrated networking, but my point is: I haven't got the impression there's demand for more lanes, but rather for faster lanes. So, I still don't see having more than 128 lanes being necessary any time soon, if ever. There are really only two outcomes when it comes to scaling up a processor to such extreme levels: A. The SoC is so powerful/capable that it reduces how much bandwidth it needs. B. The demand is so immense that it completely dwarfs the SoC's potential, where it'd probably be more cost-effective to buy more, less powerful servers. It seems to me there's an upper limit to how much you can cram on a motherboard before it doesn't make economic sense. Since chiplets are basically just more tightly integrated multi-socket designs, I don't really see the purpose in having one gargantuan socket the size of an ITX motherboard with a PCB so thick you would need different chassis standoffs to mount it. By having multiple sockets, you reduce the cost of the package and you can cram more features onto the motherboard, since you don't have thousands of traces going to one spot.
Here is the chip (wafer-scale): https://www.cerebras.net/blog/wafer-scale-processors-the-time-has-come/

There is always room for more I/O, and more cores = more I/O; if AMD could double the core count per package, they could double the I/O easily. You don't see much more than 16-layer PCBs on consumer hardware, but you can go beyond 24 layers, so there is a long way to go before such things become impractical. If you're building a supremely large cluster or running a data center, more per rack means more throughput, and at the moment there is practically infinite demand for computing resources. $30K racks aren't rare when a single Xeon can set you back $10-20K, so even something like a 24+ layer PCB is going to be a minor cost compared to the rest of the components. If you think about a single compute node, which may comprise 2 Genoa CPUs and 12 GPUs, the total silicon in it is huge; if you can wrap as much as you can into fewer, larger packages, that saves you space, circuitry, and cooling complexity. It's the obvious solution. And you should see some of the truly enormous mainframes from the past. Check this package out: [youtube=xQ3oJlt4GrI]

Computers were really big before stuff got small, and now, due to the technical limitations of shrinks, we're going big again. We've been kind of spoiled by node shrinks delivering performance uplift.

Industrial/commercial/government applications have very different requirements compared to consumer, and those giant Xeons aren't for consumers. Market conditions can support (and have supported) much larger packages.
user1:

Here is the chip (wafer-scale): https://www.cerebras.net/blog/wafer-scale-processors-the-time-has-come/
Cool - pretty interesting.
There is always room for more I/O, and more cores = more I/O; if AMD could double the core count per package, they could double the I/O easily. You don't see much more than 16-layer PCBs on consumer hardware, but you can go beyond 24 layers, so there is a long way to go before such things become impractical. If you're building a supremely large cluster or running a data center, more per rack means more throughput, and at the moment there is practically infinite demand for computing resources. $30K racks aren't rare when a single Xeon can set you back $10-20K, so even something like a 24+ layer PCB is going to be a minor cost compared to the rest of the components. If you think about a single compute node, which may comprise 2 Genoa CPUs and 12 GPUs, the total silicon in it is huge; if you can wrap as much as you can into fewer, larger packages, that saves you space, circuitry, and cooling complexity. It's the obvious solution. And you should see some of the truly enormous mainframes from the past.
Yes, I get that there is almost always a way to just keep scaling up, but the underlying point is that if this were a viable strategy, we'd have seen it done much sooner. As complexity goes up, so do the chances of failure. Reliability is critical to such systems, so sometimes ensuring that reliability multiplies the price. Companies care about maximizing profits, so you have to find the right balance between processing density, reliability, and price. It seems to me Intel hasn't really found an effective way to do multi-socket configurations; this article on Phoronix does a good job of showing how adding another socket does a lot of damage to total performance: https://www.phoronix.com/review/intel-scalability-optimizations So perhaps their line of thinking is to just make a gargantuan socket, since they're under pressure to compete with AMD's core count and they don't have the time to figure out how to make inter-socket communication more efficient.
Computers were really big before stuff got small, and now, due to the technical limitations of shrinks, we're going big again. We've been kind of spoiled by node shrinks delivering performance uplift.
I think it's been a long while since a node shrink by itself had any noteworthy impact on per-transistor performance. It seems to me that since around 22nm, the primary benefits of smaller nodes have been fitting more transistors per wafer and better efficiency. That means you can either cram more transistors into the same area, or keep the transistor count the same and create more usable product per wafer. Even though we're only seeing maybe 2nm of shrinkage (which is nothing compared to 10 years ago), the transistors are already so small that 2nm is a relatively huge difference in size. The chiplet design is somewhat analogous to a die shrink, because it also lets you fit more usable product per wafer: if a chiplet die has a defect, it's no big deal compared to a defect in a giant monolithic die, and since wafers are circular, you can fit more small square dies than one large one. And since chiplets are meant to be paired up with others, they let engineers scale up. So I guess what I'm getting at here: we're getting bigger not because of node limitations, but because it's now affordable to do so.
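A quick toy calculation shows why small chiplets beat one big die on usable silicon per wafer. It uses a simple Poisson yield model and made-up die sizes and defect density, purely for illustration:

[CODE]
import math

# Toy comparison of one large monolithic die vs. several small chiplets on a
# 300 mm wafer, using a simple Poisson yield model Y = exp(-D * A).
# Defect density and die areas are illustrative assumptions.

WAFER_DIAMETER_MM = 300
DEFECT_DENSITY = 0.001          # defects per mm^2 (assumed)

def dies_per_wafer(die_area_mm2):
    """Common gross-dies-per-wafer approximation for square dies."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

def yield_fraction(die_area_mm2):
    """Poisson model: probability a die has zero defects."""
    return math.exp(-DEFECT_DENSITY * die_area_mm2)

for area in (600, 75):          # one 600 mm^2 monolith vs. 8 x 75 mm^2 chiplets
    gross = dies_per_wafer(area)
    good = gross * yield_fraction(area)
    print(f"{area:>4} mm^2 die: {gross:>4} gross, {good:6.1f} good per wafer")
[/CODE]

With these made-up numbers, one wafer yields roughly 50 good 600 mm² monoliths versus roughly 800 good 75 mm² chiplets (about 100 eight-chiplet packages), which is the "more usable product per wafer" argument in numbers.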
Industrial/commercial/government applications have very different requirements compared to consumer, and those giant Xeons aren't for consumers. Market conditions can support (and have supported) much larger packages.
Understood, but like I said: those giant Xeons are still too small for what many of the potential customers need. I'm feeling pretty confident they will be more expensive than a multi-socket motherboard. So, what I don't get is how having 4x 128-core CPUs makes less sense than a single, more expensive 512-core CPU. The number of cores per rack would be the same, but one system is cheaper to manufacture than the other.
schmidtbag:

Cool - pretty interesting. Yes, I get that there is almost always a way to just keep scaling up, but the underlying point is that if this were a viable strategy, we'd have seen it done much sooner. As complexity goes up, so do the chances of failure. Reliability is critical to such systems, so sometimes ensuring that reliability multiplies the price. Companies care about maximizing profits, so you have to find the right balance between processing density, reliability, and price. It seems to me Intel hasn't really found an effective way to do multi-socket configurations; this article on Phoronix does a good job of showing how adding another socket does a lot of damage to total performance: https://www.phoronix.com/review/intel-scalability-optimizations So perhaps their line of thinking is to just make a gargantuan socket, since they're under pressure to compete with AMD's core count and they don't have the time to figure out how to make inter-socket communication more efficient.

I think it's been a long while since a node shrink by itself had any noteworthy impact on per-transistor performance. It seems to me that since around 22nm, the primary benefits of smaller nodes have been fitting more transistors per wafer and better efficiency. That means you can either cram more transistors into the same area, or keep the transistor count the same and create more usable product per wafer. Even though we're only seeing maybe 2nm of shrinkage (which is nothing compared to 10 years ago), the transistors are already so small that 2nm is a relatively huge difference in size. The chiplet design is somewhat analogous to a die shrink, because it also lets you fit more usable product per wafer: if a chiplet die has a defect, it's no big deal compared to a defect in a giant monolithic die, and since wafers are circular, you can fit more small square dies than one large one. And since chiplets are meant to be paired up with others, they let engineers scale up. So I guess what I'm getting at here: we're getting bigger not because of node limitations, but because it's now affordable to do so.

Understood, but like I said: those giant Xeons are still too small for what many of the potential customers need. I'm feeling pretty confident they will be more expensive than a multi-socket motherboard. So, what I don't get is how having 4x 128-core CPUs makes less sense than a single, more expensive 512-core CPU. The number of cores per rack would be the same, but one system is cheaper to manufacture than the other.
Generally I would consider a 2x density increase a shrink-delivered performance uplift, because it lets you double the cores "for free"; we haven't relied on shrinks for frequency increases for a very long time. Modern nodes still provide good density increases, though that will probably start to change, and in Intel's case their foundry has fallen behind, so going bigger is the path forward for them.

If you're not aware, multi-socket boards are bigger and more expensive than single-socket boards; they are more complex. Having fewer, larger packages reduces complexity, it does not increase it. In the case of Intel, 4 tiles mounted on a single package is definitely more reliable than having 4 sockets, each with its own pins, mounting mechanism, VRM, etc. If you think about how much empty space there is on the average board, and instead fill that with package, you can start to see how a larger package can reduce the overall footprint compared to multi-socket. The main downside to bigger packages is that you have to be able to sell them; smaller packages are easier to sell, but if Intel has enough customers that will buy giant packages, then there is no problem.

That said, I would not be surprised at all if these giants become available in multi-socket board configurations, simply because Intel has to do something to compete with AMD. If you think about a product like Bergamo, with 128 cores per package, Intel can't fit 120+ cores on LGA 4677, so they are basically left in the dust; you could say the gigantic package is really a coping mechanism. Though I will mention that this growing-socket trend began a long time ago now, with Knights Landing and Skylake-SP.

As to why we haven't seen this already, the answer is that we have! (See the IBM package.) It just hasn't made sense in recent years because of shrinks: why design a super-deluxe custom system when you can buy thousands of Pentiums for a lower cost? But like I said, that is coming to an end. Fitting more compute per rack has become a limiting factor. Some of the largest buildings in the world are data centers, and you have limited options: build more buildings or fit more per building. Building more buildings can be cost-prohibitive depending on location, so it makes sense to reduce footprint as much as is reasonable, and larger packages are one way to improve space efficiency.

You also mentioned performance, and it's true, but I think it's not as important as you think; most of the performance penalty can be avoided through optimization. If software runs poorly on multi-socket, it's something that can almost always be resolved through patches.
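As a rough illustration of the "more per rack" argument (node sizes, core counts, and rack height below are assumptions, not figures from the article):

[CODE]
# Toy cores-per-rack comparison; all numbers are illustrative assumptions.

RACK_UNITS = 42                 # standard full-height rack

configs = {
    "2U node, 2 x 64-core CPUs":      {"u_per_node": 2, "cores_per_node": 2 * 64},
    "2U node, 2 x 256-core packages": {"u_per_node": 2, "cores_per_node": 2 * 256},
}

for name, cfg in configs.items():
    nodes = RACK_UNITS // cfg["u_per_node"]
    print(f"{name}: {nodes} nodes, {nodes * cfg['cores_per_node']} cores per rack")
[/CODE]

Same rack and same node count; the only variable is how many cores each package carries, which is the space-efficiency point in numbers.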
user1:

Generally I would consider a 2x density increase a shrink-delivered performance uplift, because it lets you double the cores "for free"; we haven't relied on shrinks for frequency increases for a very long time.
If you have an identical architecture on a smaller node, there will be a performance difference in favor of the smaller node. But those differences become almost insignificant at the sizes we're at now, which was the point I was trying to make. Otherwise, I agree with you.
If you're not aware, multi-socket boards are bigger and more expensive than single-socket boards; they are more complex. Having fewer, larger packages reduces complexity, it does not increase it. In the case of Intel, 4 tiles mounted on a single package is definitely more reliable than having 4 sockets, each with its own pins, mounting mechanism, VRM, etc. If you think about how much empty space there is on the average board, and instead fill that with package, you can start to see how a larger package can reduce the overall footprint compared to multi-socket.
You're not wrong but you're not taking the whole picture into consideration: If you're taking a large single socket such as SP5 and adding 2-3 more of them on the same board, of course that is going to dramatically increase complexity. But, if you instead take the potential of 4 separate sockets and cram it all into 1, you're not reducing complexity all that much, you're just constraining it to a smaller space. That's where things like traces become an issue, and therefore add to the cost of the motherboard.
That said, I would not be surprised at all if these giants become available in multi-socket board configurations, simply because Intel has to do something to compete with AMD. If you think about a product like Bergamo, with 128 cores per package, Intel can't fit 120+ cores on LGA 4677, so they are basically left in the dust; you could say the gigantic package is really a coping mechanism. Though I will mention that this growing-socket trend began a long time ago now, with Knights Landing and Skylake-SP.
The socket already takes up more surface area than an ITX motherboard. While, to your point, rack-mounted servers tend to have much larger motherboards, they're already very constrained for space. You'll basically be limited to either one socket and a few PCIe expansion slots, or two sockets. I don't see how it's possible to fit more. Remember: you need room to fit all those memory channels, VRMs, and chipsets too.
As to why we haven't seen this already, the answer is that we have! (See the IBM package.) It just hasn't made sense in recent years because of shrinks: why design a super-deluxe custom system when you can buy thousands of Pentiums for a lower cost? But like I said, that is coming to an end. Fitting more compute per rack has become a limiting factor. Some of the largest buildings in the world are data centers, and you have limited options: build more buildings or fit more per building. Building more buildings can be cost-prohibitive depending on location, so it makes sense to reduce footprint as much as is reasonable, and larger packages are one way to improve space efficiency.
Fair enough, I guess I should have said we haven't seen it widely adopted. IBM has always made gargantuan chips. And yeah, I agree that cramming more on a single CPU package is the way forward, but the tricky part becomes feeding that many cores with data, or power. Add too many cores and the performance will suffer. You can compensate by adding HBM or hefty caches, but then you're dramatically increasing costs. Like I said, it's all a fine balance. It's hard to get it just right.
You also mentioned performance, and it's true, but I think it's not as important as you think; most of the performance penalty can be avoided through optimization. If software runs poorly on multi-socket, it's something that can almost always be resolved through patches.
I wish. I'm always whining about poorly optimized software. I appreciate what Intel has done with Clear Linux though - shows how much potential there really is, and it seems others have taken notice.
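One concrete flavor of the kind of "patch" described above is NUMA-aware CPU pinning, so a process mostly touches memory attached to its own socket instead of paying cross-socket latency. A minimal sketch, assuming a hypothetical 2-socket, 64-cores-per-socket Linux box (the core ranges are assumptions; real topology would come from lscpu or /sys/devices/system/node):

[CODE]
import os

# Pin the current process to the cores of a single socket/NUMA node so it
# stops paying cross-socket memory latency. Linux-only APIs.

SOCKET_CORES = {
    0: range(0, 64),     # cores on socket 0 (assumed layout)
    1: range(64, 128),   # cores on socket 1 (assumed layout)
}

def pin_to_socket(socket_id):
    """Restrict the current process (pid 0 = self) to one socket's CPUs."""
    os.sched_setaffinity(0, set(SOCKET_CORES[socket_id]))

if __name__ == "__main__":
    pin_to_socket(0)
    print("now running on CPUs:", sorted(os.sched_getaffinity(0)))
[/CODE]

Pinning is the crude version; the same idea shows up in NUMA-aware allocators and schedulers.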
schmidtbag:

You're not wrong but you're not taking the whole picture into consideration: If you're taking a large single socket such as SP5 and adding 2-3 more of them on the same board, of course that is going to dramatically increase complexity. But, if you instead take the potential of 4 separate sockets and cram it all into 1, you're not reducing complexity all that much, you're just constraining it to a smaller space. That's where things like traces become an issue, and therefore add to the cost of the motherboard.
The package might gain some complexity; however, you have to keep in mind the actual size of the package vs. the board: the traces/connections are a lot shorter (better), and mechanically there is less going on. The motherboards themselves are also less complicated, though PCB cost isn't much of a concern for a $10K+ server anyway; it's mainly space efficiency that matters here.
schmidtbag:

The socket already takes up more surface area than an ITX motherboard. While, to your point, rack-mounted servers tend to have much larger motherboards, they're already very constrained for space. You'll basically be limited to either one socket and a few PCIe expansion slots, or two sockets. I don't see how it's possible to fit more. Remember: you need room to fit all those memory channels, VRMs, and chipsets too.
If Intel can replace quad-socket boards, or the mythical octo-socket configs that Intel supposedly supports, it's probably worth it. [SPOILER="bonus"] Bonus graphic: 8S 3rd-gen Scalable processor (on LGA 4189) block diagram https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.cnews.cz%2Fwp-content%2Fuploads%2F2020%2F06%2FIntel-Xeony-Scalable-t%25C5%2599et%25C3%25AD-generace-Cooper-Lake-sch%25C3%25A9ma-8S-serveru.png&f=1&nofb=1&ipt=bc8fdeefd764ba71ca86ab5b943b26dd4b5ffe804ab260b0d0a9029b49ac9c16&ipo=images [/SPOILER]