Between sessions at the AMD Next Horizon event in San Francisco, I wanted to do a quick recap of the new AMD EPYC 2 “Rome” platform. There were a number of disclosures today, and we can now have a discussion on what the next-generation performance picture will look like. You can see our AMD EPYC 2 Rome at the Next Horizon Event coverage to learn more.
The star of the show was the new AMD EPYC 2. Here it is live with a triumphant Dr. Lisa Su, CEO of AMD:
AMD EPYC 2 Rome Details
Here is the quick summary of what we learned today about the AMD EPYC 2 “Rome” generation:
- Up to eight 7nm x86 compute chiplets per socket.
- Each x86 chiplet has up to 8 cores
- 64 cores confirmed (see: AMD EPYC Rome Details Trickle Out 64 Cores 128 Threads Per Socket)
- There is a 14nm I/O chip in the middle of each package
- This I/O chip will handle DDR4, Infinity Fabric, PCIe and other I/O
- PCIe Gen4 support providing twice the bandwidth of PCIe Gen3
- Greatly improved Infinity Fabric speeds to be able to handle the new I/O chip infrastructure including memory access over Infinity Fabric
- Ability to connect GPUs and do inter-GPU communication over the I/O chip and Infinity Fabric protocol so that one does not need PCIe switches or NVLink switches for chips on the same CPU. We covered the current challenges in: How Intel Xeon Changes Impacted Single Root Deep Learning Servers. This can be a game changer for GPU and FPGA accelerator systems.
- Socket compatible with current-generation AMD EPYC “Naples” platforms.
- Although AMD has not confirmed this, we will state that most if not all systems will need a PCB re-spin to handle PCIe Gen4 signaling. Existing systems can get Rome with PCIe Gen3 but will require a higher-quality PCB for PCIe Gen4.
- Claimed significant IPC improvements and twice the floating point performance per core.
- Incrementally improved security per core including new Spectre mitigations
This is a long list. We now have a fairly good idea of what the next generation will offer. Cache sizes, fabric latencies, clock speeds, I/O chip performance, DDR4 speeds and other aspects have not been disclosed, so there is still a long way to go until we have a full picture. We have heard rumors, and AMD has hinted, that with 7nm it will be able to increase clock speeds as well.
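For a sense of scale, the headline numbers in the list above can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the function and constant names are ours, the two threads per core figure is inferred from the 64 core / 128 thread disclosure, and the per-lane signaling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding come from the PCIe specifications rather than from anything AMD presented today.

```python
# Back-of-the-envelope arithmetic for the disclosed Rome figures.
# All names here are illustrative; none come from AMD.

CHIPLETS_PER_SOCKET = 8   # "up to eight 7nm x86 compute chiplets per socket"
CORES_PER_CHIPLET = 8     # "each x86 chiplet up to 8 cores"
THREADS_PER_CORE = 2      # inferred from 64 cores / 128 threads per socket

# Assumption: raw per-lane signaling rates from the PCIe specs, in GT/s.
PCIE_GT_PER_S = {"gen3": 8.0, "gen4": 16.0}
# Assumption: both generations use 128b/130b line encoding.
PCIE_ENCODING_EFFICIENCY = 128 / 130


def socket_cores(chiplets=CHIPLETS_PER_SOCKET,
                 cores_per_chiplet=CORES_PER_CHIPLET):
    """Maximum core count for one socket."""
    return chiplets * cores_per_chiplet


def socket_threads(threads_per_core=THREADS_PER_CORE):
    """Maximum hardware thread count for one socket."""
    return socket_cores() * threads_per_core


def pcie_lane_gbytes_per_s(gen):
    """Approximate GB/s per lane, per direction.

    Only line encoding is accounted for; packet and protocol
    overhead are ignored, so real throughput will be lower.
    """
    return PCIE_GT_PER_S[gen] * PCIE_ENCODING_EFFICIENCY / 8


if __name__ == "__main__":
    print(f"Cores per socket:   {socket_cores()}")
    print(f"Threads per socket: {socket_threads()}")
    for gen in ("gen3", "gen4"):
        print(f"PCIe {gen}: {pcie_lane_gbytes_per_s(gen):.3f} GB/s per lane")
```

The point of the PCIe portion is simply that doubling the signaling rate at the same encoding doubles the per-lane figure, which is where the "twice the bandwidth" claim comes from. It says nothing about how many lanes a given Rome platform will expose.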
As a reminder, here is the Intel Xeon Scalable “Cascade Lake-SP” overview:
With this information on both sides, let us do a quick comparison regarding what we know at this juncture based on the disclosures each company has made.
AMD EPYC 2 Rome v. Intel Xeon Scalable Cascade Lake-SP
We are going to focus on the Intel Xeon Scalable Cascade Lake-SP segment, not the recently announced -AP parts for reasons we will get to later in this article. For now, here is a tally of where we are:
There are a few major points here. First, AMD EPYC 2 Rome stacks up closer to a four-socket Intel Xeon Scalable server. Even in many of those cases, AMD EPYC 2 Rome still has several points where it comes out ahead. Intel will have Optane Persistent Memory and features like VNNI for inferencing. As we wrote in the closing comments of our Hot Chips 30 piece, Intel needs more. These are first-generation technologies that may become important in the future, but it will be some time before there is broad enough support to say that virtually every server will use both features every day.
We also wanted to note that the idea of an I/O hub is not new. If you go way back to the 2011 ServeTheHome archives, our Supermicro X8DTH-6F Motherboard Review Dual IOH shows that dual I/O hubs were a feature. Intel still has a number of Lewisburg PCH options. That is important because it means the cost of an Intel platform also needs to include the PCH. This PCH adds 15W or more to the platform TDP and costs $57 or more in tray quantities. Unlike the northbridge/southbridge setups of old, modern Intel platforms terminate PCIe (except for PCH PCIe lanes) and DDR4 on the CPU package, but Intel absolutely still has an I/O hub in its architecture, just sitting off-package on the motherboard.
What About Intel Cascade Lake-AP?
Intel wanted to get ahead of the news and offer its own solution, the Intel Cascade Lake-AP. Intel has entered the multi-chip package arena with this announcement. As we described in our coverage of that announcement, it is truly interesting.
At the same time, there are some huge differences. First, Intel Cascade Lake-AP will almost certainly require a new socket. That means if you have standardized on the Dell EMC PowerEdge R740xd you can use Cascade Lake-SP in your mid-2019 server buys, but not Cascade Lake-AP. We have not heard of broad vendor support for this, so one option is that Intel could produce PCBs with the Cascade Lake-AP chips attached and offer them directly. This is not something we can see mainstream server vendors pushing to their customers.
Intel offered the chips, but it did not offer a roadmap for the platform. As it stands, Intel’s announcement of Cascade Lake-AP is a one-off product until Cooper/Ice Lake arrive. That is important for customer buy-in. Customers generally do not like single-generation products.
In many ways, the Cascade Lake-AP dual socket architecture looks like a quad-socket Intel Xeon Scalable system, just in two packages. As a result, the numbers should be close to the above 4P numbers, except with an asterisk that Intel may be able to get up to 256 PCIe lanes from their platform. That would give more lanes to Cascade Lake-AP and bring bandwidth parity with Rome.
Until we learn more, we are not going to assume this is a mainstream product.
AMD knows it has a monster chip with the AMD EPYC 2 Rome generation. All of the system vendors that we talk to know that Rome is going to be big. We have seen server PMs get almost giddy as they discussed the platform.
On one hand, Intel is going to lose the top performing mainstream x86 architecture crown in 2019 by a substantial margin. At Hot Chips 30, we were told that the Intel Cascade Lake-SP specs would not change from what was presented. One can argue that, against the current Naples generation's NUMA architecture and cores, Intel Xeon is faster. At the same time, so long as Rome’s I/O hub is reasonably implemented and performant, it will not be a close race as we start the second half of 2019. Intel may retain the per-core performance lead, which is important for many licensed software packages. In terms of total performance, Intel is going to be behind.
The bigger question is whether this will matter. Looking forward to Zen 3, assuming it is socket compatible with the current-generation AMD EPYC SP3 platforms, that is a big deal in itself, as it gives Intel a generation to catch up to or leapfrog AMD with a newer/bigger platform. Beyond that, AMD still needs to sell chips. AMD still does not have 5% of the market, let alone 10%. Rome will be a monster design, but the enterprise market is slow to move.
We will say this: if you are a VMware organization looking for hyper-converged setups to run Linux VMs, Rome will present a borderline obscene value, especially when paired with 100/200Gbps networking and PCIe Gen4 NVMe SSDs.
The next year or two in the data center is going to be really fun. Stay tuned to STH as we cover this exciting space.