Intel has officially announced its 3rd Generation Xeon Scalable family, codenamed Cooper Lake-SP. Built on the 14nm process, the Cooper Lake-SP family expands upon the Cascade Lake family by offering higher clock speeds, more advanced AI technologies, and scalability all the way up to 8S configurations on the Cedar Island platform.
Intel’s 3rd Gen Xeon Scalable CPU Family Codenamed Cooper Lake-SP Officially Launched – Xeon Platinum 8380HL Leads The Pack With Its 28 Cores, 4.3 GHz Peak Clocks & 250W TDP
During the press briefing on the 3rd Gen Xeon family, Intel called them the most advanced CPUs for AI-specific workloads. We will look at what’s new and what kind of performance advantages the Cooper Lake-SP family brings, but first, let’s talk about the platform itself. The Cooper Lake-SP CPUs are supported by the Cedar Island platform, which makes use of the LGA 4189 socket (Socket P+).
This is by far the largest LGA socket that Intel has made, and it will be featured on two platforms: Cedar Island, mentioned above, and the Whitley platform for Ice Lake-SP processors, which Intel says will also launch later this year, with first shipments being made in mid-2020. The Cedar Island platform supports 4-way and 8-way CPU configurations, interconnected over UPI (Ultra Path Interconnect). The CPUs connect to the Intel C620A chipset through DMI, and the chipset itself features up to 20 PCIe Gen 3 lanes, 10 USB 3.0 ports, and 14 SATA Gen 3 ports.
As for the CPUs, they will offer up to six-channel memory support in DDR4-3200 (1 DPC) or DDR4-2933 (2 DPC) modes. The Cooper Lake-SP processors will feature up to 48 PCIe Gen 3.0 lanes. Other features include AVX-512 (with up to 2 FMAs), bfloat16 for built-in AI acceleration, and new Intel SST (Speed Select Technology) features, which give users more control in optimizing processor performance for their workloads.
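To put the six-channel memory figures above in perspective, here is a rough sketch of the peak theoretical bandwidth per socket. It assumes the standard 64-bit (8-byte) DDR4 channel width; the function name is only illustrative, and real-world sustained bandwidth will be lower than these theoretical peaks.

```python
# Peak theoretical DDR4 bandwidth per socket:
#   channels x transfer rate (MT/s) x bytes per transfer
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR4 data bus per channel


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Return peak theoretical bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# The two memory modes quoted in the article, both on six channels.
for label, rate in (("DDR4-3200 (1 DPC)", 3200), ("DDR4-2933 (2 DPC)", 2933)):
    print(f"{label}: {peak_bandwidth_gbs(6, rate):.1f} GB/s per socket")
```

Under these assumptions, the 1 DPC mode works out to roughly 153.6 GB/s per socket and the 2 DPC mode to roughly 140.8 GB/s, so populating two DIMMs per channel trades some peak bandwidth for capacity.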
Intel SST comes with two modes. The first is Intel SST-CP (Core Power), which is designed for workloads that benefit from higher base frequencies on a subset of processor cores and lower base frequencies on the remaining cores, all while maintaining max turbo frequencies across all cores. Then there’s SST-TF (Turbo Frequency), which is designed for workloads that benefit from higher turbo frequencies on a subset of processor cores and lower turbo frequencies on the remaining cores, all while maintaining base frequencies on all cores. Only certain SKUs make use of this feature, so let’s talk about SKUs now.
The Intel Xeon Platinum 8300, Xeon Gold 6300 and Xeon Gold 5300 Series CPUs
The Xeon Platinum 8300 CPUs support both 4S and 8S configurations. The flagship SKU is the Xeon Platinum 8380HL, which features 28 cores, 56 threads, and a total cache of 38.50 MB. The CPU has a base frequency of 2.9 GHz and a boost frequency of 4.3 GHz while operating at 250W. There are two variants of this specific chip, the HL and the H. The HL variant supports up to 4.5 TB of memory while the H variant supports up to 1.12 TB. The frequency-optimized Xeon Platinum 8376HL and 8376H retain the same core configuration but ship with a lower base frequency of 2.6 GHz, which results in a lower TDP of 205W.
The 4S-only variants are part of the Xeon Gold 6300 and Xeon Gold 5300 families. The top variant is the Xeon Gold 6348H, which features 24 cores and 48 threads. The CPU has a base clock of 2.3 GHz and a boost clock of 4.2 GHz, with a total cache of 33 MB and a TDP of 165W. The Xeon Gold 6328HL is the only processor in the Xeon Gold lineup to support 4.5 TB of memory. It features 16 cores and 32 threads with speeds of 2.8 GHz base and 4.3 GHz boost at 165W. Of all the CPUs, only three variants support Intel SST: the Xeon Gold 6328HL, the Xeon Gold 6328H, and the Xeon Gold 5320H. Following is the complete list of 3rd Gen Xeon-SP CPUs:
Intel 3rd Gen Xeon Scalable Family Codenamed Cooper Lake-SP SKUs:
| CPU Name | Cores / Threads | Base Clock | Boost Clock (Single-Core) | Cache | Maximum Memory Capacity | Socket Support | TDP | Price |
|---|---|---|---|---|---|---|---|---|
| Intel Xeon Platinum 8380HL | 28/56 | 2.9 GHz | 4.3 GHz | 38.50 MB | 4.5 TB | 4S/8S | 250W | $13012 US |
| Intel Xeon Platinum 8380H | 28/56 | 2.9 GHz | 4.3 GHz | 38.50 MB | 1.12 TB | 4S/8S | 250W | $10009 US |
| Intel Xeon Platinum 8376HL | 28/56 | 2.6 GHz | 4.3 GHz | 38.50 MB | 4.5 TB | 4S/8S | 205W | $11722 US |
| Intel Xeon Platinum 8376H | 28/56 | 2.6 GHz | 4.3 GHz | 38.50 MB | 1.12 TB | 4S/8S | 205W | $8719 US |
| Intel Xeon Platinum 8354H | 18/36 | 3.1 GHz | 4.3 GHz | 24.75 MB | 1.12 TB | 4S/8S | 205W | $3500 US |
| Intel Xeon Platinum 8353H | 18/36 | 2.5 GHz | 3.8 GHz | 24.75 MB | 1.12 TB | 4S/8S | 150W | $3003 US |
| Intel Xeon Gold 6348H | 24/48 | 2.3 GHz | 4.2 GHz | 33.00 MB | 1.12 TB | 4S Only | 165W | $2700 US |
| Intel Xeon Gold 6328HL | 16/32 | 2.8 GHz | 4.3 GHz | 22.00 MB | 4.5 TB | 4S Only | 165W | $4799 US |
| Intel Xeon Gold 6328H | 16/32 | 2.8 GHz | 4.3 GHz | 22.00 MB | 1.12 TB | 4S Only | 165W | $1778 US |
| Intel Xeon Gold 5320H | 20/40 | 2.4 GHz | 4.3 GHz | 27.50 MB | 1.12 TB | 4S Only | 150W | $1555 US |
| Intel Xeon Gold 5318H | 18/36 | 2.5 GHz | 3.8 GHz | 24.75 MB | 1.12 TB | 4S Only | 150W | $1273 US |
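The list prices in the table make it easy to work out what the large-memory HL variants cost over their H counterparts. The short sketch below does that arithmetic using only the core counts and prices listed above; the dictionary layout and helper names are just for illustration.

```python
# (cores, list price in USD) taken from the SKU table above.
skus = {
    "Platinum 8380HL": (28, 13012),
    "Platinum 8380H": (28, 10009),
    "Platinum 8376HL": (28, 11722),
    "Platinum 8376H": (28, 8719),
    "Gold 6328HL": (16, 4799),
    "Gold 6328H": (16, 1778),
}


def price_per_core(name: str) -> float:
    """List price divided by core count for one SKU."""
    cores, price = skus[name]
    return price / cores


def hl_premium(base_name: str) -> int:
    """Price difference between the HL (4.5 TB) and H (1.12 TB) variant."""
    return skus[base_name + "HL"][1] - skus[base_name + "H"][1]


for base in ("Platinum 8380", "Platinum 8376", "Gold 6328"):
    print(f"{base}: HL premium ${hl_premium(base)}, "
          f"H variant ${price_per_core(base + 'H'):.0f} per core")
```

Going by these list prices, the 4.5 TB memory support adds roughly $3,000 to each part regardless of core count, which is a far larger relative jump on the Xeon Gold 6328 than on the Platinum chips.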
In terms of performance, Intel highlights that the 3rd Gen Xeon Scalable family delivers a 90% average performance improvement over a 5-year-old Xeon platform and up to a 98% increase in database performance. With the enhanced DL Boost (bfloat16) instructions, the Cooper Lake-SP family delivers 93% higher training performance and 90% higher inference performance than the last-gen Cascade Lake-SP CPUs running FP32.
With bfloat16, the Cooper Lake-SP CPUs deliver similar accuracy to FP32 at faster speeds, and only minimal software changes are required to support it. Intel expects over 100 optimized topologies for 3rd Gen Xeon SP CPUs, which will further help increase the adoption rate of DL Boost. In ResNet-50, the enhanced DL Boost provides a huge uplift over last-generation processors, but AVX-512 alone also delivers a gain of around 30% in training and around 10% in inference performance.
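Why bfloat16 can stay close to FP32 accuracy comes down to its layout: it keeps FP32's sign bit and full 8-bit exponent and simply drops the mantissa from 23 bits to 7. The sketch below illustrates that by keeping only the top 16 bits of a float32 value. Note the assumptions: this is plain truncation written for illustration (hardware conversions typically round rather than truncate), and the function names are made up for this example, not taken from any Intel library.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Illustrative conversion: keep the top 16 bits of the float32 pattern
    (sign, 8 exponent bits, 7 mantissa bits). This truncates; it does not round."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a float by zero-filling."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


for value in (1.0, 3.14159, 1e-20, 1e38):
    roundtrip = from_bfloat16_bits(to_bfloat16_bits(value))
    rel_err = abs(roundtrip - value) / value
    print(f"{value:>10g} -> {roundtrip:<12g} relative error {rel_err:.3%}")
```

Because the exponent is untouched, very small and very large values survive the conversion, and the relative error from truncation stays below 2^-7 (under 1%) in every case.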
Another key advantage of the Cedar Island platform is, as stated above, its support for configurations of up to 8 CPUs, which results in up to 224 cores and 448 threads. That is almost twice the cores & threads of competing platforms. There is a multitude of workloads that these CPUs are aimed at, but we can't dismiss the fact that while Intel claims greater efficiency and lower TCO with its Cooper Lake-SP family, these chips still need to be compared with AMD’s 2nd Gen EPYC Rome lineup in terms of price, performance, and efficiency.
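The 224-core figure above follows directly from the 28-core flagship and the 8S limit. Here is a quick sketch of that arithmetic. The competing-platform numbers (a dual-socket system with 64-core CPUs) are an assumption added for comparison and do not come from Intel's briefing.

```python
CORES_PER_CPU = 28      # Xeon Platinum 8380H/HL, per the SKU list
THREADS_PER_CORE = 2    # 28 cores / 56 threads per CPU


def platform_totals(sockets: int, cores_per_cpu: int = CORES_PER_CPU):
    """Return (total cores, total threads) for a multi-socket system."""
    cores = sockets * cores_per_cpu
    return cores, cores * THREADS_PER_CORE


for sockets in (4, 8):
    cores, threads = platform_totals(sockets)
    print(f"{sockets}S Cedar Island: {cores} cores / {threads} threads")

# Assumption for comparison only: a 2-socket platform with 64-core CPUs.
rival_cores, rival_threads = platform_totals(2, cores_per_cpu=64)
print(f"8S vs assumed 2S x 64-core rival: "
      f"{platform_totals(8)[0] / rival_cores:.2f}x the cores")
```

Under that assumption, an 8S system has 1.75x the cores of the rival platform, in line with the "almost twice" claim, though it needs four times as many sockets to get there.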
Currently, AMD’s EPYC CPUs demolish Intel in terms of performance per watt, core/thread counts, feature set, and total cost of ownership, with major players in the server segment switching their cloud datacenters to AMD’s EPYC CPUs.
With that said, Intel also plans to introduce its 1S/2S Whitley platform which will support 3rd Gen Xeon Scalable Ice Lake-SP CPUs. Intel says that Ice Lake-SP CPUs will deliver generational platform and technology advancements in 2020 & beyond for mainstream server use cases. This will be around the same time when AMD will be introducing its 3rd Gen EPYC family codenamed Milan which is going to utilize the Zen 3 CPU architecture to further leverage IPC & feature set over the existing Rome chips.
More Data Center Specific Announcements – SSD D7 Series, Optane Persistent Memory 200 & Stratix 10 NX FPGA
In addition to the 3rd Gen CPUs, Intel also announced a range of other products that are optimized for the Cedar Island platform, starting with the Optane Persistent Memory 200 series, which is said to deliver a 25% increase in bandwidth with capacities of up to 512 GB. The Barlow Pass DIMMs will operate at a TDP of 12-15W and come in three flavors: 128 GB, 256 GB, and 512 GB. The 2nd Gen Optane memory uses AES-256 encryption and is optimized for productivity suites in the data center.
The Optane DC Persistent Memory offers endurance of up to 363 PBW at 2.3 Gbps (100% writes, 256B) and 91 PBW at 0.58 Gbps (100% writes, 64B), along with read rates of 6.8 Gbps (100% reads, 256B). All DIMMs support speeds of 2666, 2400, 2133, and 1866 MT/s. Following is the full spec sheet for Intel’s 2nd Gen DC Persistent Memory:
Moving over to SSDs, Intel has also introduced its SSD D7-P5500 and SSD D7-P5600 series, which make use of 96-layer TLC NAND flash. The storage devices feature capacities of 7.68 TB, 3.84 TB, and 1.92 TB in the U.2 15mm form factor. Some of the main highlights include:
- Dynamic namespace management delivers the flexibility to enable more users and scale deployment.
- Additional security features like TCG Opal 2.0 and a built-in AES-XTS 256-bit encryption engine, which are required by some secure platforms.
- Enhanced SMART monitoring, which reports drive health status without disrupting I/O data flow using an in-band mechanism and out-of-band access.
- Telemetry 2.0 makes a wide range of stored data accessible and includes intelligent error tracking and logging. This increases the reliability of finding and mitigating issues and supports accelerated qualification cycles — all of which can result in increased IT efficiency.
- Optimized TRIM architecture now runs as a background process without interference to workloads, improving performance and QoS during concurrent TRIMs. The TRIM process is improved for high-density drives, with reduced write amplification that helps drives meet their endurance goal.
Intel SSD D7-P5500 & D7-P5600 Specs Sheet:
The D7-P5500 series delivers sequential read/write speeds of 7000/4300 MB/s, random 4KB read/write performance of 1 million/130K IOPS, and an endurance rating of 1 DWPD (up to 14 PBW). The D7-P5600 series delivers sequential read/write speeds of 7000/4300 MB/s, random 4KB read/write performance of 1 million/260K IOPS, and an endurance rating of 3 DWPD (up to 35 PBW). All drives are backed by a 5-year limited warranty.
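The DWPD (drive writes per day) and PBW (petabytes written) ratings above are two views of the same endurance budget. The sketch below converts between them, assuming the DWPD rating applies across the 5-year warranty period, which is a common convention but is not stated in Intel's material. The helper names are illustrative.

```python
DAYS_PER_YEAR = 365


def endurance_pbw(dwpd: float, capacity_tb: float, years: int = 5) -> float:
    """Total petabytes written implied by a DWPD rating over the warranty."""
    return dwpd * capacity_tb * DAYS_PER_YEAR * years / 1000


def implied_capacity_tb(pbw: float, dwpd: float, years: int = 5) -> float:
    """Drive capacity (TB) that a PBW figure implies at a given DWPD."""
    return pbw * 1000 / (dwpd * DAYS_PER_YEAR * years)


# D7-P5500: 1 DWPD on the largest 7.68 TB drive.
print(f"P5500, 7.68 TB @ 1 DWPD: {endurance_pbw(1, 7.68):.1f} PBW")

# D7-P5600: which capacity does "3 DWPD, up to 35 PBW" correspond to?
print(f"P5600, 35 PBW @ 3 DWPD: {implied_capacity_tb(35, 3):.2f} TB")
```

Under that assumption, the 7.68 TB D7-P5500 lands at about 14 PBW, matching the quoted figure. The D7-P5600's 35 PBW at 3 DWPD corresponds to roughly 6.4 TB rather than 7.68 TB, which may mean its top capacity is smaller than the list above suggests; the announcement details quoted here do not settle that.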
Finally, Intel also announced its Stratix 10 NX FPGA, which is based on the 14nm process and focuses on high-performance AI workloads. The FPGA makes use of an advanced AI Tensor block that delivers up to 15x more INT8 throughput than the Intel Stratix 10 FPGA, and it comes with an embedded memory hierarchy and HBM. The FPGA also features up to 57.8G PAM4 transceivers and hard Ethernet blocks for high efficiency. The AI Tensor block introduced within the FPGA has base precisions of INT8 and INT4 and also supports FP16 & FP12 formats.
Intel additionally provides performance comparisons with the NVIDIA V100 Tensor Core GPU. In natural language processing, the FPGA is 2.3x faster in the BERT batch 1 workload and up to 9.5x faster in the LSTM batch 1 workload, and it is up to 3.8x faster in the ResNet-50 batch 1 workload. Intel has said that the Stratix 10 NX FPGA silicon will be available later this year. Expect more details in the coming months.