Are you ready for the new Intel Xeon E5-2699 V4 performance king? We have been tirelessly working 6 systems over the past few weeks to provide benchmark results to the STH community. Expect many more benchmarks of other chips to follow. With the launch of the Intel Xeon E5-2600 “Broadwell-EP” parts, we have details of the biggest, baddest chip out there. This processor has a total of 22 cores, 44 threads and 55MB of L3 cache, all fitting within a 145W TDP. Although Intel claims a roughly 5% IPC improvement from the new 14nm chips, it also reduces the base clock speed of the E5-2699 V4 to 100MHz below that of the E5-2699 V3 (18 core/ 36 thread). These were the first chips to arrive in our lab, so they have been running through benchmark loops for weeks in a row.
We do not have list prices for these parts yet, but we do expect that they will be pricey. Of course, a few thousand dollars per CPU may sound like a lot, but these CPUs are often paired with $20,000 of RAM (or more) and software that can easily cost tens of thousands of dollars per machine. The overall chip cost at this end of the spectrum is therefore reasonable, so long as you are not looking for single threaded performance.
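To put the core count, IPC and base clock figures above in perspective, here is a rough back-of-the-envelope sketch of the best-case multi-threaded gain over the E5-2699 V3. It assumes perfect scaling across cores and that throughput tracks base clock, which real workloads rarely achieve. The 2.2GHz and 2.3GHz base clocks are our assumption from Intel's published specifications (the text above only states the 100MHz gap), and the function name is purely illustrative.

```python
def relative_throughput(cores_new, cores_old, ipc_gain, base_new_ghz, base_old_ghz):
    """Naive best-case estimate: throughput ~ cores x IPC x base clock."""
    core_ratio = cores_new / cores_old
    ipc_ratio = 1.0 + ipc_gain
    clock_ratio = base_new_ghz / base_old_ghz
    return core_ratio * ipc_ratio * clock_ratio


# 22 vs 18 cores and the ~5% IPC claim come from the article.
# Base clocks (2.2GHz V4, 2.3GHz V3) are an assumption, not measured values.
estimate = relative_throughput(22, 18, 0.05, 2.2, 2.3)
print(f"Best-case multi-threaded gain: {(estimate - 1) * 100:.1f}%")
```

Under these assumptions the estimate lands in the low 20% range, most of which comes from the four extra cores rather than from IPC. Treat it as an upper bound to compare measured results against, not as a prediction.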
Our test platform was a standard EATX motherboard upgraded for Xeon E5 V4 support via a simple BIOS upgrade. We will have a review of the motherboard shortly, complete with power and thermal imaging as we normally do.
- CPU: Dual Intel Xeon E5-2699 V4
- Motherboard: Supermicro X10DRi
- Memory: 128GB – 8x Samsung 16GB DDR4 2400MHz ECC RDIMMs
- SSD: 1x Intel DC S3700 400GB
- Operating System: Ubuntu 14.04.3 LTS
As another note, we tried picking some interesting comparisons out of our data set. We did receive some extra launch support from Supermicro, who got us the test platform flashed with the latest BIOS ahead of the release. The Supermicro X10DRi is an EATX platform that launched with Haswell-EP.
With a simple BIOS upgrade, it is ready to support the Broadwell-EP chips.
During the transition to new Intel CPUs that work in existing sockets, we highly recommend ensuring you have the proper V2 BIOS before trying to use the Broadwell-EP chips with the LGA2011-3 platforms.
Intel Xeon E5-2699 V4 Benchmarks
For our testing we are using Linux-Bench scripts which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are likely going to update the Linux-Bench in the near future with a few new tests as well as an even simpler to use/ faster revision, but for now, we are using our old Ubuntu 14.04.3 LTS version.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. We (finally) have a Linux kernel compile benchmark script that is consistent. Expect to see this functionality migrate into Linux-Bench soon (we are just awaiting the parser work on it). The task was simple: we have a standard configuration file, the Linux 4.4.2 kernel from kernel.org, and we run make with every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
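For readers who want to try something similar, the timing and the compiles per hour conversion can be sketched roughly as follows. This is not our actual benchmark script; the function names and the `make clean` step are assumptions for illustration, and it expects an already configured kernel source tree.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds):
    """Convert the wall-clock time of one full build into builds per hour."""
    return 3600.0 / elapsed_seconds


def time_kernel_build(source_dir, jobs=None):
    """Time one build in source_dir using every logical CPU by default.

    Hypothetical helper: assumes a .config already exists in source_dir.
    """
    jobs = jobs or os.cpu_count() or 1
    # Start from a clean tree so each run does the same amount of work.
    subprocess.run(["make", "clean"], cwd=source_dir, check=True)
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=source_dir, check=True)
    return time.perf_counter() - start


# Example usage (the path is a placeholder):
# elapsed = time_kernel_build("/path/to/linux-4.4.2")
# print(f"{compiles_per_hour(elapsed):.1f} compiles per hour")
```

Expressing the result as a rate means higher is better, which keeps the chart consistent with the other benchmarks. For example, a build that finishes in 90 seconds works out to 40 compiles per hour.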
As you can see, the dual Xeon E5-2699 V4 system is an absolute monster in this highly parallelized workload.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
The pair of Intel Xeon E5-2699 V4 chips is so fast that our “hard” benchmark is now in serious danger. What the Intel Xeon E5-2699 V4 does in a few seconds, an Intel S1260 Centerton (Atom S1260) or a lower end Amazon AWS instance will take well over an hour to complete. These are the biggest dual socket chips you can use to consolidate virtual machine workloads as of this time.
7-zip Compression Performance
7-zip is a widely used compression/ decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Compression is a major operation we see in today’s workloads and is also highly threaded. We omitted the Cavium ThunderX 48 core result, as we explained in our 96 core Cavium ThunderX benchmark piece. One can see the extra cores and IPC improvements perform well here.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here.
Here we find another benchmark where the dual Intel Xeon E5-2699 V4 configuration is crushing older processors.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
We sorted this chart on the multi-threaded results. Practically, that means that sorting on the blue bars representing single threaded performance instead would change the ranking. The single threaded results are bounded in a fairly tight range because there have been only modest clock speed and IPC improvements over the past few generations. Looking at the multi-threaded tests, we can see another strong showing by the Intel Xeon E5-2699 V4.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:
Moving to the verify results:
At this point we see another second place finish for the E5-2699 V4. However, we do notice this is an area where the V4 architecture is pulling ahead more than we would have expected from pure core additions.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Of course, these chips are not meant for heavy compute but we pick out the UnixBench 5.1.3 Dhrystone 2 and Whetstone results to show some of the raw performance they are capable of. UnixBench is widely used so it is a good comparison point.
Here are the single threaded workloads:
In single threaded workloads the E5-2699 V4 is fast as it gets its maximum turbo boost.
Now the E5 V4’s sweet spot, the multi-threaded workloads:
Here we can see some exceptional performance again with the Intel Xeon E5-2699 V4 chips.
Conclusion
With a simple BIOS upgrade for current platforms, the E5-2699 V4 gives a new set of capabilities, as it can outpace older quad Xeon E5-4600 processor machines. It can also hang with some quad Intel Xeon E7 V3 configurations, which is spectacular. Intel’s strategy moving to 14nm is to add some IPC improvements, add more cores and keep the power consumption and thermals very similar to the previous generation. We will have more on this in future pieces. The E5-2699 V4 will carry a high price tag, but it will also allow for greater VM host consolidation, which makes sense from a TCO perspective. We have been working for a month on this launch and have lots more coming in the next few days and weeks, so stay tuned. We did want to bring STH readers the top two SKUs on day 1.