Sash Thoughts: Threadripper 2950X, is it time to retire? (Updated with 3900XT results)
Updated: Mar 7
(Update, see bottom of the post for 3900XT results).
I love Zen+. You probably know I love Zen+. Normally, I would go through all my blog posts and link you to all the loving posts I made about Zen+ in this first paragraph, but I can't be bothered to do that; you can use that great feature called the search bar if you want to see how much I love Zen+. But I'll save you the effort and say I love it a lot.
Because it was amazing value for money, efficient, feature-rich and built on the solid foundation of Zen1, representing AMD's refinement of the architecture that first inserted the metaphorical boot into Intel's backside. Of course, Zen2 and 3 proceeded to break their feet off in said backside, but that's beside the point. Actually, it's not entirely beside the point, as you will learn in a moment.
I love my Ryzen Threadripper 2950X. It is a 16-core behemoth with Quad Channel memory. I use this system as a Work and Art station. The main tasks are image editing in Paint Dot Net and GIMP, heavy 7Zip compression of my Science Fiction universe assets for archival purposes, and, last but not least, rendering LEGO spaceships with the Stud.io photorealistic renderer. I then use them in my digital art, like this one. These are almost always made on the 2950X, bless his heart. Doing me proud. Uh, well...
I was having a nice discussion with a lovely lady that I talk to about processors and technology, and she made a comment that I initially scoffed at. That comment was:
5800X can match 1950X in multi-core.
Well, of course I responded emotionally like the Zen/Zen+ fancatgirlboy that I am. You know, the 2950X isn't that much above the 1950X, all things considered. Well, it is, but still. It's close enough that I wanted to put my money where my mouth is (don't actually do this, money is filthy and covered in bacteria), test the 2950X in some workloads that I do almost every day, and assert the 16-core Zen+'s gigantic, throbbing, multi-threaded ePeen all over the puny 5800X with its lowly 8 cores. Well, you see, that was the plan.
But as is often the case, Sash's plans don't always work out as intended, or they change, or whatever. You get the message. What I am trying to say is, since my lovely friend (YOU STILL KNOW WHO YOU ARE!) facilitated my early acquisition of a Ryzen 7 5800X processor, I thought I would test it for myself! Yes! Surely there is no way the 5800X could get close to, or even match, my fully tuned, 4 GHz all-core 2950X in these workloads that are relevant to me?
Uuh. Yeah, about that. I'mma get straight to the point here because I'm hungry and thirsty, and I will go eat some Cheese and drink a glass of Pepsi Max shortly. The point is, yes, the 5800X kind of embarrassed my 2950X in two of the three tests for workloads I do: Stud.io photorealistic render (CPU) and 7Zip compression/decompression at 128MB dictionary size.
So, let's see 7Zip first, because this one surprised me a lot. Now, I am no expert in how 7Zip works or how this workload scales with cores (some compression doesn't scale at all), but it's a realistic use for my high-core-count CPUs, so here it is.
TR 2950X, 4 GHz all-core (QC, DR, 2DPC, 2666 MHz C16-18-18-36 UMA)
R7 5800X, 4.6-4.7 GHz all-core (Stock) (DC, SR, 2DPC, 3200 MHz C14-14-14-31)
So yeah. The 2950X maintains a very tiny lead in decompression, but when you factor in that it was using damn near twice the power and has twice the threads to do it, that calls things into question. Things like: is my 2950X the best choice for my daily use of these workloads?
Now, as I said before, this is just a benchmark. I was originally going to just use this, but I am so salty about it that I am going to do an actual test literally right now, as I type this, compressing an actual archive that I normally compress. You know, real-world (god, I sound like Intel) workloads! For you reading this, it will be instant. Lucky you! I'm going to run that test now.
I have just finished running the test. For me, that was about 30 minutes. Mother came back and I still haven't had that Cheese or Pepsi, so I will try to wrap this up (lol). The results are worse for the 2950X, because my archive contains a lot of smaller, hard-to-compress files (there are even some .mp3 files in there; good luck getting those below 100% of their original size with lossless archival compression), but that's the way it is for how I use these processors.
In a nutshell, the 5800X is not just faster, it's a lot faster. It compresses the 5GB folder with LZMA2 at 128MB dictionary size in 4 minutes 44 seconds, while the 2950X takes 7 minutes 13 seconds. The 2950X starts off with a big lead in the easily parallelisable portions of the compression; the 16-core flexes its 32-thread muscle during the initial compression of the archive. But when the compression algo falls back to a highly single-threaded load, the 5800X just flies ahead with its immense per-core IPC and clock speed advantage. It wasn't like the 5800X was a slouch in the threaded bit, either, and the 2950X's single-core performance just isn't up to scratch here, so its huge MT advantage can't save it.
As I said before, the RAM configuration on my 2950X is very slow (I explained that in this post), but that is how I configured the system; the 5800X, with its Dual Channel memory, I configured differently. Do you want a test with the 2950X in 1DPC mode with DR DIMMs at 2933 in Local Mode? Well, you're not getting one, because I can't be bothered and I'm not doing this for you :D. Besides, I found UMA to be better for these workloads in my own testing.
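To see why core count stops mattering once 7Zip falls back to one or two threads, here's a rough Amdahl's-law style sketch. It's a toy model, not how 7Zip actually schedules its work: it assumes a job splits cleanly into a serial part and a perfectly parallel part, it ignores SMT, memory and turbo behaviour, and the work sizes and the 1.5x per-core speed figure are made-up numbers for illustration, not measurements from my systems.

```python
def run_time(serial_work: float, parallel_work: float,
             cores: int, per_core_speed: float) -> float:
    """Estimated wall time for a job with a serial part and a perfectly parallel part.

    per_core_speed is relative single-thread throughput (think IPC x clock),
    in arbitrary units. SMT, memory bandwidth and turbo behaviour are ignored.
    """
    serial_time = serial_work / per_core_speed                # runs on one core only
    parallel_time = parallel_work / (per_core_speed * cores)  # split evenly over all cores
    return serial_time + parallel_time


# Hypothetical CPUs: many slower cores vs. half as many faster cores.
MANY_SLOW = {"cores": 16, "per_core_speed": 1.0}
FEW_FAST = {"cores": 8, "per_core_speed": 1.5}

# Hypothetical jobs: (serial work, parallel work) in arbitrary units.
jobs = {
    "compression-like (big serial tail)": (100.0, 800.0),
    "fractal-like (almost all parallel)": (10.0, 2000.0),
}

for name, (serial, parallel) in jobs.items():
    t_many = run_time(serial, parallel, **MANY_SLOW)
    t_few = run_time(serial, parallel, **FEW_FAST)
    winner = "16 slow cores" if t_many < t_few else "8 fast cores"
    print(f"{name}: 16 cores {t_many:.1f}, 8 cores {t_few:.1f} -> {winner}")
```

With a big serial tail, the faster cores win despite having half the count; with almost no serial part, the extra cores win. Real results depend on far more than this model captures, but the shape matches what I saw.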
Here's the video:
Again, this workload isn't scaling great with core count, because ray tracing is often latency-sensitive and branch-heavy, both areas where the faster, improved cores on the 5800X can shine. That said, it only manages to almost match the 2950X, but with half the cores.
Last up, we have the BASTION OF 2950X's AWESOMENESS! YES! Ahem, sorry. Anyway, yes, my third and final (since I do not compute on the 2950X any more due to power reasons) use case for my art station computer: editing really fucking big images. And by big, I mean 16K+ big, and I have plans to go bigger. This is where the 2950X's HEDT-class X399 platform and its 8 DIMM slots come into play, as I have them all populated with 16GB DIMMs, making 128GB of RAM. Slow, sure, but there is quite a lot of it.
That said, 64GB is possible with my DDR4 kits on the 5800X or 3900XT, which is mostly adequate, so we will have to see how the 3900XT stacks up before I decide whether the 2950X gets an early retirement, because the 5800X just couldn't keep up with the Mandelbrot fractal generation at 16K, an effect I use as a base on a lot of my image layers (for example, nebulas and star fields).
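Why does fractal generation love cores so much? Every pixel of a Mandelbrot image is computed on its own, with no dependence on its neighbours, so the rows can be dealt out to as many workers as you have. Below is a minimal escape-time sketch of that idea. It is not Paint Dot Net's actual implementation (I don't know how PDN does it internally), and the function names, plotted region and iteration limit are all hypothetical choices for illustration. The chunks are rendered one after another here to keep it simple, but because they share no state, each could be handed to a separate core.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def pixel_to_complex(x: int, y: int, width: int, height: int) -> complex:
    """Map a pixel to the region -2..1 (real) by -1.5..1.5 (imaginary)."""
    re = -2.0 + 3.0 * x / width
    im = -1.5 + 3.0 * y / height
    return complex(re, im)


def render_rows(rows, width: int, height: int, max_iter: int) -> dict:
    """Render only the given rows; needs nothing from any other row."""
    return {
        y: [escape_time(pixel_to_complex(x, y, width, height), max_iter)
            for x in range(width)]
        for y in rows
    }


def render_split(width: int, height: int, workers: int, max_iter: int = 100) -> list:
    """Deal rows out round-robin to `workers` independent chunks, then stitch them together.

    Interleaving rows (0, 4, 8, ... for worker 0 of 4) spreads the expensive
    middle of the set across all workers instead of dumping it on one.
    """
    image = {}
    for worker in range(workers):
        image.update(render_rows(range(worker, height, workers), width, height, max_iter))
    return [image[y] for y in range(height)]
```

For example, `render_split(64, 48, workers=16)` and `render_split(64, 48, workers=1)` produce identical images; only the division of work changes. Run those chunks in parallel and wall time drops roughly with the number of cores, which is exactly the regime where 16 cores pay off.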
Here is the video:
Flex those 16-core 32-thread MUSCLES YEAH! Ahem. What I really meant to say is...
Imagine if I had a 5950X...
Update: Ryzen 9 3900XT 12-core processor results!
I finally got round to moving the 3900XT from a Linux 24/7 compute server into a Windows system that I can run my apps on (and crunch on when I'm not doing that, of course). Anyway, I made videos, but I CBA to upload them at this time, because I am tired and about to go to bed. So here are the numbers from the three tests above for the 3900XT at stock.
Settings for CPU are: stock clocks, DDR4-3200, 2x8GB, C16-18-18-36 (Single Rank, 1 DIMM per Channel)
7Zip archive compression: 6 minutes, 37 seconds (*See Note below*)
Stud.io Lego ship render: 2 minutes, 48 seconds
Paint Dot Net Mandelbrot Fractal 16K Render: 1 minute, 22 seconds
Uh, so funnily enough, the Zen2 CPU here is slower than the 5800X but faster than the 2950X in my 7Zip compression, and in the Stud.io LEGO ship render it comes very close to the 5800X (with 50% more threads, though). The CPU was running at around 4.1 GHz all-core in 7Zip and 3.9-4.2 GHz all-core in the Stud.io renderer. It actually dipped to lower clocks than the 2950X (which was manually locked to 4 GHz on all sixteen cores) due to the power limit. The IPC advantage and beefier L3 couldn't overcome running at 100 MHz lower speed and having 8 fewer threads. Still, a manually tuned 3900XT would likely beat out the 2950X here, but I do not have a heatsink/cooler big enough for it at the moment, so maybe I can mess with that later.
Single-threaded compression and Zen2's apparent failure to hit peak turbo
My 7Zip archive workload once again highlights the importance of single-core performance in compression at "Ultra" settings with LZMA2: for significant portions of the workload the CPU load was very light, just pegging one or two threads at 100%. Here the 5800X has an enormous advantage even against the 4.7 GHz peak turbo 3900XT, with a bigger IPC jump over Zen2 than Zen2 had over Zen+, and a much more consistent turbo algorithm. An "issue" with Zen2 that I have encountered on all 3 of the Zen2 CPUs I have used in Windows (3700X, 3950X and 3900XT) is that they very rarely hit their peak turbos in real workloads. I suspect the Windows scheduler has something to do with this, as you can almost hit them if you manually set an app's affinity to 1 or 2 cores.
Either way, even with a peak boost of 4.775 GHz on one core (in synthetics, and when manually setting affinity in certain things), the 3900XT was only sustaining up to 4.3 GHz on the cores in use for those parts of the 7Zip benchmark. So it outmuscled the slower Zen+ cores at 4 GHz, with 300 MHz more frequency and higher IPC, but fell short of the Zen3 processor's much higher IPC and sustained 4.85 GHz turbo in light loads (the 5800X consistently hits that on all cores in light loads).
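For a quick sense of scale, here are the three archive-compression times from this post turned into speed relative to the 2950X (the 2950X's time divided by each CPU's time). Same caveats as above: the 3900XT ran with only 16GB and hit the page file, and the three systems had different memory setups, so treat this as rough arithmetic on my numbers, not a controlled benchmark.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


# Wall times for compressing my ~5GB folder with LZMA2, 128MB dictionary.
times = {
    "R7 5800X": to_seconds(4, 44),
    "R9 3900XT": to_seconds(6, 37),
    "TR 2950X": to_seconds(7, 13),
}

baseline = times["TR 2950X"]
for cpu, t in sorted(times.items(), key=lambda item: item[1]):
    # Above 1.00x means faster than the 2950X.
    print(f"{cpu}: {t} s, {baseline / t:.2f}x the speed of the 2950X")
```

That works out to about 1.52x for the 5800X and 1.09x for the 3900XT.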
*NOTE! I placed a little note above about the 3900XT's 7Zip result, because at this time I only had access to 16GB of memory for it. This is just not really enough for the 128MB dictionary size that I use, and the 3900XT bounced off the page file a few times at 16GB usage, onto the Corsair Force MP510 480GB NVMe. It is by no means a slow page file, but keep that in mind. I think the 3900XT would do a little better if it had 32GB in 2 DIMMs per channel.
Paint Dot Net and really, really big fractal generation on the 3900XT
The 3900XT here comes in between the 2950X and 5800X, but comfortably closer to the 2950X. This workload really scales with cores: I saw the 2950X pull well ahead of the 5800X despite its 40% per-core disadvantage, so having 100% more cores helped a lot. The 3900XT still didn't beat the 2950X because it, of course, has 8 fewer threads, and at stock it ran at only a slightly higher all-core clock than the tuned 2950X, around 4.1 to 4.2 GHz. With the reasonable IPC gains from Zen+ to Zen2 and 100-200 MHz more frequency, the 3900XT does do well considering its 4-core, 8-thread deficit, but it's still not enough to pull ahead of the 16-core beast that is my 2950X.
Overall, the 3900XT doesn't secure a win in any of these tests, but that isn't because Zen2 is bad; it's because of the circumstances. My tests are a wacky mixture of extremely threaded stuff like this, more single-core-reliant stuff like the 7Zip test, and clock-speed-reliant stuff like the ray tracing. Ultimately, the clock/core count and IPC combo of the 3900XT just isn't the right fit to grab a win. That said, with the 16GB limiting it in 7Zip, the lack of any manual tuning, AND a much smaller heatsink than even the 5800X had (the 3900XT was hitting 80C vs ~72C on the 5800X), the 3900XT was handicapped anyway.
I will definitely do more tests on this processor in the future with better cooling and more RAM. Anyway, that concludes this test for now. I have decided that the 2950X will remain in my art station for its impressive PDN performance and reasonable Stud.io performance, and I will run all the compression on the 5800X with those insane cores. The 3900XT will stay on WCG 24/7 with the 2 other 3900Xs, where its excellent performance per watt in extremely scalable workloads like distributed computing is very much valued.
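A crude back-of-the-envelope check on that: multiply cores by all-core clock to get "core-GHz", then see how much more work per clock the 3900XT's cores would need to do just to tie. This assumes perfect scaling across cores and ignores SMT, cache and memory entirely, and it takes 4.15 GHz as an assumed midpoint of the 4.1 to 4.2 GHz I saw; it's arithmetic, not a measurement of IPC.

```python
def core_ghz(cores: int, all_core_clock_ghz: float) -> float:
    """Crude aggregate throughput: physical cores x all-core clock (ignores SMT, cache, memory)."""
    return cores * all_core_clock_ghz


tr_2950x = core_ghz(16, 4.0)    # manually locked 4 GHz all-core
r9_3900xt = core_ghz(12, 4.15)  # assumed midpoint of the observed 4.1-4.2 GHz

# Per-clock advantage the 3900XT would need just to break even in a perfectly parallel job.
break_even_ipc_gain = tr_2950x / r9_3900xt - 1
print(f"2950X: {tr_2950x:.1f} core-GHz, 3900XT: {r9_3900xt:.1f} core-GHz")
print(f"3900XT needs about {break_even_ipc_gain:.0%} more work per clock to tie")
```

That lands at roughly 28-29%, while AMD's headline claim for Zen2 over Zen+ was around 15% higher IPC, so under this simple model the 3900XT finishing behind the 2950X in a heavily threaded job is about what you'd expect.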