But Sash, you're an AMD Fancatboygirl. How could you possibly defect to the Evil Green Dark Side? I'm afraid the answer is quite simple, and it's not entirely to do with AMD's driver being a bit... borked. (To be fair, they have fixed quite a few of the issues that were nagging me, but it's still not perfect.) The reason is CUDA.
Now, CUDA isn't quite the "Big Bad Anti-Competitive Monster" that some people like to think it is (I acknowledge that NVIDIA isn't the most ethical of companies, however). CUDA is really successful for a couple of reasons that I am going to list here, and then explain why I am probably going to buy a GeForce graphics card in the foreseeable future.
CUDA's strength lies in the apps that support it - quite a lot more of them than support OpenCL on Radeon GPUs (or on NVIDIA, for that matter). Now, I know what you're thinking - "NVIDIA has just paid developers to not use OpenCL on AMD!", and while I can't 100% rule that out (companies are assholes), I think the main reason is twofold:
A) CUDA is easy to use. For want of a better term, it essentially "holds your hand" when developing code / programs to run on NVIDIA GPUs. For that reason alone, it's a lot more accessible than OpenCL, which is powerful - but requires more time and resources to develop for. Those last two points are key here: 99% of companies writing software don't want to spend a huge amount of time (and developer time = money) developing for.... and this leads me to the second reason.
.... a GPU brand that has rather low market share. It's sad, but unfortunately, it's true. NVIDIA seems to be much more entrenched in the software support/ecosystem with CUDA, and AMD isn't going to get anywhere any time soon with consumer-focused/mainstream apps needing GPU support unless they have some viable alternative to CUDA. Now, AMD has ROCm, which is a start, but this is sort of a vicious cycle that seems to push AMD into a niche: those willing to invest the time into OpenCL on Radeons, or supercomputers that have specific contracts with AMD (but are likely using custom software).
Apps I use
So it comes down to my use-case: two apps I use both have CUDA acceleration to speed up the process. These are Adobe Premiere Rush, which I use to create my various videos, and Stud.io 2.0, which I use to build and render my Lego models for my Science Fiction Universe. Now, the latter didn't seem to run that great on Turing, likely because the software hasn't been fully updated for the latest version of CUDA/architecture, but Adobe Premiere Rush will almost certainly have been.
I actually noticed Rush used the GPU for video editing when I put my GT 710 in my Threadripper 1920X PC (a post on this soon), as I am using my 2600X system for main use now with the TR as my Workstation. Anyway, the video was lagging a bit (but not hugely) and I noticed the 710's "Compute" engine was under full load as I edited the scene. Now, if a lowly potato GT 710...
... I'm not kidding. GT 710 is based on the tiny, 28nm, Kepler-architecture silicon known as 'GK208'. This chip is like, 75 mm², and has two Kepler Streaming Multiprocessor neXts (SMX) and a 64-bit memory controller populated by only two DDR3 chips running at 1.8 Gbps. Uh, did I mention that one of those SMX is disabled? Yes, it only has 192 CUDA cores and the core runs at 952 MHz. This is the definition of Potato...
... can run Adobe Premiere Rush with my 1080P video editing without completely dying, what could, say a GTX 1650 SUPER do? Now you see what I'm getting at. This brings me to the final point...
What GPU to choose?
I am thinking that my next "major" GPU purchase is probably not too close, as I am happy with the RX 5500 XT 8GB's performance (so much so that I even wrote a Tech Babble on the GPU chip, Navi 14...), so that decision will have to wait until I actually "upgrade", but it might well be a GeForce. If it's going to be a 30-series, it's going to be less than £250 and it must have 8GB and full DX12U support. Please make this happen, Mr. Leather Jacket Man.
So, for the more short to medium term (months?) I am thinking about buying a budget GeForce card just for CUDA in those apps. Now, we can do some rough SCIENCE! to work out which GPUs would be 'faster' than my 1920X at 3.9 GHz, because SCIENCE! is fun, so:
(Disclaimer: the math is probably far off. An FMA (MUL+ADD) takes 5 cycles on Zen1, but for the sake of simplicity, I'm going to go by peak numbers based on FPU vector width.)
Each core has 2x 128-bit FMAC (Fused Multiply-Accumulate) units (arranged like 2x 128b MUL + 2x 128b ADD), so that is 256 bits of FMAC vector width in total.
That is enough register space to work on 8 single-precision (FP32) numbers in a single SIMD operation. Assuming peak utilisation by the core on 2 threads courtesy of SMT (sketchy but I never said it has to make perfect sense!), we get this:
12 cores x 8 FP32 numbers per clock x 2 operations (FMAC) x 3900 MHz = 748,800 MFLOPS
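For anyone who wants to poke at the numbers, here is the same back-of-envelope sum as a tiny Python sketch. The function name `peak_mflops` is just something made up for illustration, and it carries the same assumptions as above: theoretical peak only, every core fully busy, with no latency or memory effects considered.

```python
def peak_mflops(cores, fp32_lanes, ops_per_lane, clock_mhz):
    """Theoretical peak FP32 throughput in MFLOPS (peak numbers only)."""
    return cores * fp32_lanes * ops_per_lane * clock_mhz

# 256 bits of FMAC vector width / 32 bits per FP32 number = 8 lanes per core
FP32_LANES = 256 // 32

cpu_mflops = peak_mflops(
    cores=12,               # Threadripper 1920X
    fp32_lanes=FP32_LANES,  # FP32 numbers per clock per core
    ops_per_lane=2,         # an FMAC counts as a multiply plus an add
    clock_mhz=3900,         # 3.9 GHz
)
print(f"1920X peak: {cpu_mflops:,} MFLOPS")  # 748,800 MFLOPS
```

Swapping in a different core count, vector width or clock gives the equivalent peak figure for another CPU, with the same caveats.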
Now let's see, since AIDA64 has this benchmark pretty much already sat here for me...
Huh. I am actually quite surprised just how spot-on my 'eyeballed' math was. Anyway, it's quite obvious the tiny little iddle baby GT 710 with its single SMX is quite literally embarrassed by my Big Chungus 12-core CPU flexing its SIMD Muscles, and it doesn't even have really big SIMD Muscles, either. At least not compared to a Skylake or Zen2 core.... Or the monstrosity that is Skylake-SP's server core with dual AVX512 units.
Maybe I should find a way to disable GPU acceleration on APR until I get a GPU that isn't a Potato? OR maybe it uses both, because the CPU was being used a bit too.
Eh, anyway, I don't need to make the calculations for the GT 710, as you can see above this tiny GPU is only good for two things: hardware video decode and display outputs. :D
I'm going to entertain the idea of getting a GTX 1650 SUPER, because this card is also based on TU116, which features Turing's fantastic NVENC block, which I can use with HandBrake for fast (and good!) video transcoding.
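For the curious, the same kind of rough sum can still be run on the GT 710 figures quoted earlier (192 CUDA cores at 952 MHz). This is only a sketch: `gpu_peak_mflops` is a made-up helper, and it assumes each CUDA core retires one FMA (2 operations) per clock, which is the usual way peak GPU numbers are quoted rather than anything measured here.

```python
def gpu_peak_mflops(cuda_cores, clock_mhz, ops_per_core=2):
    """Theoretical peak FP32 MFLOPS, assuming one FMA per core per clock."""
    return cuda_cores * ops_per_core * clock_mhz

gt710_mflops = gpu_peak_mflops(cuda_cores=192, clock_mhz=952)
cpu_mflops = 12 * 8 * 2 * 3900  # the 1920X estimate from earlier

print(f"GT 710 peak: {gt710_mflops:,} MFLOPS")  # 365,568 MFLOPS
print(f"CPU / GPU:   {cpu_mflops / gt710_mflops:.2f}x")
```

On paper that puts the Potato at roughly half the CPU's peak, and real workloads will not hit either peak anyway.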