Sash Rant: Some people don't understand Turing's advantages.

Okay, I am a bit triggered, so that means you get a Rant Post! How lucky you are; and by "you" I mean that one person who actually reads my blog, or whoever randomly wandered here from the internet wasteland. :D


So, this misinformation comes from my favourite source! Disqus, of course. To be fair, VideoCardz's comment section has cultivated some excellent technology discussion about the hardware and the industry; you just have to (like on any public comment forum) wade through the BS and the people who don't know what they're talking about.


Hell, I never even claimed to really "know what I'm talking about"; I still have a positively huge amount to learn on these subjects, but I am passionate about learning them, and I feel I have a solid enough foundation of knowledge to write a rant post like this one. So without further ado, I will take some comments and explain why I flat-out disagree (and not just as a matter of opinion; I will argue that the facts support my position).





Pascal (10-series) has higher PPT (performance per transistor) and PPA (performance per area) than Turing (16/20-series) in a lot of '3D gaming workloads'.

So, let's start by making this very clear, because it is a fact: Turing does indeed have more transistors and larger die sizes than Pascal for a given amount of performance in many 3D gaming workloads. That said, this is not a regression. There are some things we need to make clear about the architectures being compared here, and also, the extent of the 'advantage' the OP is claiming is eroded in the case of the TU106-based RTX 2060, because that product features a pretty heavily cut-down processor.


So, firstly: RTX 2060 is based on TU106, which has just over 10 billion transistors. This is a fact, and on paper it doesn't look that great in many gaming workloads versus a processor like GP104, which delivers similar performance with just over 7 billion.
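
To put rough numbers on that naive comparison (using NVIDIA's published figures of roughly 10.8 billion transistors for TU106 and 7.2 billion for GP104, and assuming similar performance in a traditional raster workload):

PPT(TU106) / PPT(GP104) ≈ (perf / 10.8B) / (perf / 7.2B) = 7.2 / 10.8 ≈ 0.67

So, taken at face value, Turing is delivering only about two-thirds of Pascal's performance per transistor here.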


This flawed assessment firstly doesn't take into consideration that RTX 2060's TU106 GPU is heavily cut back. That is, there are a lot of transistors in functional units in that silicon that are not doing anything at all; i.e., they are laser-disabled, likely due to lithography defects, leakage, or simple product segmentation.


A 'fairer' (not fully fair; I will come to that in a moment) comparison is the fully enabled TU106 die (RTX 2070 non-SUPER), since you are using the full GP104 silicon in the GTX 1080. Run that comparison and TU106 does indeed come out with higher performance, but wait! It also has many more transistors, more than the increase in performance would account for. Which brings me to the next point in this rant.





Turing incorporates architectural advancements that increase the transistor budget, but also add flexibility and performance in certain workloads: Ray Tracing.

This is something we need to understand before judging Turing in 'Gaems' versus Pascal. Firstly, Turing isn't actually built specifically for 3D graphics; it's an extremely capable, versatile architecture with strong compute elements, but those elements can also be leveraged to significantly improve performance in certain gaming situations.


The big elephant in the room, obviously, that everyone knows about, is that Turing has dedicated, fixed-function logic for Ray Tracing. These "RT Cores" are essentially ASIC blocks that vastly speed up the process of searching through a Bounding Volume Hierarchy (BVH), essentially a tree of bounding boxes around the geometry in a given scene, to determine where a ray intersects that geometry; the resulting pixel then needs to be altered depending on how the light ray interacted with that geometry. This creates extremely realistic simulations of lighting effects. Ray Tracing!
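
To make that concrete, here is a minimal sketch (in CUDA, with illustrative names and data layouts of my own invention, not NVIDIA's actual hardware formats) of the traversal work an RT Core performs in fixed function. On Pascal, a loop like this has to run as ordinary shader code on the CUDA cores; on Turing, the RT Core walks the tree in hardware and hands the SM only the candidate geometry:

```cuda
#include <cuda_runtime.h>

// Illustrative structures; real BVH node layouts are packed and proprietary.
struct Ray  { float3 o; float3 invDir; };   // origin, 1/direction (precomputed)
struct AABB { float3 lo, hi; };
struct Node {
    AABB box;
    int  left, right;        // child node indices (interior nodes)
    int  firstTri, triCount; // leaf if triCount > 0
};

// Classic slab test: does the ray's parametric interval overlap the box?
__device__ bool hitAABB(const Ray& r, const AABB& b, float tMax) {
    float t0 = 0.0f, t1 = tMax;
    const float* o  = &r.o.x;
    const float* id = &r.invDir.x;
    const float* lo = &b.lo.x;
    const float* hi = &b.hi.x;
    for (int axis = 0; axis < 3; ++axis) {
        float tNear = (lo[axis] - o[axis]) * id[axis];
        float tFar  = (hi[axis] - o[axis]) * id[axis];
        if (tNear > tFar) { float tmp = tNear; tNear = tFar; tFar = tmp; }
        t0 = fmaxf(t0, tNear);
        t1 = fminf(t1, tFar);
        if (t0 > t1) return false;  // slab intervals don't overlap: miss
    }
    return true;
}

// Iterative BVH walk with an explicit stack, pruning whole subtrees
// whose bounding boxes the ray misses.
__device__ int traverseBVH(const Node* nodes, const Ray& r) {
    int stack[64];
    int sp = 0;
    stack[sp++] = 0;  // start at the root
    int candidates = 0;
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!hitAABB(r, n.box, 1e30f)) continue;  // prune this subtree
        if (n.triCount > 0) {
            candidates += n.triCount;  // leaf: ray/triangle tests go here
        } else {
            stack[sp++] = n.left;     // interior: visit both children
            stack[sp++] = n.right;
        }
    }
    return candidates;
}

__global__ void countCandidates(const Node* nodes, const Ray* rays,
                                int* out, int nRays) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nRays) out[i] = traverseBVH(nodes, rays[i]);
}
```

Every ray in every RT-enabled frame performs some version of this walk, which is why dedicating silicon to it pays off so dramatically.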


Those blocks are doing absolutely nothing in a 'normal 3D game'. So that transistor 'bloat' isn't even being used. Consider the following:


What about GTX 1080 versus RTX 2060 with RT enabled?


I think you might find that the 2060 is quite a bit more powerful than the GTX 1080. Now, even though this is a huge performance increase when Turing's hardware RT feature is being used, it might not fully explain the increased transistor count relative to the percentage performance increase. There are further reasons for this, so I will create a smaller sub-heading. Or whatever. It's a rant, don't judge me!



Turing's architecture trades PPA/PPT for higher flexibility, concurrent INT/FP, RT/AI acceleration, and higher performance per watt.


Turing's core design includes dedicated hardware for integer code, floating-point code, AI/ML processing and Ray Tracing. For each 'advertised' CUDA core on a Turing-based GPU, there are actually two separate execution pipes, one for INT32 and one for FP32; furthermore, the SM has additional logic dedicated to Tensor cores (FP16 matrix processors) and Ray Tracing (the BVH traversal ASIC). Those last two are not being used in normal games, so it's not entirely accurate to judge PPA/PPT between the architectures on those games alone.
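
For a taste of what those Tensor cores actually do, here is a minimal sketch using CUDA's public WMMA API (my own illustrative kernel, not anything NVIDIA ships): each warp issues a 16x16x16 FP16 matrix multiply-accumulate that executes on the Tensor core hardware rather than the regular CUDA cores.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B + C for a single 16x16 tile on the
// Tensor cores. Requires a Volta/Turing-class GPU (sm_70+) to run.
__global__ void wmmaTile(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);    // start the accumulator at zero
    wmma::load_matrix_sync(a, A, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);    // the actual Tensor core op
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
```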


On the INT/FP subject: this improves shading efficiency, and very much so in certain situations, by allowing integer and FP shader code (provided the two are not dependent on each other) to be executed in parallel. With the same number of CUDA cores on Pascal, such code would need to wait for the CUDA core to finish doing either the INT or the FP work; it cannot do both at once.
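
Here is a toy CUDA kernel (mine, purely illustrative) showing the kind of instruction mix this helps with: the integer work (hashing an index) and the floating-point work (a multiply-add chain) are independent of each other, so Turing's separate INT32 and FP32 pipes can chew through them concurrently, while Pascal's unified cores must alternate between them.

```cuda
#include <cuda_runtime.h>

__global__ void mixedIntFp(const float* in, float* out, int n, unsigned seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // INT32 stream: a small hash to compute a gather index.
    unsigned h = (unsigned)i * 2654435761u ^ seed;
    h ^= h >> 13;
    int src = (int)(h % (unsigned)n);

    // FP32 stream: independent arithmetic on this thread's own element.
    float x   = in[i];
    float acc = fmaf(x, 1.5f, 0.25f);
    acc = fmaf(acc, acc, x);

    // Only at the very end do the two streams meet.
    out[i] = acc + in[src];
}
```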


This advantage varies from pretty significant to essentially nil, depending on the game engine and the types of instructions used. In certain scenes, Turing can achieve around 30% higher shading efficiency per CUDA core versus Pascal.


TU116 and TU117 do not have RT or Tensor cores, so why do they still have worse PPA/PPT than Pascal?


Were you even reading? These GPUs still have the dedicated INT/FP pipes. Furthermore, the TU11x chips, instead of Tensor cores, have a dedicated array of FP16x2 accumulators. That is, dedicated logic pathways for doing two FP16 operations in one instruction, at a 2:1 rate versus FP32. That is used for Variable Rate Shading and certain other workloads.
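
In CUDA terms, that packed FP16 path looks something like this (a minimal sketch of mine using the standard cuda_fp16.h intrinsics): one __hfma2 instruction performs a fused multiply-add on a pair of half-precision values at once, which is where the 2:1 rate over FP32 comes from.

```cuda
#include <cuda_fp16.h>

// Each __half2 packs two FP16 values; one intrinsic processes both.
// Requires sm_53 or newer for native FP16 arithmetic.
__global__ void fp16x2Scale(const __half2* in, __half2* out, int n,
                            __half2 scale, __half2 bias) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // A single fused multiply-add on two FP16 values: in[i] * scale + bias.
    out[i] = __hfma2(in[i], scale, bias);
}
```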


Compare a GTX 1660 Ti to a GTX 1080 when VRS is fully in use, in a game issuing INT and FP concurrently, and I think you'll see it a bit differently. Perf/watt did increase with Turing, on a process that is essentially the same.


Consider that Turing might not have really been intended for TSMC's 12nm FFN process.

This sub-heading is really interesting. It is highly likely that Turing's die sizes are much larger than NVIDIA intended. I heard something about Turing originally being slated for Samsung's 10nm process, with its higher density. I cannot confirm this, but it's worth thinking about.