Mini Tech Babble #6: Navi Tessellation Performance.

Updated: Mar 16

This will be a quick post, as I just did some testing with my RX 5700 and the "Navi 10" (Pro) GPU that powers it. The results are... interesting. I have a theory on why, and I'll offer a few thoughts on that. So, without further ado, let me explain what I did. Remember, this is a Tech Babble ~

Okay, I didn't do much. I just ran TessMark to see how "Navi 10" does in this (very synthetic, might I add) workload. Here is the result. As a control, I re-tested the 290X with the 19.9.2 driver, and its result did not change. (I will retest more cards later. I no longer have my 570 or Radeon VII; I'm working on getting another 570, but I am lazy.)
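Every card here is locked to (or very close to) 1000 MHz, so the comparison is effectively per clock. If a card can't be held at exactly 1000 MHz, one rough fix is to divide the score by the core clock. Below is a minimal Python sketch of that. It assumes the score scales linearly with core clock, which is only an approximation because memory clock and boost behaviour get in the way. The function names are made up for illustration.

```python
def fps_per_ghz(fps: float, core_clock_mhz: float) -> float:
    """Score per GHz of core clock (assumes the score scales linearly with clock)."""
    if core_clock_mhz <= 0:
        raise ValueError("core clock must be positive")
    return fps / (core_clock_mhz / 1000.0)


def scale_to_clock(fps: float, measured_mhz: float, target_mhz: float = 1000.0) -> float:
    """Estimate the score a run at measured_mhz would have got at target_mhz."""
    return fps_per_ghz(fps, measured_mhz) * (target_mhz / 1000.0)
```

For example, scale_to_clock(fps, 1040) estimates what a run that settled at 1040 MHz would have scored at 1000 MHz.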

Graph Time!

Wait, what?

Okay, so you are probably thinking what I am thinking, right? "Navi 10" actually has lower raw tessellation performance than even the RX 570's "Polaris 20" (XL) chip when all processors are locked to (or very close to) 1000 MHz. Okay, wait. Before we all grab our pitchforks and turn up at AMD HQ demanding our tessellation performance back, I have some thoughts on this, so let's get those out of the way first.


Obviously, this processor design (RDNA) is very new and the driver is very immature, so a driver problem is always a possibility and we shouldn't rule it out. However, I don't actually think it is the cause. Anyway...

Okay, so what gives?

"Navi 10" is an entirely new take on the Graphics Core Next macro-architecture (please don't kill me, but it's true: it is still technically a GCN implementation). It's a new architecture built on the GCN ISA, but my point is that a lot has changed. The geometry engine, for one, is completely redesigned.

Just One Geometry Engine?

If you look at this slide, you can see that only one "Geometry Processor" is present. "Navi 10" has a "Centralised Geometry Processor" with four "Prim Units". I don't actually know the intricacies of these new units (I think only AMD does), but here is what the "Navi 10" whitepaper says:

The primitive units assemble triangles from vertices and are also responsible for fixed-function tessellation. Each primitive unit has been enhanced and supports culling up to two primitives per clock, twice as fast as the prior generation. One primitive per clock is output to the rasterizer. The work distribution algorithm in the command processor has also been tuned to distribute vertices and tessellated polygons more evenly between the different shader arrays, boosting throughput for geometry

So, according to this, the primitive units are what actually perform the tessellation. But then why has performance per clock regressed from "Polaris 20" (contrary to the claim on the slide), when the number of "tessellation blocks" is the same (four)? The Navi chip also has twice the L2 cache and significantly more memory bandwidth to play with, so this is an odd result.
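As a rough sanity check, here is the peak primitive rate the whitepaper's figures imply at the 1000 MHz test clock. I am assuming the "per clock" figures apply to each of the four primitive units. This is a theoretical ceiling only. It says nothing about how quickly the fixed-function tessellator actually generates triangles at 64X.

```latex
R_{\text{out}}  = N_{\text{prim}} \times r_{\text{out}}  \times f
               = 4 \times 1\,\tfrac{\text{prim}}{\text{clk}} \times 1.0\times10^{9}\,\tfrac{\text{clk}}{\text{s}}
               = 4\times10^{9}\,\tfrac{\text{prim}}{\text{s}}

R_{\text{cull}} = N_{\text{prim}} \times r_{\text{cull}} \times f
               = 4 \times 2\,\tfrac{\text{prim}}{\text{clk}} \times 1.0\times10^{9}\,\tfrac{\text{clk}}{\text{s}}
               = 8\times10^{9}\,\tfrac{\text{prim}}{\text{s}}
```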

Synthetics are not Games

My thought here is that maybe AMD took a close look at what games (TessMark is not a game, by the way) are actually doing. It may then have redistributed the geometry processor's resources to maximise performance in games, potentially at the expense of some "synthetic" performance. Remember, all of these units count against the transistor budget. I think this is quite plausible, because "Navi 10" is performing very well in games so far. Another thought, though...

Navi is a new architecture and it's not really designed to run at 1 GHz

This may well be true. Some parts of a processor's design, such as pipeline depth and how the latencies of different stages balance out, are tuned around its target clock speed. So a chip built to run fast may not scale the same way when held far below it. I'm thinking of CPUs here, and this isn't a CPU, but the principle could largely be the same. As I have previously mentioned, the newer Radeon graphics processors (Vega and now Navi) are designed to run at much higher clock speeds. However...

It doesn't outperform Vega 20 at default clock speeds

I did test my Radeon VII in TessMark, but I can't remember what I did with the result. However, AnandTech's review of the Radeon VII includes a TessMark result, and it is in line with what I got (around 400 FPS). So I tested the Radeon RX 5700 at stock (it was running at around 1700 MHz, about the same as the Radeon VII). Here's the result.

And here's the result from AnandTech's review of the Radeon VII. (Please note that they do not specify the Post FX, AA or resolution settings. I assume Post FX and AA are disabled and the resolution is 720p, to put the emphasis on tessellation.)

So, it appears that "Navi 10" (Pro) actually has less synthetic tessellation performance than "Vega 10" (XL) and "Vega 20". Huh. The most interesting result here for me is the Vega 64, which is working with a similar amount of memory bandwidth (448 GB/s on the RX 5700 vs 484 GB/s on the Vega 64).
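These bandwidth figures come from a simple formula: peak bandwidth equals the per-pin data rate multiplied by the bus width, divided by 8 to convert bits to bytes. Below is a small Python sketch of it. The per-pin rates and bus widths are the public spec-sheet values for these cards (14 Gbps GDDR6 on a 256-bit bus for the RX 5700, and 945 MHz double-data-rate HBM2 on a 2048-bit bus for the Vega 64). The function name is just for illustration.

```python
def peak_bandwidth_gbs(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbit/s) x bus width (bits) / 8."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


# RX 5700: 14 Gbps GDDR6 on a 256-bit bus.
rx5700 = peak_bandwidth_gbs(14.0, 256)        # 448.0 GB/s
# Vega 64: HBM2 at 945 MHz, double data rate (1.89 Gbps per pin), 2048-bit bus.
vega64 = peak_bandwidth_gbs(0.945 * 2, 2048)  # ~483.8 GB/s

print(f"RX 5700: {rx5700:.1f} GB/s, Vega 64: {vega64:.1f} GB/s, ratio {vega64 / rx5700:.2f}")
```

So the Vega 64 has only about 8% more raw bandwidth to work with, which is why its result stands out.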

One last thought, which I had just now: "Navi 10" may lose tessellation performance when Compute Units are disabled, since the shader culling happens (I think?) in the shader arrays. This would be similar to what happens with Nvidia GPUs. Their tessellation engines (PolyMorph Engines) are attached to the TPCs, which are often disabled to bin lower-performing dies.
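If this theory holds, an RX 5700 XT at the same 1000 MHz clock should give a tell-tale result. Both cards use "Navi 10" with the same 448 GB/s of memory bandwidth. However, the RX 5700 has 36 Compute Units enabled, while the RX 5700 XT has all 40. Assume that tessellation throughput scaled with enabled CUs (this is the hypothesis, not a known fact) and that both cards keep all four primitive units. The expected ratio of scores S would then be:

```latex
\frac{S_{\text{5700 XT}}}{S_{\text{5700}}} \approx \frac{40}{36} \approx 1.11
```

A score roughly 11% higher would support the idea. Roughly identical scores would suggest the CU count isn't the limiting factor.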

Someone test for me! For SCIENCE!

I would really like someone who owns an RX 5700 XT to test this for me. So, uh, if you have an RX 5700 XT and wanna test for Science!, here are the settings I use for TessMark:

Important for both tests:

GPU Clock: 1000 MHz (for the clock-locked test; my AnandTech comparison was run at stock clocks)

AMD Driver: 19.9.2

AMD Tessellation Mode: "Use Application Settings"

For the 1000 MHz test:

  • TessMark 0.3.0

  • Set 3

  • 1280x720

  • 64X (Insane)

  • Anti-aliasing Disabled

  • Post FX Disabled

For the AnandTech comparison:

  • TessMark 0.3.0

  • Set 4

  • 1280x720

  • 64X (Insane)

  • Anti-aliasing Disabled

  • Post FX Disabled
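If you do send results, here is a small Python sketch of how I might sanity-check them against the settings above before putting them on a graph. The dictionary layout and key names are my own invention for illustration. They are not anything TessMark itself exports.

```python
# Settings shared by both tests (hypothetical key names, not TessMark's own).
COMMON = {
    "version": "0.3.0",
    "resolution": (1280, 720),
    "tessellation_level": 64,  # "64X (Insane)"
    "anti_aliasing": False,
    "post_fx": False,
}

# The two tests differ only in which TessMark set is used.
REQUIRED = {
    "clock_locked": {**COMMON, "set": 3},
    "anandtech": {**COMMON, "set": 4},
}


def mismatched_settings(test: str, settings: dict) -> list[str]:
    """Return the names of any settings that differ from the requested ones."""
    required = REQUIRED[test]  # raises KeyError for an unknown test name
    return [name for name, value in required.items() if settings.get(name) != value]
```

An empty list means the run matches what I asked for. Anything else names the settings that differ.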

Thanks! Please contact me with your results so I can think about this some more. I probably won't reply but it will be very much appreciated!

I would also be interested in results from all kinds of GPUs, so if you have any recent (or not so recent) GPU, feel free to run the bench. But please, if you are using a GCN- or RDNA-based Radeon, use the 19.9.2 driver and set the core clock to 1000 MHz for the synthetic test.

Nvidia users are welcome too; it would be interesting to see. Maybe I will compile a graph of the results. Maybe, but I can't promise, as it's anxiety-provoking. Thanks for understanding.

Thanks for reading <3

©2020 by Sashleycat (Read the Legal stuff)