Profile updated as of 26th August 2019

NVIDIA GeForce GTX 1070 Ti

Here is my GPU profile for the GeForce GTX 1070 Ti, a graphics card released in late 2017 to fight off AMD's Radeon RX Vega 56, which solidly beat the original GTX 1070. The Ti variant features more active Streaming Multi-processors but retains the same GDDR5 memory subsystem as the original 1070.

The card I owned was the ASUS "Turbo" blower. I really quite liked it actually.

Actual silicon die shot. Image credit to Fritzchen Fritz; you can see his awesome work here.

Gallery

GP104-300.jpg

(Above) The silicon die of GP104-300 on a GTX 1070 Ti graphics card. It has 8x 1024 MB GDDR5 SDRAM chips making up a 256-bit interface.

1070tidie.jpg

(Above) Actual silicon die shot of the GP104 GPU. This is the GP104-300 variant, as used in the GTX 1070 Ti; a single Streaming Multi-processor is laser-disabled, as annotated above. (Disclaimer: the position of the disabled SM/TPC shown here is to the best of my knowledge.)

1070tiblock.jpg

(Above) The architectural block-diagram for GP104-300. Note the disabled TPC/SM.

Graphics Card Information

Graphics Card: NVIDIA GeForce GTX 1070 Ti

Graphics Card Manufacturer: NVIDIA

Graphics Card Release Date: November 2, 2017

Graphics Card MSRP: $449 USD

Graphics Processor Codename: GP104-300

Graphics Processor Manufacturer: NVIDIA

Graphics Processor Implementation: Cut die

Graphics Interface: PCI-E 16x Gen3

Architecture: Pascal (Client)

Lithography Process: TSMC 16nm FinFET

Approximate die size: 314mm²

SashleyCat's GPU die Size Rating: mid-sized

Approximate Transistor Count: 7,200 Million

Approximate Transistor Density: 22.9 Million / Square Millimetre
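
The density figure follows directly from the transistor count and die size listed above. A minimal Python sketch of that division (the variable names are just my own labels for the listed values):

```python
# Figures taken from the list above.
transistors_millions = 7200   # approximate transistor count, in millions
die_area_mm2 = 314            # approximate die size, in square millimetres

# Density in millions of transistors per square millimetre.
density = transistors_millions / die_area_mm2
print(f"{density:.1f} million transistors per mm²")
```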

 

GPU Features

Double-speed FP16 Shading: No (1:64 FP32)

Asynchronous Compute Capability: Limited (Fast Context Switching) *

DirectX Hardware Support: DX12.1 (FL 12_1)

Dedicated DXR Acceleration on chip: No

Variable-rate Shading: No

Adv. Geometry shading (Primitive/Mesh shaders): No

AI/ML Acceleration: No

Advanced Memory Management: No

Integer and Float Shader Co-execution: No

Tile-based Renderer: Yes

 

GPU Computing Resources

GPU Substructures: 4 Graphics Processing Clusters, 19 Texture Processing Clusters (20 Full Chip)

Graphics Cores: 19 Streaming Multi-processors (20 Full Chip)

Graphics Cores per Substructure: 1 per TPC, 3 x GPC with 5, 1 x GPC with 4

Total Stream Processors (ALU/Shaders): 2432 (2560 Full Chip)

Stream Processors per Graphics Core: 128

Graphics Core SIMD Structure: 4 x 32

Total Special Execution Units: 608 Special Function Units (640 Full Chip), 76 FP64 CUDA Cores (80 Full Chip), 19 FP16x2 CUDA Cores (20 Full Chip), 608 Load/Store Units (640 Full Chip)

Special Execution Units per Graphics Core: 32 Special Function Units, 4 FP64 CUDA Cores, 1 FP16x2 CUDA Core, 32 Load/Store Units

Total Texturing Units: 152 (160 Full Chip)

Texturing Units per Graphics Core: 8

Pixel Pipelines (ROPs): 64 (8 x ROP Partitions with 8 Pixels per clock)

Level 2 shared on-chip cache: 2048 KB

Geometry/Tessellation Processors: 19 (20 Full Chip)

Raster Engines: 4
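
The totals in this list are the per-core counts multiplied by the number of active Streaming Multi-processors. As a rough sketch in Python (the dictionary keys and the `totals` helper are my own naming, not anything official), the 19-SM and 20-SM figures can be reproduced like this:

```python
ACTIVE_SMS = 19      # GTX 1070 Ti (GP104-300)
FULL_CHIP_SMS = 20   # fully enabled GP104

# Per-SM unit counts as listed above.
PER_SM = {
    "stream_processors": 128,
    "special_function_units": 32,
    "fp64_cores": 4,
    "fp16x2_cores": 1,
    "load_store_units": 32,
    "texture_units": 8,
}

def totals(sm_count):
    """Scale every per-SM unit count by the number of SMs."""
    return {unit: count * sm_count for unit, count in PER_SM.items()}

active = totals(ACTIVE_SMS)
full = totals(FULL_CHIP_SMS)

for unit in PER_SM:
    print(f"{unit}: {active[unit]} ({full[unit]} full chip)")

# GPC layout from the list: three GPCs with 5 SMs and one GPC with 4.
gpc_layout = [5, 5, 5, 4]
print("SMs across GPCs:", sum(gpc_layout))

# ROPs: 8 partitions, each handling 8 pixels per clock.
rops = 8 * 8
print("ROPs:", rops)
```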

 

GPU Memory Subsystem

Graphics Memory Type: GDDR5

Graphics Memory Standard Capacity: 8192 MB

Graphics Memory Composition: 8 x 1024 MB GDDR5 SDRAM Chips

Graphics Memory Access Granularity: 32-bit (4 bytes)

Graphics Memory Standard Clock Speed / Data Rate: 2000 MHz / 8000 MHz

Graphics Memory Full Interface Width: 256-bit (32 bytes per clock)

Graphics Memory Peak Memory Bandwidth: 256 GB/s
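
The capacity and bandwidth figures above can be cross-checked with a little arithmetic. This is only a sketch using the listed values; the variable names are mine, and GB/s is taken to mean 1000 MB/s:

```python
# Values from the memory list above.
chips = 8
chip_capacity_mb = 1024
bus_width_bits = 256
memory_clock_mhz = 2000
data_rate_mtps = 8000   # effective data rate, in millions of transfers per second

capacity_mb = chips * chip_capacity_mb                      # total capacity
bits_per_chip = bus_width_bits // chips                     # slice of the bus per chip
bytes_per_transfer = bus_width_bits // 8                    # bytes moved per transfer
transfers_per_clock = data_rate_mtps // memory_clock_mhz    # data rate vs. listed clock
bandwidth_gb_s = data_rate_mtps * bytes_per_transfer / 1000 # MB/s -> GB/s

print(f"Capacity: {capacity_mb} MB")
print(f"Per-chip interface: {bits_per_chip}-bit")
print(f"Transfers per listed clock: {transfers_per_clock}")
print(f"Peak bandwidth: {bandwidth_gb_s:.0f} GB/s")
```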

 

GPU Frequency and Peak performance

Graphics Engine Clock: 1683 MHz *

GPU Computing Power FP16: 127,908 Million operations per second with FMA

GPU Computing Power FP32: 8,186,112 Million operations per second with FMA

GPU Computing Power FP64: 255,816 Million operations per second with FMA

GPU Texturing Rate INT8: 255,816 Million texels per second

GPU Texturing Rate FP16: 255,816 Million texels per second

GPU Pixel Rate: 107,712 Million pixels per second

GPU Primitive Rate: 6,732 Million triangles per second *
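
The peak figures above come from multiplying unit counts by the 1683 MHz engine clock. Here is a small Python sketch of that arithmetic. The variable names are my own, and two of the factors are assumptions that happen to reproduce the listed numbers: each FP16x2 core is taken to process two values per clock, and each Raster Engine is taken to output one triangle per clock.

```python
CLOCK_MHZ = 1683   # listed Graphics Engine Clock
FMA_OPS = 2        # a fused multiply-add counts as two operations

# Unit counts from the GPU Computing Resources list.
fp32_cores = 2432
fp64_cores = 76
fp16x2_cores = 19
texture_units = 152
rops = 64
raster_engines = 4

# All results are in millions per second, because the clock is in MHz.
fp32_mops = fp32_cores * FMA_OPS * CLOCK_MHZ
fp64_mops = fp64_cores * FMA_OPS * CLOCK_MHZ
fp16_mops = fp16x2_cores * 2 * FMA_OPS * CLOCK_MHZ  # assumption: 2 values per FP16x2 core
texel_rate = texture_units * CLOCK_MHZ
pixel_rate = rops * CLOCK_MHZ
triangle_rate = raster_engines * CLOCK_MHZ          # assumption: 1 triangle per engine per clock

print(f"FP32: {fp32_mops:,} Mops/s (about {fp32_mops / 1e6:.2f} TFLOPS)")
print(f"FP64: {fp64_mops:,} Mops/s")
print(f"FP16: {fp16_mops:,} Mops/s")
print(f"Texels: {texel_rate:,} M/s")
print(f"Pixels: {pixel_rate:,} M/s")
print(f"Triangles: {triangle_rate:,} M/s")
```

The 1:64 FP16 ratio quoted under GPU Features drops out of this too: 127,908 is exactly 1/64 of 8,186,112.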

 

GPU Thermal and Power

Standard Cooling Solution: Blower cooler with vapour chamber

Typical Board Power: 180 W

Maximum Board Power: 216 W

Maximum Allowed Junction Temperature (TJ Max): 94°C

 

 

Graphics Card description

The GeForce GTX 1070 Ti launched in late 2017 to combat AMD's Radeon RX Vega 56, which handily beat the original GTX 1070. NVIDIA's response was to enable four more Streaming Multi-processors on the GP104 silicon, putting the GTX 1070 Ti really close to the GTX 1080 in core count, with only one SM disabled from a full chip, while keeping the same regular GDDR5 (non-X) memory subsystem as the normal 1070. The result is a card that could take back the performance crown from the RX Vega 56, but it also cost 50 dollars more. In 2019 the GTX 1070 Ti and RX Vega 56 are often evenly matched and offer about the same performance on average, with the 1070 Ti consuming less power but lacking advanced features such as double-speed FP16 shading, which the Vega 56's Vega 10 processor innately supports.

 

Graphics Card approximate 3D Performance

Sashleycat gaming performance rating (2019): Good for 1440p high settings 60 FPS

NVIDIA's GTX 1070 Ti was launched in late 2017 to beat AMD's RX Vega 56 graphics card, which was more performant than the original GTX 1070. On release the 1070 Ti managed to take back the performance crown in this segment from the Vega 56, often providing performance around that of the Vega 64 and only a bit behind the GTX 1080. By 2019, the Vega 56 has somewhat closed this gap and the two cards are very evenly matched. Performance is comparable to an RTX 2060 of the 20 series, and the card is well suited to maximum-detail, high-refresh 1080p gaming. It can also cope well with 2560x1440 resolution in many games.

 

Notes

Graphics Engine Clock

GPU Boost on NVIDIA cards of this generation results in higher observed core clock speeds in actual games. The 1070 Ti will often run at upwards of 1800 MHz if power or thermal limits are not reached. The number listed here is the NVIDIA-spec boost rating, and all the peak performance figures are calculated from this spec number. Actual performance will be higher, but it's impossible to quantify exactly what each card will boost to.
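
To give a feel for how much the observed clock matters, here is a rough Python sketch that rescales the FP32 peak for a given clock. The function name is hypothetical, and 1800 MHz is only an example value picked from the remark above, not a measurement of any particular card.

```python
SPEC_BOOST_MHZ = 1683   # NVIDIA-spec boost rating used for the listed figures
FP32_CORES = 2432       # total stream processors on the GTX 1070 Ti
FMA_OPS = 2             # a fused multiply-add counts as two operations

def peak_fp32_mops(clock_mhz):
    """Peak FP32 throughput in millions of operations per second."""
    return FP32_CORES * FMA_OPS * clock_mhz

spec = peak_fp32_mops(SPEC_BOOST_MHZ)
example = peak_fp32_mops(1800)   # example clock only, not a guaranteed boost

print(f"At spec boost: {spec:,} Mops/s")
print(f"At 1800 MHz:   {example:,} Mops/s")
print(f"Uplift: {100 * (example / spec - 1):.1f}%")
```

Since every peak figure is a unit count times the clock, the same percentage uplift would apply to the texel, pixel and triangle rates as well.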

 

GPU Primitive Rate

Raw triangle output based on my understanding of the Raster Engines. PolyMorph engines attached to each TPC may have an effect on total triangles rastered.

 

Asynchronous Compute Capability

As far as I know, Pascal GPUs are unable to execute graphics and compute tasks asynchronously. However, I am listing it as "Limited" because this architecture can switch between the two kinds of task very quickly, which allows Pascal to still do very well in frames requiring both types of workload. To my knowledge, Pascal would not have benefitted a lot, if at all, from true Asynchronous Compute Capability.

Stream Processors per Graphics Core

If you count the very small number of dedicated FP64 and FP16x2 Cores this would be higher, but for simplicity's sake they are listed as special execution units and not counted towards the Core's full ALU complement (the ALUs that are used in games).

Misc.

This bit is for my personal opinion on this Graphics card / Graphics processor

Sashleycat's Awesomeness Rating: OK

©2020 by Sashleycat (Read the Legal stuff)