Profile updated as of 26th August 2019
NVIDIA GeForce GTX 1070 Ti
Here is my GPU profile for the GeForce GTX 1070 Ti, a graphics card released in late 2017 to fight off AMD's Radeon RX Vega 56, which solidly beat the original GTX 1070. The Ti variant features more active Streaming Multi-processors but retains the same GDDR5 memory subsystem as the original 1070.
The card I owned was the ASUS "Turbo" blower. I really quite liked it actually.
Actual silicon die shot. Image credit to Fritzchen Fritz; you can see his awesome work here.
Gallery
(Above) The silicon die of GP104-300 on a GTX 1070 Ti graphics card. It has 8x 1024 MB GDDR5 SDRAM chips making up a 256-bit interface.
(Above) Actual silicon die shot of the GP104 GPU. This is the GP104-300 variant used in the GTX 1070 Ti; a single Streaming Multi-processor is laser-disabled, as annotated above. (Disclaimer: the SM/TPC position marked here is to the best of my knowledge.)
(Above) The architectural block-diagram for GP104-300. Note the disabled TPC/SM.
Graphics Card Information
Graphics Card: NVIDIA GeForce GTX 1070 Ti
Graphics Card Manufacturer: NVIDIA
Graphics Card Release Date: November 2, 2017
Graphics Card MSRP: $449 USD
Graphics Processor Codename: GP104-300
Graphics Processor Manufacturer: NVIDIA
Graphics Processor Implementation: Cut die
Graphics Interface: PCI-E 16x Gen3
Architecture: Pascal (Client)
Lithography Process: TSMC 16nm FinFET
Approximate die size: 314mm²
SashleyCat's GPU die Size Rating: mid-sized
Approximate Transistor Count: 7,200 Million
Approximate Transistor Density: 22.9 Million / Square Millimetre
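The density figure follows directly from the approximate transistor count and die size listed above. A minimal sketch of that arithmetic (the function and variable names are just illustrative):

```python
# Figures taken from the spec list above; both are approximate.
TRANSISTORS_MILLION = 7_200  # approximate transistor count, in millions
DIE_AREA_MM2 = 314           # approximate die size, in square millimetres


def transistor_density(transistors_million: float, die_area_mm2: float) -> float:
    """Return density in millions of transistors per square millimetre."""
    return transistors_million / die_area_mm2


density = transistor_density(TRANSISTORS_MILLION, DIE_AREA_MM2)
print(f"{density:.1f} million transistors per square millimetre")
```

Because both inputs are approximate, the result is only meaningful to about one decimal place.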
GPU Features
Double-speed FP16 Shading: No (1:64 FP32)
Asynchronous Compute Capability: Limited (Fast Context Switching) *
DirectX Hardware Support: DX12.1 (FL 12_1)
Dedicated DXR Acceleration on chip: No
Variable-rate Shading: No
Adv. Geometry shading: No
Adv. Geometry shading (Programmable/DX12 Mesh Shaders): No
AI/ML Acceleration: No
Advanced Memory Management: No
Integer and Float Shader Co-execution: No
Tile-based Renderer: Yes
GPU Computing Resources
GPU Substructures: 4 Graphics Processing Clusters, 19 Texture Processing Clusters (20 Full Chip)
Graphics Cores: 19 Streaming Multi-processors (20 Full Chip)
Graphics Cores per Substructure: 1 per TPC, 3 x GPC with 5, 1 x GPC with 4
Total Stream Processors (ALU/Shaders): 2432 (2560 Full Chip)
Stream Processors per Graphics Core: 128 *
Graphics Core SIMD Structure: 4 x 32
Total Special Execution Units: 608 Special Function Units (640 Full Chip), 76 FP64 CUDA Cores (80 Full Chip), 19 FP16x2 CUDA Cores (20 Full Chip), 608 Load/Store Units (640 Full Chip)
Special Execution Units per Graphics Core: 32 Special Function Units, 4 FP64 CUDA Cores, 1 FP16x2 CUDA Core, 32 Load/Store Units
Total Texturing Units: 152 (160 Full Chip)
Texturing Units per Graphics Core: 8
Pixel Pipelines (ROPs): 64 (8 x ROP Partitions with 8 Pixels per clock)
Level 2 shared on-chip cache: 2048 KB
Geometry/Tessellation Processors: 19 (20 Full Chip)
Raster Engines: 4
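The totals in this section are the per-SM counts multiplied by the number of active SMs. A rough sketch that rebuilds them for both the GTX 1070 Ti (19 SMs) and the full GP104 chip (20 SMs); the dictionary keys and function name are made up for illustration:

```python
# Per-SM unit counts, as listed in the section above.
PER_SM = {
    "stream_processors": 128,
    "special_function_units": 32,
    "fp64_cores": 4,
    "fp16x2_cores": 1,
    "load_store_units": 32,
    "texture_units": 8,
}

# Active SMs per GPC on the 1070 Ti: three GPCs with 5 and one with 4.
GPC_LAYOUT = [5, 5, 5, 4]
ACTIVE_SMS = sum(GPC_LAYOUT)  # 19
FULL_CHIP_SMS = 20


def totals(sm_count: int) -> dict:
    """Multiply every per-SM unit count by the number of SMs."""
    return {name: count * sm_count for name, count in PER_SM.items()}


active = totals(ACTIVE_SMS)
full = totals(FULL_CHIP_SMS)

for name in PER_SM:
    print(f"{name}: {active[name]} ({full[name]} Full Chip)")
```

The ROPs and L2 cache are left out on purpose: they belong to the memory partitions rather than the SMs, so disabling one SM does not change them.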
GPU Memory Subsystem
Graphics Memory Type: GDDR5
Graphics Memory Standard Capacity: 8192 MB
Graphics Memory Composition: 8 x 1024 MB GDDR5 SDRAM Chips
Graphics Memory Access Granularity: 32-bit (4 bytes)
Graphics Memory Standard Clock Speed / Data Rate: 2000 MHz / 8000 MHz
Graphics Memory Full Interface Width: 256-bit (32 bytes per clock)
Graphics Memory Peak Memory Bandwidth: 256 GB/s
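The peak bandwidth figure can be rebuilt from the other entries in this section. A minimal sketch, assuming each of the 8 chips contributes an equal 32-bit slice of the 256-bit interface, and taking the 4x clock-to-data-rate ratio straight from the listed 2000 MHz / 8000 MHz pair:

```python
CHIPS = 8
CHIP_CAPACITY_MB = 1024
BITS_PER_CHIP = 32        # assumption: 256-bit interface split evenly over 8 chips
MEMORY_CLOCK_MHZ = 2000
TRANSFERS_PER_CLOCK = 4   # ratio read off the listed 2000 MHz / 8000 MHz figures

capacity_mb = CHIPS * CHIP_CAPACITY_MB                     # total memory in MB
bus_width_bits = CHIPS * BITS_PER_CHIP                     # full interface width
bytes_per_transfer = bus_width_bits // 8                   # bytes moved per transfer
data_rate_mt_s = MEMORY_CLOCK_MHZ * TRANSFERS_PER_CLOCK    # million transfers/s

# Million transfers per second x bytes per transfer = MB/s; divide by 1000 for GB/s.
bandwidth_gb_s = data_rate_mt_s * bytes_per_transfer / 1000

print(f"{capacity_mb} MB, {bus_width_bits}-bit, {bandwidth_gb_s:.0f} GB/s")
```

This is a theoretical peak; real transfers will not sustain it.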
GPU Frequency and Peak performance
Graphics Engine Clock: 1683 MHz *
GPU Computing Power FP16: 127,908 Million operations per second with FMA
GPU Computing Power FP32: 8,186,112 Million operations per second with FMA
GPU Computing Power FP64: 255,816 Million operations per second with FMA
GPU Texturing Rate INT8: 255,816 Million texels per second
GPU Texturing Rate FP16: 255,816 Million texels per second
GPU Pixel Rate: 107,712 Million pixels per second
GPU Primitive Rate: 6,732 Million triangles per second *
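Every figure in this section is a unit count multiplied by the work done per clock and by the rated 1683 MHz clock. A sketch that reproduces them, assuming an FMA counts as two operations, each FP16x2 core handles two FP16 values per clock, and each texture unit, ROP and Raster Engine outputs one texel, pixel or triangle per clock (that is my reading of how the numbers above were derived, not an official formula):

```python
# Rated boost clock and unit counts, all taken from the spec lists above.
CLOCK_MHZ = 1683
STREAM_PROCESSORS = 2432
FP64_CORES = 76
FP16X2_CORES = 19
TEXTURE_UNITS = 152
ROPS = 64
RASTER_ENGINES = 4

FMA_OPS = 2     # a fused multiply-add is counted as two operations
FP16_LANES = 2  # assumption: each FP16x2 core handles two FP16 values per clock


def peak_rate(units: int, per_unit_per_clock: int = 1, clock_mhz: int = CLOCK_MHZ) -> int:
    """Peak rate in millions per second: units x work per clock x clock in MHz."""
    return units * per_unit_per_clock * clock_mhz


rates = {
    "FP16 (Mops/s)": peak_rate(FP16X2_CORES, FP16_LANES * FMA_OPS),
    "FP32 (Mops/s)": peak_rate(STREAM_PROCESSORS, FMA_OPS),
    "FP64 (Mops/s)": peak_rate(FP64_CORES, FMA_OPS),
    "Texels (M/s)": peak_rate(TEXTURE_UNITS),
    "Pixels (M/s)": peak_rate(ROPS),
    "Triangles (M/s)": peak_rate(RASTER_ENGINES),
}

for name, value in rates.items():
    print(f"{name:<18}{value:>12,}")
```

The FP16 result is exactly 1/64 of the FP32 result, which matches the 1:64 ratio listed under GPU Features.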
GPU Thermal and Power
Standard Cooling Solution: Blower cooler with vapour chamber
Typical Board Power: 180 W
Maximum Board Power: 216 W
Maximum Allowed Junction Temperature (TJ Max): 94°C
Graphics Card description
GeForce GTX 1070 Ti launched in late 2017 to combat AMD's Radeon RX Vega 56, which handily beat the original GTX 1070. NVIDIA's response was to enable four more Streaming Multi-processors on the GP104 silicon, putting the GTX 1070 Ti really close to the GTX 1080 in core count, with only one SM disabled compared to a full chip, while keeping the same regular GDDR5 (non-X) memory subsystem as the normal 1070. The result is a card that could take back the performance crown from RX Vega 56, but that also cost 50 dollars more. In 2019 the GTX 1070 Ti and RX Vega 56 are often evenly matched and offer about the same performance on average, with the 1070 Ti consuming less power but lacking advanced features such as double-speed FP16 shading, which the Vega 56's Vega 10 processor innately supports.
Graphics Card approximate 3D Performance
Sashleycat gaming performance rating (2019): Good for 1440p high settings 60 FPS
NVIDIA's GTX 1070 Ti was launched in late 2017 to beat AMD's RX Vega 56 graphics card, which was more performant than the original GTX 1070. On release the 1070 Ti managed to take back the performance crown in this segment from the Vega 56, often providing performance around that of the Vega 64 and only a bit behind the GTX 1080. In 2019, the Vega 56 has somewhat closed this gap and the two cards are very evenly matched. Performance is comparable to the RTX 2060 from the 20 series, and the card is well suited to maximum-detail, high-refresh 1080p gaming. It can also cope well with 2560x1440 resolution in many games.
Notes
Graphics Engine Clock
GPU Boost on NVIDIA cards of this generation results in higher observed core clock speeds in actual games. The 1070 Ti will often run at upwards of 1800 MHz if power or thermal limits are not reached. The number listed here is NVIDIA's rated boost clock, and all the peak performance figures are calculated from that rated number. Actual performance will be higher, but it is impossible to quantify exactly what each card will boost to.
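Since every peak figure scales linearly with the core clock, a rated number can be rescaled to whatever clock a particular card is observed to hold. A small sketch using the 1800 MHz mentioned above as an example observed clock (it is not a guaranteed value, and the function name is just illustrative):

```python
RATED_CLOCK_MHZ = 1683        # NVIDIA's rated boost clock, used for the listed figures
RATED_FP32_MOPS = 8_186_112   # listed FP32 figure at the rated clock


def scale_to_clock(rated_value: float, observed_clock_mhz: float,
                   rated_clock_mhz: float = RATED_CLOCK_MHZ) -> float:
    """Rescale a peak figure from the rated clock to an observed clock."""
    return rated_value * observed_clock_mhz / rated_clock_mhz


observed_clock_mhz = 1800  # example only; real cards vary with power and thermals
fp32_observed = scale_to_clock(RATED_FP32_MOPS, observed_clock_mhz)
uplift_percent = (observed_clock_mhz / RATED_CLOCK_MHZ - 1) * 100

print(f"FP32 at {observed_clock_mhz} MHz: {fp32_observed:,.0f} Mops/s "
      f"(+{uplift_percent:.1f}%)")
```

The same scaling applies to the texel, pixel and triangle rates, but not to memory bandwidth, which depends on the memory clock instead.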
GPU Primitive Rate
Raw triangle output based on my understanding of the Raster Engines. PolyMorph engines attached to each TPC may have an effect on total triangles rastered.
Asynchronous Compute Capability
As far as I know, Pascal GPUs are unable to execute graphics and compute tasks asynchronously. However, I am listing it as "Limited" because this architecture can switch between the two types of task very quickly, which still lets Pascal do very well in frames requiring both types of workload. To my knowledge, Pascal would not have benefitted much, if at all, from true Asynchronous Compute capability.
Stream Processors per Graphics Core
If you count the very small number of dedicated FP64 and FP16x2 cores this would be higher, but for simplicity's sake they are listed as special execution units and not counted towards the core's full ALU complement (the ALUs that are used in games).
Misc.
This bit is for my personal opinion on this Graphics card / Graphics processor
Sashleycat's Awesomeness Rating: OK