This is annotated to the best of my knowledge. I can't guarantee that it is 100% accurate, but I am reasonably confident in it. Please do your own research, or add a disclaimer, before citing my diagram. Thanks. :D
Processor Diagram - Hawaii
Hawaii (XT/PRO) (290 series) and Grenada (XT/PRO) (390 series) should be identical to the naked eye. Grenada (to my understanding) is the exact same silicon with improved binning techniques and increased auxiliary voltage to allow for higher speed GDDR5.
(Die shot credit: Fritzchen Fritz. Incidentally, this particular die shot is from my own R9 290, which I sent to Fritzchen Fritz :3)
1) Compute Unit
(GCN 1.1) The processing core of the GPU; this is where the numbers are 'crunched'. The Compute Unit of this Hawaii chip contains 64 individual number-crunching pipelines, known as 'Stream Processors'. These 64 pipelines are arranged into 4 groups known as Vector Units or SIMD Units. Each group has 16 pipelines that work together to perform calculations on 16 individual 32-bit Floating Point numbers, or on lower-precision Integer numbers (on this chip up to 24-bit Integer at lower speed than FP32). Each of these "Vector Units" works on 16 different numbers (inputs) but performs the same instruction/calculation on each of them, resulting in 16 different output numbers. This is called "Single Instruction Multiple Data", and in 3D graphics it is used to work on vectors and matrices containing the co-ordinates of vertices in 3D space, and to manipulate an object, for example by rotating it. The "Vector Units" can also work on compute tasks that benefit from this type of massively parallel computational power. Each CU also contains a private L1 cache 16 KiB in size, and a Local Data Share that is 64 KiB in size.
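To make the "same instruction, 16 different inputs" idea a bit more concrete, here is a minimal Python sketch. It is only a software analogy, not how the hardware is actually programmed; the function names (`simd_execute`, `multiply_add`) and the example instruction are my own made-up illustrations, and only the lane counts come from the description above.

```python
# Illustrative analogy only: models one 16-lane Vector Unit applying a
# single instruction to 16 different inputs. All names are hypothetical.
LANES_PER_SIMD = 16   # pipelines per Vector Unit (from the text above)
SIMDS_PER_CU = 4      # Vector Units per Compute Unit (from the text above)


def simd_execute(instruction, inputs):
    """Apply the same instruction to every lane's input (SIMD)."""
    if len(inputs) != LANES_PER_SIMD:
        raise ValueError("a Vector Unit works on exactly 16 inputs at once")
    return [instruction(x) for x in inputs]


def multiply_add(x):
    # An arbitrary example instruction: multiply by 2, then add 1.
    return x * 2.0 + 1.0


lane_inputs = [float(i) for i in range(LANES_PER_SIMD)]
lane_outputs = simd_execute(multiply_add, lane_inputs)

# 4 Vector Units x 16 lanes gives the 64 Stream Processors per CU.
stream_processors_per_cu = LANES_PER_SIMD * SIMDS_PER_CU
```

Every lane gets its own input and produces its own output, but none of them can run a different instruction from its neighbours in the same step.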
2) Shader Engine
A Shader Engine is a collection of GPU Processor Cores, or Compute Units. On this Hawaii Graphics Processor the CUs are arranged together in groups of 11, and each group is wired into a Geometry Processor and Raster Engine in the front-end. When shading an image, the Scheduler tries to balance the shading load evenly between all four Shader Engines.
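As a quick sanity check on the numbers, the chip-wide totals follow from multiplying the counts above. This is just a back-of-the-envelope sketch; the function name `chip_totals` and its parameter are hypothetical, and the 64 Stream Processors per CU figure is taken from the Compute Unit description.

```python
# Back-of-the-envelope totals for the full chip as described above.
SHADER_ENGINES = 4               # "all four shader engines"
CUS_PER_SHADER_ENGINE = 11       # "groups of 11"
STREAM_PROCESSORS_PER_CU = 64    # from the Compute Unit description


def chip_totals(cus_per_engine=CUS_PER_SHADER_ENGINE):
    """Return (total CUs, total Stream Processors) for the whole chip."""
    total_cus = SHADER_ENGINES * cus_per_engine
    return total_cus, total_cus * STREAM_PROCESSORS_PER_CU


total_cus, total_stream_processors = chip_totals()
print(total_cus, total_stream_processors)
```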
3) 8 x (2 x 32-bit) (512-bit) GDDR5 PHY
This is a physical connection to the traces on the PCB around the GPU chip that connect with its external memory packages. Each individual block visible here on this chip is a physical connection to two GDDR5 chips, and thus represents 64 bits of memory interface. Visible on Hawaii are 8 distinct blocks, which gives the large 512-bit aggregated memory interface.
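The 512-bit figure is simply the heading's "8 x (2 x 32-bit)" multiplied out. A small sketch of that arithmetic, with constant names that are my own:

```python
# Multiplying out "8 x (2 x 32-bit)" from the heading.
BITS_PER_GDDR5_CHIP = 32   # interface width of one GDDR5 chip
CHIPS_PER_PHY_BLOCK = 2    # each visible block connects to two chips
PHY_BLOCKS = 8             # distinct blocks visible on the die

bits_per_phy_block = BITS_PER_GDDR5_CHIP * CHIPS_PER_PHY_BLOCK
total_bus_width_bits = bits_per_phy_block * PHY_BLOCKS
total_gddr5_chips = CHIPS_PER_PHY_BLOCK * PHY_BLOCKS

print(bits_per_phy_block, total_bus_width_bits, total_gddr5_chips)
```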
This block contains many different logic circuits. Within this highlighted area is the GPU Command Processor, a central logic block that assigns and distributes work to the various Processing Cores and other hardware elements. The Asynchronous Compute Engines (ACE) are also located here, along with a quartet of Geometry Processors, each with a Tessellator and Raster Engine next to it. The Geometry Processors are responsible for setting up primitives like triangles and vertices before they are shaded by the Compute Units. Transformation of the viewport window is likely also performed here. Raster Engines are responsible for taking a primitive and turning it into a pixel-grid of information that can be sent to the ROP to have its final colour determined on the screen. The Global Data Share is a cache of information that is shared globally across the GPU.
5) Render Back-End (ROP)
The Render Back-End is one of the final stages in the graphics pipeline. This block is tasked with turning all the crunched 3D data into a final colour that represents a pixel to be displayed on the screen. The blocks highlighted here contain two Render Back-End blocks (each RBE is made up of two distinct logic blocks, so the two highlighted RBEs have four distinct blocks between them). Each RBE individually can work with 4 Pixels per clock (4 ROPs). Hawaii has 16 of these RBE blocks for a total of 64 Pixels per clock (64 ROPs).
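The ROP count, and a rough peak pixel rate, can be worked out from those two numbers. This is a sketch only: the function name is hypothetical, and the 1 GHz clock used in the example is just a round placeholder, not a claimed specification of any particular card.

```python
# ROP arithmetic from the description above.
RBE_COUNT = 16                 # Render Back-End blocks on Hawaii
PIXELS_PER_CLOCK_PER_RBE = 4   # 4 ROPs per RBE

total_pixels_per_clock = RBE_COUNT * PIXELS_PER_CLOCK_PER_RBE


def peak_pixels_per_second(core_clock_hz):
    """Theoretical peak pixel rate at a given core clock (in Hz)."""
    return total_pixels_per_clock * core_clock_hz


# Placeholder clock of 1 GHz, chosen only to keep the numbers round.
example_rate = peak_pixels_per_second(1_000_000_000)
```

Real-world throughput will be lower than this theoretical peak, since it ignores memory bandwidth and everything else upstream of the ROPs.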
6) L2 Cache Partition
Visible here is a large amount of SRAM (green blocks/boxes on the silicon in this shot). This is a very fast on-chip memory that acts as a cache for the GPU, sitting in front of its main external Graphics Memory. This block represents one partition of the L2 Cache; the full Hawaii chip has 1024 KiB of L2 cache.
7) Memory Controllers (?)
(60% Sure) These logic circuits are responsible for handling memory accesses to and from the external GDDR5 memory chips. They connect directly to the GDDR5 PHY blocks and the Render Back-Ends, which are typically very bandwidth heavy in their operations.
8) Display PHY
This is the physical interface to external displays. It outputs pixel information to a monitor that is connected to the video card.
9) PCI-E PHY
(90% Sure) This is a physical connection to the PCI-E interface of the video card. It is via these wires that the GPU communicates with the rest of the system, such as receiving frames from the CPU that are ready to be rendered in a video game.
10) Shared CU Cache (2 x 4 CU / 1 x 3 CU shared)
A fast, small, on-chip memory block that is shared between adjacent Compute Units. On Hawaii, each Shader Engine contains 11 CUs, and these CUs are arranged into three groups. Two of these groups have 4 CUs and one has 3; each group shares one of these caches.
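The 4 + 4 + 3 grouping can be sketched as a simple lookup from a CU's position to the shared cache it would use. Note the assumptions: the ordering of the groups (4, 4, then 3) and the CU numbering are my own illustration and are not read off the die, and `cache_group_of` is a hypothetical helper.

```python
# Grouping of CUs around the shared caches within one Shader Engine.
# The order (4, 4, 3) and the 0-based CU numbering are assumptions.
CU_GROUP_SIZES = (4, 4, 3)
SHADER_ENGINES = 4

cus_per_shader_engine = sum(CU_GROUP_SIZES)
shared_caches_on_chip = len(CU_GROUP_SIZES) * SHADER_ENGINES


def cache_group_of(cu_index):
    """Return which shared-cache group (0, 1 or 2) a CU falls into."""
    if not 0 <= cu_index < cus_per_shader_engine:
        raise ValueError("CU index must be between 0 and 10")
    first_cu_in_group = 0
    for group, size in enumerate(CU_GROUP_SIZES):
        if cu_index < first_cu_in_group + size:
            return group
        first_cu_in_group += size


groups = [cache_group_of(i) for i in range(cus_per_shader_engine)]
```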
11) Display / Video Engine (?)
(50% Sure) Within or around the highlighted area is the GPU's video engine and display controllers. This section contains dedicated fixed-function blocks that output signals to the display PHY for standards like HDMI or DisplayPort. Also within these blocks is fixed-function logic responsible for hardware video encoding and decoding of formats such as AVC (H.264).