NVIDIA's new Pascal architecture (the GeForce 10 series) has already made a name for itself on the market, taking the place of the extremely successful 9 series thanks to the big leap in performance of the new models. We wrote this article as a companion to our GeForce GTX 10xx reviews. It focuses on what we consider to be the key technologies and on the architecture of the recently launched graphics cores used in the GeForce GTX 10xx series.
We will also add a table with the new GPU models from this series and their specifications.
With the introduction of “Pascal”, NVIDIA also announced technologies supported by this new architecture. In this article, we will present to you the more interesting ones.
Simultaneous Multi-Projection (SMP) is a technology that allows the GPU, once it has calculated the geometry of a scene, to generate up to 16 projections of it at different angles from a single viewpoint without a significant drop in performance. It is implemented in hardware as part of the new PolyMorph Engine 4.0 in the Pascal cores.
SMP can be very helpful to people who play on three monitors. Normally the scene is rendered as a single flat projection that spans all the displays. That image looks correct only if the monitors stand side by side in one plane. If we angle the side monitors towards us, the image becomes distorted: straight lines appear to bend where one monitor meets the next, as you can see in the image below. Without SMP, preventing this means rendering the scene separately for each monitor, which leads to a huge loss in performance.
Therefore, along with Pascal, NVIDIA presented a feature for multi-monitor setups called Perspective Surround. It uses SMP's ability to generate different projections of geometry that has already been computed. Each monitor gets its own projection (three in total) that matches the angle at which it is placed, so the scene is displayed correctly.
Another SMP capability is generating a projection of the geometry from a second viewpoint without having to calculate the geometry twice.
The need for two viewpoints comes from virtual reality (VR), which requires one for each eye. Without SMP, the geometry has to be calculated twice, once for the left eye and once for the right. With SMP, the geometry for both viewpoints can be computed in a single pass through an NVIDIA feature called Single Pass Stereo.
Perspective Surround and Single Pass Stereo must be supported by software developers before users can take advantage of them.
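The idea of reusing one set of geometry for several projections can be illustrated with a small CPU-side sketch in Python. This is only a conceptual illustration, not how the PolyMorph Engine works internally: the 45-degree monitor angle, the function names and the simple pinhole projection onto the plane z = 1 are all assumptions made for this example.

```python
import math

def to_view(point, yaw_deg):
    """Express a scene point in the space of a camera turned yaw_deg to the right."""
    a = math.radians(yaw_deg)
    x, y, z = point
    return (x * math.cos(a) - z * math.sin(a), y, x * math.sin(a) + z * math.cos(a))

def project(point):
    """Pinhole projection onto the plane z = 1; None if the point is behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (x / z, y / z)

# Hypothetical setup: side monitors turned 45 degrees towards the player.
MONITOR_YAW = {"left": -45.0, "center": 0.0, "right": 45.0}

def project_for_monitors(point):
    """Take one already computed scene point and project it once per monitor."""
    return {name: project(to_view(point, yaw)) for name, yaw in MONITOR_YAW.items()}

# A point slightly to the right of the player, two units ahead.
result = project_for_monitors((1.0, 0.0, 2.0))
for name, coords in result.items():
    print(name, coords)
```

The same point lands at a different horizontal position in each projection. With a 90-degree field of view per monitor (another assumption), only coordinates between -1 and 1 fall on that monitor's screen.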
Named after a prominent photographer, the Ansel technology offers a very different way of taking a screenshot during a gaming session. Instead of capturing the player's view at the standard resolution, Ansel lets gamers capture the whole scene through a free camera at a much higher resolution than normal.
When activated, Ansel pauses the game and provides a free camera that can capture a snapshot from various positions, angles, and distances until the perfect picture is achieved. Built-in photo filters can be applied before taking the shot. Alternatively, the screenshot can be exported in the OpenEXR format and processed afterward without quality loss in photo editing programs such as Adobe Photoshop.
Thanks to the CUDA cores in NVIDIA GeForce GTX GPUs, the captured image can have a resolution of up to 61 440 x 34 560 pixels and a size of around 2 GB. The resulting level of detail is so high that even cropped parts of the image remain extremely sharp.
At such a high resolution we can take delight in the incredible level of detail in games. Take the screenshot from The Witcher 3: Wild Hunt as an example. Below the terrace on which Geralt of Rivia stands there is a room with a book in it. After zooming in on this area, we can read the words written in the book.
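To put that resolution in perspective, here is a short arithmetic sketch in Python. The capture resolution comes from the text above, while the Full HD display used for comparison is our own assumption.

```python
# Maximum Ansel capture resolution quoted in the text.
ANSEL_W, ANSEL_H = 61_440, 34_560
# Assumption for comparison: a Full HD (1920 x 1080) display.
FHD_W, FHD_H = 1_920, 1_080

pixels = ANSEL_W * ANSEL_H
scale_x = ANSEL_W / FHD_W
scale_y = ANSEL_H / FHD_H
pixel_ratio = pixels / (FHD_W * FHD_H)

print(f"{pixels / 1e9:.2f} gigapixels")
print(f"{scale_x:.0f}x wider and {scale_y:.0f}x taller than Full HD")
print(f"{pixel_ratio:.0f}x as many pixels as Full HD")
```

In other words, the largest capture is roughly 2.12 gigapixels, 32 times the Full HD resolution along each axis.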
More high-resolution photos can be viewed here.
Ansel allows us to make a 360-degree panoramic image, which can be viewed on a VR display. You can look at more 360-degree screenshots here.
For Ansel to work, the particular game must support it. Only 150 lines of code were needed to implement it in The Witcher 3.
At the time of writing this article, the games that support Ansel are: The Witness, The Witcher 3, Mirror’s Edge Catalyst, ARK: Survival Evolved, Obduction, War Thunder and Conan Exiles.
The games announced to support Ansel in the future are: Tom Clancy’s The Division, Watch Dogs 2, Lawbreakers, Unreal Tournament, Paragon, Fortnite and No Man’s Sky.
The VRWorks Audio technology, as the name suggests, is aimed at virtual reality. It uses NVIDIA OptiX ray tracing to trace the path of sound as it spreads through the environment in real time, accurately reflecting the material, shape, and size of the virtual objects. For example, in an empty room the sound and its echo will be stronger than in the same room filled with objects that absorb sound waves. So far, no games have been announced that will use this technology.
NVIDIA PhysX for VR
The NVIDIA PhysX for VR technology allows the game engine to provide a realistic visual response by tracking when the hand controller interacts with a virtual object. Thanks to this, all user interactions with the environment in the virtual world, whether an explosion or a hand passing over a water surface, feel like a real experience.
NVIDIA GPU Boost 3.0
Along with Pascal, NVIDIA announced a new version of its Boost technology, which allows the GPU to increase its clock speed in real time, according to its capacity, voltage, and temperature. This leads to better performance without user interference.
Since Kepler, NVIDIA GPUs have used a set of discrete voltage points, each of which defines an operating voltage of the graphics core and the clock speed that goes with it. The GPU follows the curve formed by these points, changing its clock speed according to the current voltage, load, and temperature. With GPU Boost 3.0, third-party overclocking software can program the individual voltage points, so the clock speed of a Pascal graphics core can be adjusted separately for each one.
In GPU Boost 2.0, the only way to overclock was to raise the clock speed at all voltage points at once by the same amount. The highest stable overclock is therefore limited by the point on the voltage/clock speed curve with the least headroom. If the graphics core can be overclocked by 120 MHz at the lowest voltage point but only by 60 MHz at the highest, the highest stable overclock is just 60 MHz.
With GPU Boost 3.0, we can adjust each point on the curve individually, setting a higher overclock where the core can sustain it and a lower one where it cannot. All else being equal, this should improve the performance of an overclocked graphics core, because the clock speed offset now varies with the voltage point. In other words, GPU Boost 3.0 aims to obtain the highest possible overclock along the whole voltage/clock speed curve.
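The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative model only, not an actual overclocking API: the voltage values and the two middle headroom figures are hypothetical, and only the 120 MHz and 60 MHz endpoints come from the example above.

```python
# Stable overclocking headroom in MHz for each voltage point (volts).
# Hypothetical values; only the 120 and 60 MHz endpoints come from the article.
headroom_mhz = {0.80: 120, 0.90: 100, 1.00: 80, 1.06: 60}

def uniform_offset(headroom):
    """GPU Boost 2.0 style: one offset for every point, so the weakest point sets the limit."""
    return min(headroom.values())

def per_point_offsets(headroom):
    """GPU Boost 3.0 style: each voltage point gets its own offset."""
    return dict(headroom)

boost2 = uniform_offset(headroom_mhz)
boost3 = per_point_offsets(headroom_mhz)

# Extra clock speed gained at each voltage point by tuning the points individually.
gain = {volts: boost3[volts] - boost2 for volts in headroom_mhz}

print("Uniform offset:", boost2, "MHz")
for volts, extra in gain.items():
    print(f"{volts:.2f} V: +{boost3[volts]} MHz ({extra} MHz more than the uniform offset)")
```

With these numbers the uniform offset is stuck at 60 MHz, while per-point tuning recovers up to 60 MHz of extra headroom at the lower voltage points.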
In this section, we will focus on the Pascal architecture itself and the various graphics cores based on it.
All GeForce 10xx graphics cores are based on the DX12-compatible Pascal architecture and are manufactured on a 14- or 16-nanometer FinFET process. They are built from different configurations of GPCs (Graphics Processing Clusters), each with a dedicated Raster Engine and a varying number of SMs (Streaming Multiprocessors), plus 32-bit memory controllers, each of which is connected to 8 ROPs (Raster Operations Pipelines) and 256 KB of L2 (second level) cache. Each SM has 128 CUDA/Shader/Stream cores, a 256 KB register file, 48 KB of L1 (first level) cache, a 96 KB block of shared memory, and 8 TUs (Texture Units). A detailed block diagram is presented below to give you a clear picture.
The SM is one of the most important hardware units in a graphics core: almost all operations pass through an SM at some point during rendering. Each SM is connected to a PolyMorph Engine, which contains the new Simultaneous Multi-Projection module.
GP104-400-A1 (GeForce GTX 1080)
We begin with the GTX 1080, the most powerful of the graphics cards examined in this section. It is equipped with 8 GB of GDDR5X graphics memory running at an effective 10 000 MHz. The standard operating frequencies of the graphics core are 1607 MHz base and 1733 MHz Boost. The GeForce GTX 1080 uses the full GP104 core, which is manufactured on a 16-nanometer FinFET process, has an area of 314 mm2, and consists of 7.2 billion transistors. We will analyze its block diagram below.
GP104-400-A1 has four GPCs and eight 32-bit memory controllers, and therefore 64 ROPs. Since each memory controller is connected to 256 KB of L2 cache, the chip has 2 MB of L2 cache and a 256-bit memory bus. Each GPC includes five SMs. All of this means that the GeForce GTX 1080 has 2560 CUDA cores, 160 TUs, and 64 ROPs.
The fully functioning GP104-400-A1 graphics core (GTX 1080) consists of:
• 2,560 CUDA/Shader/Stream processors
• 160 Texture Units
• 64 ROP units
• 2 MB L2 cache
• 256-bit Memory bus
GP104-200-A1 (GeForce GTX 1070)
The GeForce GTX 1070 has 8 GB of GDDR5 memory with an effective frequency of 8000 MHz and a GP104 graphics core running at 1506 MHz base and 1683 MHz Boost. The core has an area of 314 mm2, is manufactured on a 16-nanometer FinFET process, and contains 7.2 billion transistors. Below we analyze a detailed block diagram of the GP104-200-A1 core used by the GTX 1070.
We can see in this image that GP104-200-A1 has one GPC completely disabled compared to the GP104-400-A1 core used in the GTX 1080, leaving three GPCs instead of four. Each GPC has 5 SMs, and each SM consists of 128 CUDA cores and 8 TUs. As a result, we get 1920 CUDA cores and 120 TUs. The rest of the core is unaltered: 64 ROPs, 2 MB of L2 cache, and eight 32-bit memory controllers, which makes the bus 256-bit.
The partially disabled GP104-200-A1 graphics core (GTX 1070) consists of:
• 1,920 CUDA/Shader/Stream processors
• 120 Texture Units
• 64 ROP units
• 2 MB L2 cache
• 256-bit Memory bus
GP106-400/300-A1 (GeForce GTX 1060 6GB/3GB)
Unlike the GeForce GTX 1070 and the more powerful GeForce GTX 1080, the GeForce GTX 1060 uses an entirely different graphics core, the GP106. NVIDIA offers two GTX 1060 models that differ in GDDR5 video memory capacity (running at an effective 8000 MHz in both cases) and in the version of the graphics core. The model with 6 GB of memory uses a GP106-400-A1, while the version with 3 GB of memory is equipped with a GP106-300-A1, which has one SM less. The area of GP106 is 200 mm2, about 36% smaller than the 314 mm2 of GP104, and the chip is built from 4.4 billion transistors. The GeForce GTX 1060 6GB uses the full potential of the GP106 core, so we will analyze a detailed block diagram of this chip below.
The GP106 design has 2 GPCs, each with 5 SMs. Since each SM consists of 128 CUDA cores and 8 TUs, the GTX 1060 6GB (GP106-400-A1) has 1280 CUDA cores and 80 TUs, while the GTX 1060 3GB (GP106-300-A1) has 1152 CUDA cores and 72 TUs. The chip is also equipped with 48 ROPs, 1536 KB of L2 cache, and six 32-bit memory controllers, which makes the bus 192-bit.
The fully functioning GP106-400-A1 graphics core (GTX 1060 6GB) consists of:
• 1,280 CUDA/Shader/Stream processors
• 80 Texture Units
• 48 ROP units
• 1.5 MB L2 cache
• 192-bit Memory bus
The partially disabled GP106-300-A1 graphics core (GTX 1060 3GB) consists of:
• 1,152 CUDA/Shader/Stream processors
• 72 Texture Units
• 48 ROP units
• 1.5 MB L2 cache
• 192-bit Memory bus
GP107-300/400-A1 (GeForce GTX 1050/Ti)
The most recently announced and most affordable members of the GeForce GTX 10xx family are the GTX 1050 and GTX 1050 Ti. Both models use the new GP107 graphics core, which is manufactured on a 14-nanometer FinFET process, consists of 3.3 billion transistors, and has an area of 135 mm2. The GTX 1050 Ti uses the full potential of the core in the form of the GP107-400-A1, with standard operating frequencies of 1290 MHz base and 1392 MHz Boost. It comes with 4 GB of GDDR5 graphics memory running at an effective 7000 MHz. The GTX 1050, on the other hand, uses the partially disabled GP107-300-A1, which has one SM (and therefore 8 TUs) less but runs at higher standard frequencies: 1354 MHz base and 1455 MHz Boost. Its GDDR5 graphics memory also runs at an effective 7000 MHz, but the capacity is reduced to 2 GB. We will analyze a detailed block diagram of the full version of the GP107 core used in the GTX 1050 Ti.
We can see from the image above that, in contrast to the higher-end chips, each GPC here consists of 3 SMs instead of 5. With 2 GPCs available, the GP107-400-A1 (GTX 1050 Ti) has 768 CUDA cores and 48 TUs. As noted above, the GP107-300-A1 (GTX 1050) has one SM and 8 TUs less, so it has 640 CUDA cores and 40 TUs. The remaining part of the core is the same for both versions: 32 ROPs, 1 MB of L2 cache, and four 32-bit memory controllers, which makes the bus 128-bit.
The fully functioning GP107-400-A1 graphics core (GTX 1050 Ti) consists of:
• 768 CUDA/Shader/Stream processors
• 48 Texture Units
• 32 ROP units
• 1 MB L2 cache
• 128-bit Memory bus
The partially disabled GP107-300-A1 graphics core (GTX 1050) consists of:
• 640 CUDA/Shader/Stream processors
• 40 Texture Units
• 32 ROP units
• 1 MB L2 cache
• 128-bit Memory bus
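All the unit counts listed above follow from the same per-SM and per-memory-controller figures, which the following Python sketch makes explicit. The class and field names are our own, and the GTX 1070 is modelled as a four-GPC chip with five SMs switched off, matching the disabled GPC described in the text.

```python
from dataclasses import dataclass

# Per-unit figures given in the architecture description.
CORES_PER_SM = 128
TUS_PER_SM = 8
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 256
BITS_PER_CONTROLLER = 32

@dataclass(frozen=True)
class Chip:
    name: str
    gpcs: int
    sms_per_gpc: int
    disabled_sms: int
    memory_controllers: int

    def summary(self):
        """Derive the totals from the building blocks of the chip."""
        sms = self.gpcs * self.sms_per_gpc - self.disabled_sms
        return {
            "sms": sms,
            "cuda_cores": sms * CORES_PER_SM,
            "texture_units": sms * TUS_PER_SM,
            "rops": self.memory_controllers * ROPS_PER_CONTROLLER,
            "l2_kb": self.memory_controllers * L2_KB_PER_CONTROLLER,
            "bus_bits": self.memory_controllers * BITS_PER_CONTROLLER,
        }

CHIPS = [
    Chip("GP104-400-A1 (GTX 1080)", 4, 5, 0, 8),
    Chip("GP104-200-A1 (GTX 1070)", 4, 5, 5, 8),
    Chip("GP106-400-A1 (GTX 1060 6GB)", 2, 5, 0, 6),
    Chip("GP106-300-A1 (GTX 1060 3GB)", 2, 5, 1, 6),
    Chip("GP107-400-A1 (GTX 1050 Ti)", 2, 3, 0, 4),
    Chip("GP107-300-A1 (GTX 1050)", 2, 3, 1, 4),
]

summaries = {chip.name: chip.summary() for chip in CHIPS}
for name, totals in summaries.items():
    print(name, totals)
```

Disabling SMs only reduces the CUDA core and texture unit counts. The ROPs, L2 cache and bus width depend solely on the number of memory controllers, which is why they are identical for both versions of each chip.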
Models and specifications
The table below shows the major technical specifications of all NVIDIA GTX 10xx graphics cards available so far.
|Model|GTX 1080 Ti|GTX 1080|GTX 1070|GTX 1060 6 GB / 3 GB|GTX 1050 Ti / GTX 1050|
|---|---|---|---|---|---|
|Graphics core|GP102|GP104-400-A1|GP104-200-A1|GP106-400 / 300-A1|GP107-400 / 300-A1|
|Number of transistors|12 billion|7.2 billion|7.2 billion|4.4 billion|3.3 billion|
|Manufacturing process|16 nm|16 nm|16 nm|16 nm|14 nm|
|CUDA cores|3 584|2 560|1 920|1 280 / 1 152|768 / 640|
|SMs|28|20|15|10 / 9|6 / 5|
|Base core clock|1 480 MHz|1 607 MHz|1 506 MHz|1 506 MHz|1 290 / 1 354 MHz|
|Boost core clock|1 582 MHz|1 733 MHz|1 683 MHz|1 709 MHz|1 392 / 1 455 MHz|
|Effective memory frequency|11 000 MHz|10 000 MHz|8 000 MHz|8 000 MHz|7 000 MHz|
|Memory type and size|11 GB GDDR5X|8 GB GDDR5X|8 GB GDDR5|6 / 3 GB GDDR5|4 / 2 GB GDDR5|
|Memory bandwidth|484 GB/s|320 GB/s|256 GB/s|192 GB/s|112 GB/s|
|Floating-point performance|11.3 TFLOPS|9.0 TFLOPS|6.45 TFLOPS|4.61 / 4.1 TFLOPS|2.2 / 1.9 TFLOPS|
|Maximum temperature|91 °C|94 °C|94 °C|94 °C|97 °C|
|Maximum power consumption|250 W|180 W|150 W|120 W|75 W|
|Recommended PSU|600 W|500 W|500 W|400 W|300 W|
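The memory bandwidth row of the table can be reproduced from the effective memory frequency and the bus width given earlier in the article. The Python sketch below does this for the cards whose bus width is stated in the text; the function name is our own.

```python
def bandwidth_gb_s(effective_mhz, bus_bits):
    """Peak memory bandwidth in GB/s: million transfers per second times bytes per transfer."""
    return effective_mhz * bus_bits / 8 / 1000

# (effective memory frequency in MHz, bus width in bits), as given in the article.
CARDS = {
    "GTX 1080": (10_000, 256),
    "GTX 1070": (8_000, 256),
    "GTX 1060": (8_000, 192),
    "GTX 1050 Ti / GTX 1050": (7_000, 128),
}

bandwidths = {card: bandwidth_gb_s(mhz, bits) for card, (mhz, bits) in CARDS.items()}
for card, value in bandwidths.items():
    print(f"{card}: {value:.0f} GB/s")
```

The results (320, 256, 192 and 112 GB/s) match the values in the table.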