2016-03-04

Pascal will feature 4X the mixed precision performance, 2X the performance per watt, 2.7X the memory capacity and 3X the bandwidth of Maxwell. Nvidia’s CEO went on to state that, all in all, Pascal is Maxwell times ten. All of this has just been revealed here at GTC. There’s a lot to digest, so let’s break it down.



Nvidia states that Pascal will be the company’s first high performance GPU to feature mixed precision FP16 floating point compute, which is essential for low power devices such as tablets and mobile phones. Mixed precision is also very beneficial from a power efficiency standpoint for the many compute applications that don’t strictly require higher precision FP32 or FP64 compute. Those applications would benefit greatly from this addition.

Nvidia : Pascal Is Maxwell Times 10 – Features Mixed Precision, 3D Memory and NV-Link Coming in 2016

Nvidia’s CEO went on to state that Pascal has 10x Maxwell’s performance, a conclusion he arrived at via what he calls “CEO math”. Obviously this was just a humorous way to impress the crowd at GTC 2015, and it is based on what was described as “very rough estimates”.

The idea is that if we look at all the improvements coming up with Pascal compared to Maxwell, they will collectively add up to make it “roughly” 10 times more efficient at deep learning compute tasks. Pascal will feature 3x the memory bandwidth of Maxwell, 2x peak single precision compute performance and 2x the performance per watt.



Besides providing a very catchy claim that the press can use in their headlines for today’s announcement, these improvements should enable the architecture to theoretically be significantly faster than its predecessor, Maxwell, at deep-learning / artificial intelligence workloads.

Nvidia concedes that it’s unrealistic to expect anything like a 10X speed-up in the real world, except in select high performance computing and supercomputing scenarios, where getting rid of the massive communication overhead between the various processors and the Nvidia GPU accelerators may contribute greatly to reducing the total time and energy needed to complete the necessary work.

There are four hallmark technologies for the Pascal generation of GPUs: HBM, mixed precision compute, NV-Link and the smaller, more power efficient TSMC 16nm FinFET manufacturing process. Each is very important in its own right, so we’re going to break down every one of the four separately.

Pascal To Be Nvidia’s First Graphics Architecture To Feature High Bandwidth Memory HBM

Stacked memory will debut on the green side with Pascal. More precisely, it will be HBM Gen2, the second generation of the high bandwidth JEDEC memory standard co-developed by SK Hynix and AMD. The new memory will enable memory bandwidth to exceed 1 Terabyte/s, which is 3X the bandwidth of the Titan X. The new memory standard will also allow for a huge increase in memory capacity, 2.7X the memory capacity of Maxwell to be precise, which indicates that the new Pascal flagship will feature 32GB of video memory, a mind-bogglingly huge number.



We’ve already seen AMD take advantage of HBM memory technology with its Fiji XT GPU, which will feature 512GB/s of memory bandwidth, twice that of the GTX 980. AMD has also stated that it plans to use the second generation of this new memory technology in its Arctic Islands family of GPUs in 2016. So we’re likely to see both red and green rocking second generation stacked HBM next year.

HBM achieves this amazing improvement in memory bandwidth and capacity by employing a very wide through-silicon-via memory interface. Each HBM cube is connected to the GPU with a 1024-bit wide memory bus. HBM modules actually operate at low frequencies compared to GDDR5, but thanks to the significantly wider memory interface they manage to be up to 9 times faster than standard GDDR5 memory modules.
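The arithmetic behind these bandwidth figures can be sketched roughly as follows. The 1024-bit bus per stack comes from the text; the per-pin data rates (1 Gbit/s for first generation HBM, 2 Gbit/s for HBM2) and the four-stack configuration are assumptions used for illustration, and the function name is hypothetical.

```python
def hbm_bandwidth_gbs(stacks, bus_bits_per_stack=1024, gbit_per_pin=1.0):
    """Peak bandwidth in GB/s: stacks * bus width (bits) * per-pin rate (Gbit/s) / 8."""
    return stacks * bus_bits_per_stack * gbit_per_pin / 8


# Assumed: four stacks, 1 Gbit/s per pin for first generation HBM.
first_gen = hbm_bandwidth_gbs(stacks=4, gbit_per_pin=1.0)
# Assumed: four stacks, 2 Gbit/s per pin for second generation HBM.
second_gen = hbm_bandwidth_gbs(stacks=4, gbit_per_pin=2.0)

print(f"HBM1, 4 stacks: {first_gen:.0f} GB/s")
print(f"HBM2, 4 stacks: {second_gen:.0f} GB/s")
```

Under these assumptions the first result lines up with the 512GB/s quoted for Fiji and the second with the roughly 1 TB/s quoted for Pascal.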

We’ve already covered this revolutionary new memory technology exclusively and in-depth last year. HBM will quickly replace GDDR5 as the standard memory technology for high performance graphics solutions. It’s fair to say that HBM is the future.

Pascal Is Nvidia’s First Graphics Architecture To Deliver Half Precision Compute FP16 At Double The Rate Of Full Precision FP32

One of the more significant features revealed for Pascal was the addition of FP16 compute support, otherwise known as mixed precision or half precision compute. In this mode the accuracy of the result of any computation is significantly lower than with the standard FP32 method, which is required by all major graphics programming interfaces in games and has been for more than a decade. This includes DirectX 12, 11, 10 and DX9 Shader Model 3.0, which debuted almost a decade ago. This makes mixed precision mode unusable for any modern gaming application.
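To give a feel for the accuracy gap between the two formats, here is a minimal sketch that rounds a value to FP16 and to FP32 using Python's standard `struct` module. It only illustrates the number formats on a CPU and says nothing about how Pascal executes FP16 in hardware; the helper names are hypothetical.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for value in (0.1, 4097.0):
    fp16 = round_to_fp16(value)
    fp32 = round_to_fp32(value)
    print(f"{value}: FP16 error {abs(fp16 - value):.3e}, FP32 error {abs(fp32 - value):.3e}")
```

FP16 carries only 11 significant bits, so an integer such as 4097 cannot be stored exactly and rounds to 4096, while FP32 stores it without error.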

However, due to its very attractive power efficiency advantages over FP32 and FP64, it can be used in scenarios where a high degree of computational precision isn’t necessary, which makes mixed precision computing especially useful on power limited mobile devices. Nvidia’s Maxwell GPU architecture, featured in the GTX 900 series of GPUs, is limited to FP32 operations, which in turn means that FP16 and FP32 operations are processed at the same rate by the GPU. Adding the mixed precision capability in Pascal means that the architecture will now be able to process FP16 operations twice as quickly as FP32 operations. As mentioned above, this can be of great benefit in power limited, light compute scenarios.

Nvidia’s Proprietary High-Speed Platform Atomics Interconnect For Servers And Supercomputers – NV-Link

Pascal will also be the first Nvidia GPU to feature the company’s new NV-Link technology which Nvidia states is 5 to 12 times faster than PCIE 3.0.

NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.

Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.

Slide: Volta GPU featuring NVLink and stacked memory. NVLink: GPU high speed interconnect, 80-200 GB/s. 3D stacked memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit.

NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance. via NVIDIA News
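As a rough sanity check, the 80-200 GB/s range quoted for NVLink can be compared against PCIe 3.0 x16. The PCIe figures below (8 GT/s per lane, 128b/130b encoding, 16 lanes, one direction) are assumptions not stated in the article, and it is not stated which PCIe figure Nvidia used for its own comparison. The function name is hypothetical.

```python
def pcie_gbs(lanes=16, gt_per_s=8.0, encoding=128 / 130):
    """Usable bandwidth in GB/s for one direction of a PCIe link."""
    return lanes * gt_per_s * encoding / 8


pcie3_x16 = pcie_gbs()

# NVLink range as quoted in the article, in GB/s.
nvlink_low, nvlink_high = 80.0, 200.0

ratio_low = nvlink_low / pcie3_x16
ratio_high = nvlink_high / pcie3_x16

print(f"PCIe 3.0 x16: {pcie3_x16:.2f} GB/s per direction")
print(f"NVLink speed-up: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

Under these assumptions the ratio works out to roughly 5x at the low end and a little over 12x at the high end, which is consistent with the "5 to 12 times" claim.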

#4 16nm manufacturing process : Pascal will be the first Nvidia GPU to be built on TSMC’s 16nm FinFET manufacturing process. The new process promises to be significantly more power efficient and significantly denser than 28nm, which would enable Nvidia to build considerably more complex and powerful GPUs while also improving power efficiency.

TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
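To put the density claim in perspective, here is a back-of-the-envelope sketch that combines TSMC's "around 2 times the density" figure with the GM200 numbers quoted in this article (8.00 billion transistors on 601 mm²) and the rumoured 17 billion transistors for GP100. It is a rough illustration only, not a prediction of the actual die size: real designs do not scale uniformly, and all names in the code are hypothetical.

```python
# Figures quoted in this article.
GM200_TRANSISTORS = 8.00e9
GM200_DIE_MM2 = 601.0
GP100_TRANSISTORS = 17e9   # rumoured, "up to 17 billion"
DENSITY_GAIN = 2.0         # TSMC: "around 2 times the density" versus 28HPM


def estimated_die_mm2(transistors, base_transistors, base_die_mm2, density_gain):
    """Estimate die area assuming uniform transistor density scaling (a simplification)."""
    base_density = base_transistors / base_die_mm2  # transistors per mm^2 on 28nm
    return transistors / (base_density * density_gain)


gp100_area = estimated_die_mm2(GP100_TRANSISTORS, GM200_TRANSISTORS, GM200_DIE_MM2, DENSITY_GAIN)
print(f"Estimated GP100 die area: {gp100_area:.0f} mm^2")
```

Under this simplification, a 17 billion transistor chip would land in the same very large die class as GM200, which is why the density gain matters so much for the rumoured transistor count.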

Pascal is still scheduled for a 2016 release with Volta coming along sometime after that.

[2016 UPDATE] Nvidia’s Pascal : Everything We Know Right Now

We learned last year that Nvidia’s flagship Pascal GPU, code named GP100, may have taped out on TSMC’s 16nm FinFET manufacturing process in June. Interestingly, shortly afterwards AMD announced that it had taped out two FinFET chips. It’s no coincidence that both companies completed their FinFET designs at around the same time: both are pushing a very aggressive time-to-market schedule to debut their next generation FinFET based GPUs this year.

What we know so far about Nvidia’s flagship Pascal GP100 GPU :

Pascal graphics architecture.

Estimated 2x performance per watt improvement over Maxwell.

To launch in 2016, purportedly the second half of the year.

DirectX 12 feature level 12_1 or higher.

Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.

Built on the 16nm FinFET manufacturing process from TSMC.

Allegedly has a total of 17 billion transistors, more than twice that of GM200.

Will feature four 4-Hi HBM2 stacks for a total of 16GB of VRAM, with 8-Hi stacks for up to 32GB on the professional compute SKUs.

Features a 4096-bit memory bus interface, same as AMD’s Fiji GPU powering the Fury series.

Features NVLink (only compatible with next generation IBM POWER server processors).

Supports half precision FP16 compute at twice the rate of full precision FP32.
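The memory capacities in the list above follow from simple stack arithmetic, sketched below. The 8 Gbit per DRAM die figure is an assumption not stated in the article, and the function name is hypothetical.

```python
def hbm2_capacity_gb(stacks, dies_per_stack, gbit_per_die=8):
    """Total capacity in GB: stacks * dies per stack * die density (Gbit) / 8."""
    return stacks * dies_per_stack * gbit_per_die / 8


consumer = hbm2_capacity_gb(stacks=4, dies_per_stack=4)      # four 4-Hi stacks
professional = hbm2_capacity_gb(stacks=4, dies_per_stack=8)  # four 8-Hi stacks

print(f"Four 4-Hi stacks: {consumer:.0f} GB")
print(f"Four 8-Hi stacks: {professional:.0f} GB")
```

With 8 Gbit dies, four 4-Hi stacks give 16GB and four 8-Hi stacks give 32GB, matching the figures in the list.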

GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal
--- | --- | --- | --- | ---
GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET)
Flagship Chip | GF110 | GK210 | GM200 | GP100
GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | TBA
Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | Up to 17 Billion
Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | TBA
Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | TBA
Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | TBA
Compute Performance | 1.6 TFLOPs | 5.1 TFLOPs | 6.1 TFLOPs | 12 TFLOPs
Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 32 GB HBM2
Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 1 TB/s
Maximum TDP | 244W | 250W | 250W | 250W
Average Performance Increase over Predecessor | +45% (GTX 580 Versus GTX 285) | +55% (GTX Titan Black Versus GTX 580) | +30% (GTX Titan X Versus GTX Titan Black) | TBA
Flagship GPU Price (Consumer Only) | $499 US (GTX 580) | $999 US (GTX Titan Black) | $999 US (GTX Titan X) | TBA
Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016

NVIDIA Volta GPUs, successors to Pascal, will arrive with IBM Power9 CPU enabled supercomputers in 2017.

NVLink targets GPU accelerated servers where cross-chip communication is extremely bandwidth limited and a major system bottleneck. Nvidia states that NV-Link will be 5 to 12 times faster than traditional PCIE 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs. NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018.

Unlike with Maxwell, Nvidia has placed a major focus on compute and GPGPU acceleration with Pascal. The slew of new features and technologies that Nvidia will debut with Pascal emphasizes this focus, including next generation stacked High Bandwidth Memory, the high-speed NVLink GPU interconnect and mixed precision compute at double the rate of full precision compute to push perf/watt. We can’t wait to see Pascal in action later this year, but until then stay tuned for the latest.

GPU Family | AMD Polaris | NVIDIA Pascal
--- | --- | ---
Flagship GPU | Greenland/Vega10 | GP100
GPU Process | 14nm FinFET | 16nm FinFET
GPU Transistors | Up To 18 Billion | ~17 Billion
Memory | Up to 32 GB HBM2 | Up to 32 GB HBM2
Bandwidth | 1 TB/s | 1 TB/s
Graphics Architecture | Polaris (GCN 4.0) | Pascal
Predecessor | Fiji (Fury Series) | GM200 (900 Series)

The post Nvidia : Pascal Is 10x Maxwell, Launching in 2016 On 16nm – Features 3D Memory, NV-Link and Mixed Precision by Khalid Moammer appeared first on WCCFtech.
