Lazure2.wordpress.com

Altera will use Intel Custom Foundry’s 14 nm Tri-Gate (FinFET) process services to produce its new high-end SoC FPGA wit...

2013-11-01

With the Stratix® 10 high-end and Arria® 10 mid-range FPGA and SoC FPGA products, Altera wants to surge ahead of Xilinx in critical infrastructure, such as wireless remote radio units (RRUs), 100G/400G wireline channel (line) cards and data centers, as well as in military, medical and broadcast scenarios. For Stratix 10 it relies on ARM Cortex-A53 IP (Intellectual Property) and Intel Custom Foundry’s 14 nm Tri-Gate (FinFET) process services; for Arria 10 it relies on ARM Cortex-A9 IP and TSMC’s 20 nm 20SoC process. Both families come with OpenCL for FPGAs capability. It will also be possible to begin designs with the Arria 10 portfolio of 20 nm FPGA devices, and then take advantage of pin-for-pin design migration pathways from Arria 10 FPGA and SoC products to Stratix 10 FPGA and SoC products as they become available.

This was my conclusion when the news came out in Altera Announces Quad-Core 64-bit ARM Cortex-A53 for Stratix 10 SoCs [press release, Oct 29, 2013]. I then answered three questions for myself, and looked a little more deeply into two other issues as well:

Why FPGAs? Why more FPGAs?

Why SoC FPGAs?

Why ARM with FPGA on the Intel Tri-Gate (FinFET) process, and why now?

OpenCL for FPGAs

Altera SoC FPGAs

To shed more light on the direction of breakthrough by Altera, here is additional introductory information from: Arria 10 Device Overview* [Altera, Sept 4, 2013]
*As there is no similar document yet for Stratix 10

Altera’s Arria® FPGAs and SoCs deliver optimal performance and power efficiency in the midrange. By using TSMC’s 20-nm process technology on a high-performance architecture, Arria 10 FPGAs and SoCs deliver higher performance than previous-generation high-end FPGAs while simultaneously reducing power by offering a comprehensive set of power-saving technologies. Altera’s Arria 10 family is reinventing the midrange.

Altera’s Arria 10 SoCs offer a second generation SoC product that both demonstrates a long-term commitment to the SoC product line and extends Altera’s leadership in programmable devices that feature the ARM-based hard processor system (HPS).

Important innovations in Arria 10 devices include:
- Enhanced core architecture delivering 60% higher performance than the previous generation midrange (15% higher performance than previous fastest high-end FPGAs)
- Integrated transceivers with short reach rates up to 28.05 Gbps and backplane capability up to 17.4 Gbps
- Hard PCI Express Gen3 intellectual property (IP) blocks
- Hard memory controllers and PHY up to 2666 Mbps
- Variable precision digital signal processing (DSP) blocks
- Fractional synthesis PLLs
- Up to 40% lower power compared to prior midrange FPGAs and up to 60% lower power compared to prior generation high-end FPGAs due to a comprehensive set of advanced power-saving features
- 2nd generation ARM® Cortex™-A9 hard processor system (HPS) for SoC variants
- Integrated 10GBASE-KR/40GBASE-KR4 Forward Error Correction (FEC)

Arria 10 devices are ideally suited for high performance, power-sensitive, midrange applications in such diverse markets as:
- Wireless—for channel and switch cards in remote radio heads and mobile backhaul
- Broadcast—for studio switches, servers and transport, videoconferencing, and pro audio/video
- Wireline—for 40G/100G muxponders and transponders, 100G line cards, bridging, and aggregation
- Compute and Storage—for flash cache, cloud computing servers, and server acceleration
- Medical—for diagnostic scanners and diagnostic imaging
- Military—for missile guidance and control, radar, electronic warfare, and secure communications

…

Target Markets for Arria 10 FPGAs and SoCs

Arria 10 devices meet the performance, power, and bandwidth requirements of next generation wireless infrastructure, broadcast, compute and storage, networking, and medical and military equipment.

By providing such a highly integrated device, Arria 10 FPGAs and SoCs significantly reduce BOM cost, form factor, and power consumption. Arria 10 devices allow you to differentiate your product through customization by implementing your intellectual property in both hardware and software.

For these applications, Arria 10 devices integrate both logic functions and processor functions in a highly integrated single device. The integrated ARM-based SoCs provide all the functionality of traditional FPGAs, eliminate the need for a local processor, and increase system performance by taking advantage of the tightly coupled high bandwidth interface between the core fabric and the hard processor system.

For Wireless infrastructure, particularly remote radio units, the industry has standardized on ARM-based ASSPs and SoCs for several generations. ARM is widely recognized as the industry leader in low power solutions. At 20 nm, the Dual ARM Cortex MPCore provides the best power efficiency of any GHz class of process. When combined with Altera’s industry leading programmable technology, this provides an ideal platform to address the performance, power, and form factor requirements of wireless remote radio unit and small cell base stations.

For Wireline communication equipment such as access, metro, core, and transmission equipment where the FPGA performs critical functions such as protocol bridging, packet framing, aggregation, and I/O expansion, SoCs now offer all this as well as integrated intelligent control and link management, sometimes referred to as Operations, Administration, and Maintenance (OAM). OAM typically is software that executes when a link is established or fails during operation. The integrated ARM processor can also be used for statistics and error monitoring and minimize system downtime when a link is compromised or oversubscribed. Tight coupling of the processor and the data path (implemented in the core logic) saves time and results in significant savings in terms of operating expenses associated with system downtime and loss of quality of service.

For Compute and storage equipment, flash cache storage, the integrated ARM processor can be used to manage Flash sectors and improve overall life and reliability as well as offload the host processor and provide control for search and hardware acceleration functions for cloud storage equipment. The integrated ARM based HPS can configure the hard PCIe interfaces in PCIe root port configuration and also run link layers for SAS and SATA interfaces.

For Next generation Broadcast equipment, where “4K readiness” is the key technology driver, the integrated ARM processor subsystem eliminates the need for a local GHz class processor, which is commonly used for functions such as audio processing, video compression, video link management, and PCIe root port.

For Military applications, new security features such as Secure Boot, Encryption, and Authentication have been introduced for secure wireless and wireline communications, military radar, military intelligence equipment.

For Test and Medical applications, combining ARM HPS with support for high speed memory devices such as DDR4, and Hybrid Memory Cube (HMC) as well as high speed transceivers and embedded controllers such as PCIe Gen3, Arria 10 SoCs are ideal for next generation test and medical equipment.

Then you can also read The Next-Node Battle Begins – Altera Announces “Generation 10” [EE Journal, June 11, 2013], from which I will quote the following:

For the past three nodes or so, we’ve seen a back-and-forth battle between Altera and Xilinx. Most people think that Altera got the upper hand in 40/45nm products with their Stratix IV family. Two years later, Xilinx struck back hard at 28nm with Virtex-7. Now, it’s time for the “next” generation, and Altera is apparently ready to get the party started. The company has just announced their upcoming “Generation 10” FPGA families – and it looks like this node is gonna be a doozy!

as well as ARMing a New Generation – Altera Announces Processor Architecture for Gen X [EE Journal, Oct 29, 2013], from which it is worth quoting the following:

Altera is currently in a race with archrival Xilinx, whose first FinFET FPGAs will be riding in on TSMC’s 16nm FinFET process. Which horse is faster? Intel is widely believed to have superior process technology and has already been shipping 22nm FinFET-based devices. Those points go to Intel. TSMC, on the other hand, has vastly more experience as a merchant fab and has announced that they are working closely with Xilinx to accelerate their FinFET program, in a blitz whose marketing name is “FinFAST.”

At this point, therefore, it is unclear who will be shipping first, (and, except for bragging rights between the two companies, probably few people care.) It is likely that we will not see production devices from either company before 2015, so we are definitely in “future” mode here. It is also unclear how the performance attributes of the two companies’ offerings will stack up. Altera has shown more of their hand thus far, and their predictions are impressive – up to four million LUT-4 equivalent 1GHz programmable fabric, 56Gbps SerDes, better power efficiency, tons-o-RAM – and a high-powered processing subsystem in the SoC version. What’s the processing subsystem look like? That’s why we are gathered here today.

There was speculation that the architecture might be other-than-ARM since the manufacturer is none-other-than-Intel. As far as we know, Intel hasn’t historically been too keen on manufacturing competing processor architectures. However, two other, more important market forces are at work in this situation. First, Altera has made a huge commitment to the ARM architecture with their current-generation SoC FPGAs. Getting their customers committed to the ARM/FPGA architecture and then jumping ship and forcing them to migrate after only one generation would be a major inconvenience, and it would be a big black eye for Altera. It would have been very unlikely that Altera would have inked the Intel deal knowing that they couldn’t continue their ARM commitment.

Second, Intel is obviously trying to make a go at it in the merchant fab business. If the company had a hard-and-fast policy of never manufacturing a chip with an ARM architecture on board, they’d be severely limiting their market. While Intel has already been building FPGAs for both Tabula and Achronix, getting Altera in their stable is a whole ‘nuther deal. Putting aside petty concerns about processor architecture is a small price to pay for better street cred in the merchant fab business.

1. Why FPGAs? Why more FPGAs?

Since one of the greatest strengths of the FPGA is its ability to perform highly pipelined and complex algorithmic computations on data brought on-chip, Altera says that we can do better with explicit parallelism on FPGAs than on GPUs:

The spectrum of software-programmable devices is now evolving significantly. The emphasis is shifting from automatically extracting instruction-level parallelism at run time to explicitly identifying thread-level parallelism at coding time. Highly parallel multicore devices are beginning to emerge with a general trend of containing multiple simpler processors where more of the transistors are dedicated to computation rather than caching and extraction of parallelism. These devices range from multicore CPUs, which commonly have 2, 4, or 8 cores, to GPUs consisting of hundreds of simple cores optimized for data-parallel computation. To achieve high performance on these multicore devices, the programmer must explicitly code their applications in a parallel fashion. Each core must be assigned work in such a way that all cores can cooperate to execute a particular computation. This is also exactly what FPGA designers do to create their high-level system architectures.
(Source: Implementing FPGA Design with the OpenCL Standard [v. 2.0 Altera whitepaper, November 2012])
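The idea of explicitly assigning work to cooperating cores can be sketched in plain C. This is only an illustration, not OpenCL and not Altera's tooling: the function names `partial_sum` and `parallel_style_sum` are hypothetical, and the "workers" run one after another here, whereas on a multicore device or an FPGA they would run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum the elements owned by one worker.
 * Worker `id` out of `workers` handles indices id, id + workers,
 * id + 2 * workers, and so on (an interleaved partition). */
long partial_sum(const int *data, size_t n, size_t id, size_t workers)
{
    assert(workers > 0 && id < workers);
    long acc = 0;
    for (size_t i = id; i < n; i += workers)
        acc += data[i];
    return acc;
}

/* Combine the per-worker results into the final answer.
 * Each partial_sum call is independent of the others, which is
 * what makes the work explicitly parallel; this sketch simply
 * evaluates the calls sequentially. */
long parallel_style_sum(const int *data, size_t n, size_t workers)
{
    long total = 0;
    for (size_t id = 0; id < workers; ++id)
        total += partial_sum(data, n, id, workers);
    return total;
}
```

The programmer, not the hardware, decides which worker touches which element. That decision is the "explicit coding in a parallel fashion" the whitepaper refers to.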

Field Programmable Gate Arrays

FPGAs are integrated circuits that can be configured repeatedly to perform an infinite number of functions. Low level operations such as bit masking, shifting, and addition are all configurable and can be assembled in any order. FPGAs achieve a high level of programmability by integrating combinations of lookup tables (LUTs), registers, on-chip memories, and arithmetic hardware (for example, digital signal processor (DSP) blocks) through a network of reconfigurable connections to implement computation pipelines. LUTs are responsible for implementing various logic functions. For example, reprogramming a LUT can change an operation from a bitwise AND logic function to a bit-wise XOR logic function.
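A LUT can be pictured as a small truth table whose contents are the configuration. The following C sketch models a 2-input LUT under that assumption; real FPGA LUTs have more inputs, and the type `lut2_t` and the constants `LUT2_AND` and `LUT2_XOR` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 2-input LUT modelled as a 4-bit truth table.
 * Bit number ((a << 1) | b) of `config` is the output for inputs a, b. */
typedef struct { uint8_t config; } lut2_t;

/* Truth-table contents, bit 0 = inputs (0,0) ... bit 3 = inputs (1,1). */
enum {
    LUT2_AND = 0x8,  /* 1000b: output is 1 only for (1,1)      */
    LUT2_XOR = 0x6   /* 0110b: output is 1 for (0,1) and (1,0) */
};

/* Look up the output bit selected by the two inputs. */
unsigned lut2_eval(lut2_t lut, unsigned a, unsigned b)
{
    assert(a <= 1 && b <= 1);
    return (lut.config >> ((a << 1) | b)) & 1u;
}
```

Changing `config` from `LUT2_AND` to `LUT2_XOR` changes the logic function without touching `lut2_eval`, which mirrors how reprogramming a LUT swaps an AND for an XOR.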

The key benefit in using FPGAs for algorithm acceleration is that they support wide and heterogeneous pipelines. Each pipeline implemented in the FPGA fabric can be wide and unique. This characteristic is in contrast to many different types of processing units such as symmetric multiprocessors (SMPs), DSPs, and graphics processing units (GPUs). In these types of devices, parallelism is achieved by replicating the same generic computation hardware multiple times. In FPGAs, however, parallelism can be achieved by duplicating only the logic that will be exercised by your algorithm.

A processor implements an instruction set that limits the amount of work that can be performed each clock cycle. For example, most processors do not have a dedicated instruction that can execute the following C code:

E = (((A + B) ^ C) & D) >> 2;

Without a dedicated instruction for this C code example, a CPU, DSP, or GPU must execute multiple instructions to perform the operation. You can configure an FPGA to perform a sequence of operations that implements the code above in a single clock cycle. An FPGA implementation connects specialized addition hardware with a LUT that performs the bit-wise XOR and AND operations. The device then leverages its programmable connections to perform a right shift by two bits without consuming any hardware resources. The result of this operation can be connected to subsequent operations to form complex pipelines. You may think of an FPGA as a hardware platform that can implement any instruction set that your software algorithm requires.
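To make the contrast concrete, here is a small C sketch of the same expression written two ways. The whitepaper does not state the operand types, so 32-bit unsigned values are an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* One operation per statement, roughly the sequence a processor
 * without a fused instruction would have to issue. */
uint32_t compute_stepwise(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t t = a + b;  /* add                                         */
    t ^= c;              /* bit-wise XOR                                */
    t &= d;              /* bit-wise AND                                */
    t >>= 2;             /* right shift; on an FPGA this is wiring only */
    return t;
}

/* The same computation as the single expression from the text. */
uint32_t compute_fused(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (((a + b) ^ c) & d) >> 2;
}
```

Both functions return the same value for every input. The difference the text describes is in the hardware: a processor issues several instructions for the stepwise form, while an FPGA can wire the adder, the LUT logic and the shift together as one datapath.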

…

Altera SDK for OpenCL Pipeline Approach

The key difference between the pipeline generated by the Altera Offline Compiler (AOC) and a typical processor pipeline is that the FPGA pipeline is not limited to a statically defined set of pipeline stages or instruction set.
…
The custom pipeline structure provided by the AOC speeds up computation by allowing operations within a large number of threads to occur concurrently.
(Source: Altera SDK for OpenCL Optimization Guide
[for v. 13.0 SP1.0 by Altera, June 2013])

GPU and FPGA Design Methodology

GPUs are programmed using either Nvidia’s proprietary CUDA language, or an open standard OpenCL language. These languages are very similar in capability, with the biggest difference being that CUDA can only be used on Nvidia GPUs.

FPGAs are typically programmed using the HDL languages Verilog or VHDL. Neither of these languages is well suited to supporting floating-point designs, although the latest versions do incorporate definition, though not necessarily synthesis, of floating-point numbers. For example, in SystemVerilog, a shortreal variable is analogous to an IEEE single (float), and real to an IEEE double.

OpenCL for FPGAs

OpenCL is familiar to GPU programmers. An OpenCL Compiler for FPGAs means that OpenCL code written for AMD or Nvidia GPUs can be compiled onto an FPGA. In addition, an OpenCL Compiler from Altera enables GPU programs to use FPGAs, without the necessity of developing the typical FPGA design skill set.

Using OpenCL with FPGAs offers several key advantages over GPUs. First, GPUs tend to be I/O limited. All input and output data must be passed by the host CPU through the PCI Express® (PCIe®) interface. The resulting delays can stall the GPU processing engines, resulting in lower performance.

OpenCL Extensions for FPGAs

FPGAs are well known for their wide variety of high-bandwidth I/O capabilities. These capabilities allow data to stream in and out of the FPGA over Gigabit Ethernet (GbE), Serial RapidIO® (SRIO), or directly from analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). Altera has defined a vendor-specific extension of the OpenCL standard to support streaming operations. …

FPGAs can also offer a much lower processing latency than a GPU, even independent of I/O bottlenecks. It is well known that GPUs must operate on many thousands of threads to perform efficiently, due to the extremely long latencies to and from memory and even between the many processing cores of the GPU. In effect, the GPU must operate many, many tasks to keep the processing cores from stalling as they await data, which results in very long latency for any given task.

The FPGA uses a “coarse-grained parallelism” architecture instead. It creates multiple optimized and parallel datapaths, each of which outputs one result per clock cycle. The number of instances of the datapath depends upon the FPGA resources, but is typically much less than the number of GPU cores. However, each datapath instance has a much higher throughput than a GPU core. The primary benefit of this approach is low latency, a critical performance advantage in many applications.
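The "one result per clock cycle" behaviour can be put into a simple cycle-count model. This is a rough sketch under idealised assumptions (no stalls, evenly divisible work, identical datapaths); the function names are hypothetical and the model is not taken from the whitepaper.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles for one pipelined datapath to process `items` inputs:
 * the first result appears after `depth` cycles, then one more
 * result follows on every subsequent cycle. */
uint64_t pipeline_cycles(uint64_t depth, uint64_t items)
{
    assert(depth > 0);
    return items == 0 ? 0 : depth + items - 1;
}

/* Cycles when the inputs are split across `lanes` identical
 * datapaths that run side by side. */
uint64_t parallel_pipeline_cycles(uint64_t depth, uint64_t items,
                                  uint64_t lanes)
{
    assert(lanes > 0);
    uint64_t per_lane = (items + lanes - 1) / lanes; /* ceiling division */
    return pipeline_cycles(depth, per_lane);
}
```

In this model the latency of a single item is just `depth` cycles, no matter how many other items are in flight, which is the low-latency property the text attributes to the FPGA approach.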

Another advantage of FPGAs is their much lower power consumption, resulting in dramatically higher GFLOPs/W. FPGA power measurements using development boards show 5-6 GFLOPs/W for algorithms such as Cholesky and QRD, and about 10 GFLOPs/W for simpler algorithms such as FFTs. GPU energy efficiency measurements are much harder to find, but using the GPU performance of 50 GFLOPs for Cholesky and a typical power consumption of 200 W results in 0.25 GFLOPs/W, which is twenty times more power consumed per useful FLOP.
(Source: Radar Processing: FPGAs or GPUs? [v. 2.0 Altera whitepaper, May 2013])
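The efficiency comparison quoted above comes down to two divisions. A minimal C sketch, using only the figures given in the whitepaper excerpt (50 GFLOPs at 200 W for the GPU, and the lower FPGA figure of 5 GFLOPs/W); the function names are hypothetical:

```c
#include <assert.h>

/* Energy efficiency in GFLOPs per watt. */
double gflops_per_watt(double gflops, double watts)
{
    assert(watts > 0.0);
    return gflops / watts;
}

/* How many times more efficient device A is than device B,
 * given both efficiencies in GFLOPs per watt. */
double efficiency_ratio(double a_gflops_per_watt, double b_gflops_per_watt)
{
    assert(b_gflops_per_watt > 0.0);
    return a_gflops_per_watt / b_gflops_per_watt;
}
```

With the quoted numbers, `gflops_per_watt(50.0, 200.0)` gives 0.25, and `efficiency_ratio(5.0, 0.25)` gives 20, the "twenty times" figure in the text. Taking the upper FPGA figure of 6 GFLOPs/W instead would give a ratio of 24.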

Altera also says that the need for ever-increasing bandwidth and flexibility drives the need for a breakthrough in capability:

The increased capabilities in smartphones and other portable devices are the reason for the dramatic leap in system performance that we will see in next-generation FPGAs. The explosion of mobility bandwidth requirements is putting a huge demand on the wireless, wired, and data center infrastructure capabilities. While the number of smartphones is growing at single digit percentage rates, the customers of these devices continue to drive more bandwidth with the ever-increasing smartphone capability. Much of this is due to the increased video content. In 2012, average smartphone data usage grew by 81 percent. Cisco expects mobile traffic to increase 66 percent per year through 2017 and two-thirds of all mobile traffic will be video content. At this time, mobile network speed is expected to increase by seven times and 4G networks to comprise 45 percent of all traffic (1) (see Figure 1).

The brief overviews of three infrastructure applications below are examples of why hardware and software developers are looking to FPGAs to address their next-generation products’ bandwidth, performance, power, and cost goals.
■ Wireless remote radio units
■ 400G wireline channel cards
■ Data centers

Wireless Remote Radio Units

In the capital-intensive wireless infrastructure market, telecommunications operators desire to provide more bandwidth faster and cheaper. The faster these operators can do cost reductions, the more deployments they can do, the more area they can cover, and the faster they can serve customers, which is a huge advantage. The product strategy of these companies is to keep the datapath width the same and increase the clock frequency for as many generations as they can. Upcoming remote radio units will look for FPGAs to push close to 500 MHz of core performance for complex functions, such as implementing digital pre-distortion algorithms. This will preserve their investment in their radio architecture and allow them to cover a broader spectrum of radio frequency (RF) bandwidth. In doing so they look to have a better return on investment because less work needs to be done re-architecting a solution. Furthermore, their time-to-market advantage improves by getting these new products out faster. They must also lower their operating costs to drive cost per bit down because revenues per mobile subscriber grow at a far lower rate than the data traffic per subscriber. Thus, not widening their datapath and creating power-efficient designs on smaller, more power-efficient FPGAs allows them to achieve this goal.

400G Channel Cards

Another driving force in improving FPGA performance is the need to upgrade the network communications infrastructure. Next-generation 400G versus existing 100G channel cards will dramatically push system capabilities. The bandwidth jump of four times in the next-generation systems is much greater than in previous iterations. Because the market for this is still new, companies cannot risk building ASICs or ASSPs to achieve this goal. Integration of multiple 56 gigabits per second (Gbps) and 28 Gbps transceiver solutions to accommodate this level of bandwidth is needed, but only a part of the solution. More and faster logic to accommodate this higher bandwidth is also required. However since the dimensions of the chassis do not change, the power envelope is limited. The network infrastructure cannot tolerate solutions where power increases at a linear rate with bandwidth capability. For packet processing and traffic management applications at 400G bandwidth at 600 million packets per second, scaling the data path width and frequency can relieve the data path processing function but cannot scale for control path processing such as scheduling. Therefore high performance in all aspects of device capability is required: processing, memory interfacing, IO interfaces, and others. FPGAs remain the most attractive solution, but companies will need investments in higher performance per watt architectures, transceivers, and process technology to address this large leap in capabilities and challenges.
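The 400G figures can be turned into a per-packet budget with a few lines of C. The line rate (400 Gbps) and the packet rate (600 million packets per second) are from the text; the 500 MHz fabric clock used in the usage note is an assumption chosen for illustration, and the function names are hypothetical.

```c
#include <assert.h>

/* Time available to handle one packet, in nanoseconds. */
double ns_per_packet(double packets_per_second)
{
    assert(packets_per_second > 0.0);
    return 1e9 / packets_per_second;
}

/* Datapath width in bits needed to carry `bits_per_second` at a
 * given clock, assuming one full-width word moves every cycle. */
double datapath_width_bits(double bits_per_second, double clock_hz)
{
    assert(clock_hz > 0.0);
    return bits_per_second / clock_hz;
}

/* Clock cycles that elapse between consecutive packet arrivals. */
double cycles_per_packet(double clock_hz, double packets_per_second)
{
    assert(packets_per_second > 0.0);
    return clock_hz / packets_per_second;
}
```

At 600 million packets per second there are roughly 1.67 ns per packet. With an assumed 500 MHz clock, `datapath_width_bits(400e9, 500e6)` is 800 bits, and `cycles_per_packet(500e6, 600e6)` is below 1. The data path can be widened to keep up, but a per-packet decision such as scheduling gets less than one clock cycle, which is consistent with the text's point that widening the data path alone does not scale the control path.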

Data Centers

All the data and video that are being pushed and downloaded from these new wireless deployments and transported through the new 400G packet processing infrastructure also need to be stored and processed. Computations per watt and computations per dollar are key metrics in data centers. FPGAs are increasingly used in the data center for data access, algorithm, and networking acceleration. Data center servers are bottlenecked getting access to data. The latest processors have more and more cores, but the bandwidth to external memory and data is not keeping pace with the increase in computing power. Many of these servers are running at average utilization rates and are well under peak processing power. These servers are good candidates for FPGA acceleration. Hardware acceleration through FPGAs becomes an attractive alternative to replacing these processors by focusing on the performance bottlenecks that software on processors cannot overcome.

Other applications are also looking to FPGAs to support their increased bandwidth requirements, such as video content providers moving to 4K video, cloud computing, and intelligence applications in defense. These applications face similar issues. (Source: Expect a Breakthrough Advantage in Next-Generation FPGAs [v. 1.0 Altera whitepaper, June 2013])

2. Why SoC FPGAs?

Altera’s Vision of Silicon Convergence: system solutions by merging coarse and fine grained programmable hardware [IEEE Computer Society Santa Clara Valley YouTube channel, recorded on Sept 10, 2012, published on June 10, 2013]

Recorded: Monday, September 10, 2012
Speaker: Ty Garibay, Altera Corporation
Event page: http://sites.ieee.org/scv-cs/archives/silicon-convergence-creating-system-solutions-by-merging-coarse-and-fine-grained-programming-model
Slide deck: http://sites.ieee.org/scv-cs/files/2012/09/Garibay-IEEE-0910121.pdf

While continuing semiconductor miniaturization enables ever more complex systems, the cost and complexity of system innovation becomes increasingly out of reach. A standard solution consisting of high performance processors, hardened peripherals, and a programmable logic fabric is ideal to address system integration challenges. Complementary to similar advances in software, a host of hardware design tools and high-level programming methodologies are also making system design more user-friendly. Together these industry advances allow design teams to flexibly implement any system to achieve the sweet spot of performance and power dissipation according to team capabilities. In his talk, Ty Garibay shares Altera’s view on silicon convergence, the integration of SoC and FPGA, and the direction the company is taking to increase system design efficiency through the use of high-level design languages and tools.

From the slide deck:

What Is a PLD?

A programmable logic device (PLD) is a type of semiconductor

Most semiconductors can be programmed only once to perform a specific function

PLDs are reprogrammable—functions can be changed or enhanced during development or after manufacturing

Flexibility Makes PLDs Lower Risk and Faster to Design Than Other Types of Semiconductors

3. Why ARM with FPGA on the Intel Tri-Gate (FinFET) process, and why now?

Altera Announces Quad-Core 64-bit ARM Cortex-A53 for Stratix 10 SoCs [press release, Oct 29, 2013]

Manufactured on Intel’s 14 nm Tri-Gate Process, Altera Stratix® 10 SoCs Will Deliver Industry’s Most Versatile Heterogeneous Computing Platform

Altera Corporation (NASDAQ: ALTR) today announced that its Stratix 10 SoC devices, manufactured on Intel’s 14 nm Tri-Gate process, will incorporate a high-performance, quad-core 64-bit ARM Cortex™-A53 processor system, complementing the device’s floating-point digital signal processing (DSP) blocks and high-performance FPGA fabric. Coupled with Altera’s advanced system-level design tools, including OpenCL, this versatile heterogeneous computing platform will offer exceptional adaptability, performance, power efficiency and design productivity for a broad range of applications, including data center computing acceleration, radar systems and communications infrastructure.

The ARM Cortex-A53 processor, the first 64-bit processor used on a SoC FPGA, is an ideal fit for use in Stratix 10 SoCs due to its performance, power efficiency, data throughput and advanced features. The Cortex-A53 is among the most power efficient of ARM’s application-class processors, and when delivered on the 14 nm Tri-Gate process will achieve more than six times more data throughput compared to today’s highest performing SoC FPGAs. The Cortex-A53 also delivers important features, such as virtualization support, 256TB memory reach and error correction code (ECC) on L1 and L2 caches. Furthermore, the Cortex-A53 core can run in 32-bit mode, which will run Cortex-A9 operating systems and code unmodified, allowing a smooth upgrade path from Altera’s 28 nm and 20 nm SoC FPGAs.

“ARM is pleased to see Altera adopting the lowest power 64-bit architecture as an ideal complement to DSP and FPGA processing elements to create a cutting-edge heterogeneous computing platform,” said Tom Cronk, executive vice president and general manager, Processor Division, ARM. “The Cortex-A53 processor delivers industry-leading power efficiency and outstanding performance levels, and it is supported by the ARM ecosystem and its innovative software community.”

Leveraging Intel’s 14 nm Tri-Gate process and an enhanced high-performance architecture, Altera Stratix 10 SoCs will have a programmable-logic performance level of more than 1GHz; two times the core performance of current high-end 28 nm FPGAs.

“High-end networking and communications infrastructure are rapidly migrating toward heterogeneous computing architectures to achieve maximum system performance and power efficiency,” said Linley Gwennap, principal analyst at The Linley Group, a leading embedded research firm. “What Altera is doing with its Stratix 10 SoC, both in terms of silicon convergence and high-level design tool support, puts the company at the forefront of delivering heterogeneous computing platforms and positions them well to capitalize on myriad opportunities.”

By standardizing on ARM processors across its three-generation SoC portfolio, Altera will offer software compatibility and a common ARM ecosystem of tools and operating system support. Embedded developers will be able to accelerate debug cycles with Altera’s SoC Embedded Design Suite (EDS) featuring the ARM Development Studio 5 (DS-5™) Altera® Edition toolkit, the industry’s only FPGA-adaptive debug tool, as well as use Altera’s software development kit (SDK) for OpenCL to create heterogeneous implementations using the OpenCL high-level design language.

“With Stratix 10 SoCs, designers will have a versatile and powerful heterogeneous compute platform enabling them to innovate and get to market faster,” said Danny Biran, senior vice president, corporate strategy and marketing at Altera. “This will be very exciting for customers as converged silicon continues to be the best solution for complex, high-performance applications.”

About Altera

Altera® programmable solutions enable designers of electronic systems to rapidly and cost effectively innovate, differentiate and win in their markets. Altera offers FPGAs, SoCs, CPLDs, ASICs and complementary technologies, such as power management, to provide high-value solutions to customers worldwide. Follow Altera via Facebook, Twitter, LinkedIn, Google+ and RSS, and subscribe to product update emails and newsletters. altera.com

Altera to Build Next-Generation, High-Performance FPGAs on Intel’s 14 nm Tri-Gate Technology [alteracorp YouTube channel, March 11, 2013]

Industry leaders discuss the impact of the Altera and Intel foundry relationship and the future manufacture of Altera FPGAs on Intel’s 14 nm tri-gate transistor technology. These next-generation products, which target ultra-high-performance systems for military, wireline communications, cloud networking, and compute and storage applications, will enable breakthrough levels of performance and power efficiencies not otherwise possible.

From: Intel takes big step in chip foundry business [Reuters, Feb 25, 2013]

Altera Chief Executive John Daane told Reuters in a phone interview that Altera, which depends on communications infrastructure for about half of its business, is the only major programmable chipmaker that will have access to Intel’s plants.

“We are essentially getting access like an extra division of Intel. As soon as they’re making the technology available to their various groups to do design work, we’re getting the same,” he said.

Daane said Intel’s manufacturing technology will give Altera’s chips a several-year advantage against Xilinx, its main competitor in programmable chips. He said Altera would continue to make other chips with TSMC, its long-time foundry.

Altera to Build Next-Generation, High-Performance FPGAs on Intel’s 14 nm Tri-Gate Technology [press release, Feb 25, 2013]

Altera Corporation and Intel Corporation today announced that the companies have entered into an agreement for the future manufacture of Altera FPGAs on Intel’s 14 nm tri-gate transistor technology. These next-generation products, which target ultra high-performance systems for military, wireline communications, cloud networking, and compute and storage applications, will enable breakthrough levels of performance and power efficiencies not otherwise possible.

“Altera’s FPGAs using Intel 14 nm technology will enable customers to design with the most advanced, highest-performing FPGAs in the industry,” said John Daane, president, CEO and chairman of Altera. “In addition, Altera gains a tremendous competitive advantage at the high end in that we are the only major FPGA company with access to this technology.”

Altera’s next-generation products will now include 14 nm, in addition to previously announced 20 nm technologies, extending the company’s tailored product portfolio that meets myriad customer needs for performance, bandwidth and power efficiency across diverse end applications.

“We look forward to collaborating with Altera on manufacturing leading-edge FPGAs, leveraging Intel’s leadership in process technology,” said Brian Krzanich, chief operating officer, Intel. “Next-generation products from Altera require the highest performance and most power-efficient technology available, and Intel is well positioned to provide the most advanced offerings.”

Adding this world-class manufacturer to Altera’s strong foundation of leading-edge suppliers and partners furthers the company’s ability to deliver on the promise of silicon convergence; to integrate hardware and software programmability, microprocessors, digital signal processing, and ASIC capability into a single device; and deliver a more flexible and economical alternative to traditional ASICs and ASSPs.

Altera claims that only Intel’s 14 nm Tri-Gate Process offers a second generation of proven production technology:

Transistor Design Background

In 1947 the first transistor, a germanium ‘point-contact’ structure, was demonstrated at Bell Laboratories. Silicon was first used to produce bipolar transistors in 1954, but it was not until 1960 that the first silicon metal oxide semiconductor field-effect transistor (MOSFET) was built. The earliest MOSFETs were 2D planar devices with current flowing along the surface of the silicon under the gate. The basic structure of MOSFET devices has remained substantially unchanged for over 50 years.

Since the prediction or proclamation of Moore’s Law in 1965, many additional enhancements and improvements have been made to the manufacture and optimization of MOSFET technology in order to enshrine Moore’s Law in the vocabulary and product planning cycles of the semiconductor industry. In the last 10 years, the continued improvement in MOSFET performance and power has been achieved by breakthroughs in strained silicon, and High-K metal gate technology.

It was not until the publication of a paper by Digh Hisamoto and a team of other researchers at Hitachi Central Research Laboratory in 1991 that the potential for 3-D, or ‘wraparound’ gate transistor technology, to enhance MOSFET performance and eliminate short channel effects, was recognized. This paper called the proposed 3-D structure ‘depleted lean-channel transistor’, or DELTA(1). In 1997 the Defense Advanced Research Projects Agency (DARPA) awarded a contract to a research group at the University of California, Berkeley, to develop a deep sub-micron transistor based on the DELTA concept. One of the earliest publications resulting from this research in 1999 dubbed the device a ‘FinFET’ for the fin-like structure at the center of the transistor geometry(2).

Important Turning Point in Transistor Technology

Optimization and manufacturability studies on 3-D transistor structures continued at research and development organizations in leading semiconductor companies. Some of the process and patent development has been published and publicly shared, and some development remained in corporate labs.

The research investment interests of the semiconductor industry are driven by the International Technology Roadmap for Semiconductors (ITRS), which is coordinated and published by a consortium of manufacturers, suppliers, and research institutes. The ITRS defines transistor technology requirements to achieve continued improvement in performance, power, and density along with options which should be explored to achieve the goals. The ITRS and its public documentation capture conclusions and recommendations regarding manufacturing capabilities like strained silicon and High-K metal gate, and now the use of 3-D transistor technologies to maintain the benefits of Moore’s law. Based on documents produced by the ITRS and an examination of academic papers and patent filings, research into 3-D transistor technologies has grown dramatically in the last decade.

Adoption and Research

Two important pronouncements occurred in the last two years that have propelled the 3-D transistor structure into the industry spotlight, and into a permanent place in the technology story of MOSFET transistors.

The first announcement was by Intel Corporation on 4th of May, 2011, about their Tri-Gate transistor design that had been selected for the design and manufacture of their 22 nm semiconductor products. This was preceded by a decade of research and development taking advantage of the work of Hisamoto and others in FinFET development and optimization. It represented both a solid acknowledgment of the feasibility and cost-effectiveness of the Tri-Gate transistor structure in semiconductor production, as well as a continued declaration of leadership by Intel in semiconductor technology.

The second announcement was the publication of ITRS technology roadmaps, with contributions from many other semiconductor manufacturing companies that identified 3-D transistor technology as the primary enabler of all incremental semiconductor improvement beyond the 20 nm or 22 nm design node.

…

Intel’s Leadership in Transistor Technologies

In several public forums, including the Intel Developer Forums and investor conferences, Intel identifies where they have demonstrated technology leadership in a variety of advances that have sustained the pace of Moore’s Law. As shown in Figure 3, Intel has identified the number of years of production leadership they have achieved in bringing strained silicon and High-K metal gate technology to full production. In the case of 3-D Tri-Gate transistor technology, Intel estimates a lead of up to four years based on their production rollout of Tri-Gate technology at 22 nm in 2011.

According to former Intel CEO Paul Otellini in the company’s 16 April 2013 earnings call(8):

“In the first quarter [of 2013], we shipped our 100 millionth 22 nanometer [Tri-Gate] processor, using our revolutionary 3-D transistor technology, while the rest of the industry works to ship its first unit.”

Another leadership advantage that will be held by Intel in their rollout of 14 nm technology can be traced to their very public ‘Tick-Tock’ strategy in process and microarchitecture introduction. A ‘tick’ cycle of product introduction relies on a shrink of the semiconductor process manufacturing geometry, followed by a ‘tock’ cycle that implements microarchitecture changes in their CPU products. Intel is firmly committed to a full process shrink in their move from 22 nm to 14 nm; other manufacturers developing comparable process technologies have been less clear about whether their process roadmaps include the benefits of a process shrink.

(Source: The Breakthrough Advantage for FPGAs with Tri-Gate Technology [v. 1.0 Altera whitepaper, June 2013])

Altera says beginning with 14 nm Tri-Gate technology, the highest performance FPGAs will simply be the ones built on demonstrably superior transistor technology:

Accessing the Benefits of Tri-Gate Technology Through Altera FPGAs

Taking advantage of the significant benefits of Intel’s Tri-Gate technology is only possible for users of Altera® high-density and high-performance FPGAs on the 14 nm technology process. This is the result of an exclusive manufacturing partnership between the two companies referenced in the introduction to this paper.

The substantial advantages of Tri-Gate silicon technologies will allow Altera to deliver previously unimaginable performance in FPGA and SoC products. This will include a historic doubling of core performance as compared to other high-end FPGAs, bringing FPGAs to the Gigahertz performance level. Overall active and static power numbers will reduce by 70 percent through a combination of process, architecture, and software advances.

Although the details and schedules of the 14 nm manufacturing process are not yet publicly available from Intel Corporation, Altera users can begin designs today that take advantage of the significant performance and power efficiency benefits of Tri-Gate technology in FPGAs. This is possible by beginning designs with the Arria® 10 portfolio of 20 nm FPGA devices. Users can then take advantage of pin-for-pin design migration pathways from Arria 10 FPGA and SoC products to Stratix® 10 FPGA and SoC products as they become available.

This allows you, as an FPGA user and system architect, to begin designing products that can accommodate both the Arria 10 and Stratix 10 product families with minimal changes, modifications, and reengineering. This will allow you to get products to market with the highest performance and lowest power FPGAs that leverage 20 nm process technology and power reduction techniques, then advance these same products to the previously unimaginable performance and power efficiency of Intel’s 14 nm Tri-Gate manufacturing process.
(Source: The Breakthrough Advantage for FPGAs with Tri-Gate Technology [v. 1.0 Altera whitepaper, June 2013])

Altera Announces Breakthrough Advantages with Generation 10 [press release, June 10, 2013]

Stratix 10 FPGAs and SoCs leverage Intel’s 14 nm Tri-Gate process and an enhanced architecture to deliver core performance two times higher than current high-end FPGAs, while enabling up to 70 percent power savings.

Arria 10 FPGAs and SoCs reinvent the midrange by simultaneously surpassing high-end FPGAs in performance while delivering 40 percent lower power than today’s midrange devices.

Altera Corporation (NASDAQ: ALTR) today introduced its Generation 10 FPGAs and SoCs, offering system developers breakthrough levels of performance and power efficiencies. Generation 10 devices are optimized based on process technology and architecture to deliver the industry’s highest performance and highest levels of system integration at the lowest power. Initial Generation 10 families include Arria® 10 and Stratix® 10 FPGAs and SoCs with embedded processors. Generation 10 devices leverage the most advanced process technologies in the industry, including Intel’s 14-nm Tri-Gate process and TSMC’s 20 nm process. Early access customers are currently using the Quartus® II software for Generation 10 product development.

“Our Generation 10 products will strengthen the penetration of programmable logic into new markets and applications and further accelerate the implementation of FPGAs into systems traditionally served by ASSPs and ASICs,” said Patrick Dorsey, senior director of product marketing at Altera. “The optimizations we made in our Generation 10 devices allow customers to develop highly customized solutions that dramatically increase system performance and system integration while lowering operating expenses.”

Delivering the Unimaginable with Stratix 10 FPGAs and SoCs

Stratix 10 FPGAs and SoCs are designed to enable the most advanced, highest performance applications in the communications, military, broadcast and compute and storage markets, while slashing system power. Leveraging Intel’s 14 nm Tri-Gate process and an enhanced high-performance architecture, Stratix 10 FPGAs and SoCs have an operating frequency over one gigahertz, 2X the core performance of current high-end 28 nm FPGAs. For high-performance systems that have the most strict power budgets, Stratix 10 devices allow customers to achieve up to a 70 percent reduction in power consumption at performance levels equivalent to the previous generation.

Altera is announcing the technology details of Stratix 10 FPGAs and SoCs today as part of the Generation 10 portfolio introduction, and will disclose more details on the product at a later date. Stratix 10 FPGAs and SoCs provide the industry’s highest performance and highest levels of system integration, including:

More than four million logic elements (LEs) on a single die

56-Gbps transceivers

More than 10-TeraFLOPs single-precision digital signal processing

A third-generation ultra-high-performance processor system

Multi-die 3D solutions capable of integrating SRAM, DRAM and ASICs

Reinventing the Midrange with Arria 10 FPGAs and SoCs

Arria 10 FPGAs and SoCs are the first device families to roll out as part of the Generation 10 portfolio. The device family sets a new bar for midrange programmable devices, delivering both the performance and capabilities of current high-end FPGAs at the lowest midrange power. Leveraging an enhanced architecture that is optimized for TSMC’s 20 nm process, Arria 10 FPGAs and SoCs deliver higher performance at up to 40 percent lower power compared to the previous device family.

Arria 10 devices offer more features and capabilities than today’s high-end FPGAs, at 15 percent higher performance. Reflecting the trend toward silicon convergence, Arria 10 FPGAs and SoCs offer the highest degree of system integration available in midrange devices, including 1.15 million LEs, integrated hard intellectual property and a second-generation processor system that features a 1.5 GHz dual-core ARM® Cortex™-A9 processor. Arria 10 FPGAs and SoCs also provide 4X greater bandwidth compared to the current generation, including 28-Gbps transceivers, and 3X higher system performance, including 2666 Mbps DDR4 support and up to 15-Gbps Hybrid Memory Cube support.

Development Suite Delivers Breakthrough Productivity to Generation 10

Generation 10 devices are supported by Altera’s Quartus II development software and tools for higher level design flows that include a software development kit for OpenCL™, a SoC Embedded Design Suite and DSP Builder tool. This leading-edge development tool suite enables design teams to maximize productivity while making it easier for new design teams to adopt Generation 10 FPGAs and SoCs in their next-generation systems. The Quartus II software will continue to deliver the industry’s fastest compile times by providing Generation 10 FPGAs and SoCs an 8X improvement in compile times versus the previous generation. The substantial reduction in compile times is the result of leading-edge software algorithms that take advantage of modern multi-core computing technologies.

Availability

Early access customers are currently using the Quartus II software for development of Arria 10 FPGA and SoCs. Initial samples of Arria 10 devices will be available in early 2014. Altera will have 14 nm Stratix 10 FPGA <font style="font