I must confess that until recently, I wasn’t well-versed in semiconductor physics or technology. While it’s rather easy to understand what a transistor does and some of the terminology thrown around, going deeper was tough. A great deal of the information on the internet is simply too cryptic to understand, even for those that want to learn. Seeing as how this site is all about the results of semiconductor physics and technology, this was the best place to share the knowledge that I've acquired.
Bandgap In Semiconductor / Pieter Kuiper
The simplest place to start is the materials. Silicon is incredibly important as a material in the industry because it’s a semiconductor. Of course, the name is self-explanatory, but there’s more to it. The key here is the band structure. Band structure refers to the “bands” of energy levels that form due to the sheer number of orbital states that can be occupied in a solid. Those that understand how electron orbitals work will point out that each energy level is discrete, but due to the sheer number of orbital configurations, a seemingly continuous distribution of energy can be seen. However, relatively large gaps still exist; known as band gaps, these are ranges of energy that an electron cannot occupy.
Band Filtering Diagram / Nanite
The question now is why this matters. It matters because of the Fermi level, or EF in the photo above. The Fermi level refers to the total chemical potential energy for a system of electrons at absolute zero. If a band lies above the Fermi level, electrons in the band can be delocalized from the atom, which means that they can carry current. This band is called a conduction band. If a band is below the Fermi level, the electrons in it are bound to an atom. This band would be a valence band.
Intrinsically, a semiconductor should have its Fermi level at the midpoint of the band gap. This is true of both insulators and intrinsic semiconductors, but a semiconductor’s band gap is extremely small. In fact, it’s small enough that electrons can jump the band gap as seen in the photo above because of thermal energy that will always exist in real world situations. While this property alone isn't particularly useful for digital logic, doping a semiconductor can have significant effects on the band structure. This means that the distribution of electrons in the valence band or conduction band will change.
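To put rough numbers on how much the size of the band gap matters, here is a minimal sketch using the Fermi-Dirac distribution, which gives the probability that a state at a given energy is occupied. It assumes the Fermi level sits exactly at midgap, uses the commonly quoted 1.12 eV gap for silicon, and picks a 9 eV gap as a stand-in for an insulator; the function name and these example values are my own choices for illustration.

```python
import math

K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev: float, fermi_ev: float, temp_k: float) -> float:
    """Probability that a state at energy_ev is occupied at temperature temp_k."""
    if temp_k <= 0:
        # At absolute zero the distribution is a step at the Fermi level.
        return 1.0 if energy_ev < fermi_ev else 0.0
    x = (energy_ev - fermi_ev) / (K_B_EV_PER_K * temp_k)
    if x > 700:
        # exp(x) would overflow a float; the occupancy is effectively zero.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


# Assumption: Fermi level at midgap, so the conduction band edge sits
# half a band gap above it.
silicon_edge_ev = 1.12 / 2    # silicon, ~1.12 eV gap
insulator_edge_ev = 9.0 / 2   # illustrative wide-gap insulator

for name, edge in [("silicon", silicon_edge_ev), ("insulator", insulator_edge_ev)]:
    print(name, fermi_dirac(edge, 0.0, 300.0))
```

At room temperature the silicon figure is tiny but not negligible given how many states a crystal has, while the insulator figure is smaller by dozens of orders of magnitude.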
This is where I have to introduce even more terminology. Depending on how the distribution is changed, a semiconductor is dubbed either a p-type or n-type semiconductor. If the band structure is such that free electrons are more easily generated, it becomes an n-type semiconductor. If the structure is such that electron “holes” are generated, it becomes a p-type semiconductor. In this case, an electron hole refers to a place where an electron could exist, but doesn’t. Such a hole still conducts current. Look carefully at the p-type diagram once again. Because the valence band is so close to the Fermi level, electrons tend to stay in the valence band at lower orbitals. This means that there are "holes" where an electron could be, and each hole acts as a charge carrier. It's also worth noting that the diagram above isn't totally accurate, as doping normally introduces more bands instead of shifting their positions, but the concept is the same.
PN Junction Equilibrium / TheNoise / CC BY SA
What really makes things interesting is when a p-type and n-type semiconductor are placed next to each other. Because p-type semiconductors tend to have electron holes and n-type semiconductors tend to have an excess of electrons, there will be a diffusion of holes and electrons to try and equalize charge at the junction. Because of this diffusion process, the area around the junction becomes charged positively at the n-doped region and negatively at the p-doped region. This happens because the n-doped region is losing electrons, making the area positive while the p-doped region is losing holes, therefore becoming negative. The result is that an electric field is generated which opposes this diffusion and eventually reaches equilibrium. The area where this process occurs is called the depletion layer, as these ionized areas are stripped of charge carriers and therefore unable to carry current with the band structure that already exists.
PN Band / Saumitra R Mehrotra & Gerhard Klimeck / CC BY
This p-n junction is incredibly important in solid state electronics. In fact, the system we just described can be used as a diode, which is a device that only allows current to flow in a single direction. If a battery is connected with the positive terminal at the p-type semiconductor and the negative on the n-type semiconductor, the holes in the p-type and the electrons in n-type are all pushed towards the junction, which causes the depletion zone to shrink. This means that the electric field repelling the current decreases, and current is allowed to flow across the junction.
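The one-way behavior described above is usually summarized with the ideal (Shockley) diode equation, which isn't derived in this article. The sketch below assumes that model, and the saturation current and ideality factor are illustrative values rather than figures for any particular device.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
Q_E = 1.602176634e-19     # elementary charge, C


def diode_current(v: float, i_sat: float = 1e-12,
                  ideality: float = 1.0, temp_k: float = 300.0) -> float:
    """Ideal diode current in amps for a bias of v volts (p side positive).

    i_sat and ideality are hypothetical example parameters.
    """
    v_thermal = K_B * temp_k / Q_E  # about 25.85 mV at 300 K
    return i_sat * (math.exp(v / (ideality * v_thermal)) - 1.0)


print(diode_current(0.6))    # forward bias: current flows
print(diode_current(-0.6))   # reverse bias: only a tiny leakage current
```

Under forward bias the current grows exponentially with voltage, while under reverse bias it never exceeds the small saturation current in this model.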
MOSFET Junction Structure / Brews Ohare / CC BY SA
The next inevitable question is how to make a transistor. While there are many other methods of making transistors, it’s best to focus on the one that is used in modern computer chips, namely the metal-oxide-semiconductor field-effect-transistor or MOSFET.
This is a relatively simple design, although there’s a great deal of complexity in its implementation. This type of transistor has four terminals: source, gate, drain, and body. More often than not, the body is shorted to source with a non-rectifying contact (ohmic contact) to eliminate body effect. We’ll get back to body effect eventually, but the important part here is to understand source, gate, and drain.
The names are somewhat self-explanatory, but source and drain are the points where the controlled current enters and leaves. The gate is the portion that controls the flow of the current, which means that it's either on or off depending on the voltage (bias) applied to the gate. It's important to keep in mind that current can flow from source to drain or drain to source depending on the type of MOSFET.
MOSFET Functioning Body / Biezl / CC BY SA
There’s more to it than just three terminals though. The actual structure of the MOSFET is critical to understanding why technologies such as HKMG and FinFET exist. In the case of an n-type MOSFET, the source and drain are wells of n-doped semiconductor, with a p-doped semiconductor substrate surrounding the two. In between the two is the gate, which is the metal-oxide-semiconductor portion that the MOSFET is named after.
The gate in a traditional MOSFET is rather simple. On top of the silicon substrate, a layer of silicon dioxide (SiO2) is generated, then a polysilicon or metal gate is placed on top of this SiO2 layer. This structure effectively makes the gate a capacitor, with SiO2 as the dielectric.
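Since the gate stack behaves like a parallel-plate capacitor, its capacitance per unit area follows from the textbook formula C = ε0·εr / t. The sketch below assumes a relative permittivity of about 3.9 for SiO2 and a hypothetical 2nm oxide thickness purely for illustration.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
SIO2_REL_PERMITTIVITY = 3.9   # commonly quoted value for silicon dioxide


def oxide_capacitance_per_area(thickness_m: float,
                               rel_permittivity: float = SIO2_REL_PERMITTIVITY) -> float:
    """Parallel-plate capacitance per unit area (F/m^2) of a gate dielectric."""
    if thickness_m <= 0:
        raise ValueError("thickness must be positive")
    return EPSILON_0 * rel_permittivity / thickness_m


# Hypothetical 2 nm oxide; halving the thickness doubles the capacitance.
c_thick = oxide_capacitance_per_area(2e-9)
c_thin = oxide_capacitance_per_area(1e-9)
print(c_thick, c_thin)
```

A thinner dielectric or a higher permittivity both increase the capacitance, which is the lever that technologies such as HKMG pull on.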
Semiconductor Band Bending / Brews Ohare / CC BY SA
Those that have taken some level of introductory physics will know that a capacitor generates an electric field if there is a voltage difference between the plates. While an electric field cannot go through a conductor due to the density of electrons/holes, the same is not true of a semiconductor. Once again, the Fermi level and band diagrams become important.
As a result of the electric field, electrons in the silicon of the body near the silicon dioxide are more likely to have sufficient energy to become delocalized. This can be seen as band bending in the band diagram, where the conduction band stretches down past the Fermi level. Because the Fermi level is the total chemical potential of electrons, this means that it is much more likely for electrons to jump the band gap between the valence band and the conduction band. However, this comes at a cost.
As the density of electrons increases, the field effect becomes weaker and weaker. Thus, the band bending decreases until the effect is nonexistent. The point where this happens marks the end of the channel created. As a result of the concentration of free electrons, there is an inversion layer created. The reason why it’s an inversion layer is because in the case of this n-type MOSFET (or NMOS), the p-type body/substrate becomes an n-type substrate within the inversion layer.
MOSFET Functioning / Olivier Deleage & Peter Scott / CC BY SA
As a result of the positive voltage on the gate, the inversion layer becomes a “channel” of sorts, which allows current to flow between source and drain. That’s how an NMOS works to control the flow of current from drain to source using a gate. Since the above photo also shows the regions that a transistor can operate in, it's important to understand that the linear operating mode is where the output current of the transistor is directly proportional with voltage. In the saturation mode, this is no longer true and diminishing returns are seen in current despite increasing voltage until the transistor reaches the maximum possible current. In a CMOS camera sensor this makes it difficult to accurately recover overexposed photos as this saturation mode causes the signal to be clipped, which is why underexposed photos are generally preferable to overexposed ones.
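The linear and saturation regions can be sketched with the textbook long-channel "square-law" model. This is an idealization that I'm assuming here rather than anything specific to a real process, and the threshold voltage and gain factor `k` are made-up example values. In this simple model the saturation current is flat, whereas a real transistor levels off more gradually.

```python
def nmos_drain_current(v_gs: float, v_ds: float,
                       v_th: float = 0.5, k: float = 1e-3) -> float:
    """Drain current (A) of an idealized long-channel NMOS.

    k lumps together mobility, oxide capacitance and the width/length ratio
    (A/V^2). v_th and k are hypothetical example values.
    """
    v_overdrive = v_gs - v_th
    if v_overdrive <= 0:
        return 0.0  # cutoff: no inversion layer (subthreshold current ignored)
    if v_ds < v_overdrive:
        # Linear region: for small v_ds the current is nearly proportional to v_ds.
        return k * (v_overdrive * v_ds - 0.5 * v_ds ** 2)
    # Saturation region: further increases in v_ds no longer raise the current.
    return 0.5 * k * v_overdrive ** 2


for v_ds in (0.01, 0.5, 1.0, 2.0):
    print(v_ds, nmos_drain_current(1.5, v_ds))
```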
Now let’s look at a PMOS. The same is true, merely reversed. By applying a low voltage to the gate, an electric field pushes away electrons, forming a channel of holes that allows current to flow from source to drain. In the interest of avoiding confusion, we’ll avoid discussing depletion-mode MOSFETs for now, although they may be worth revisiting at a later point. This is also because depletion-mode FETs effectively don’t exist in today’s processes.
Inversion With Source-Body Bias / Brews Ohare / CC BY SA
Of course, I mentioned the body effect earlier on but never quite explained what it was. Now that we understand how the system works at a basic level, we can add a bit of complexity in the form of biasing on the body terminal. We talked about how applying a voltage to gate would cause the band structure to change. In the case of applying voltage to the body terminal, the same band-bending occurs and alters the threshold voltage on the gate needed to enter the linear region of transistor operation.
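The shift in threshold voltage is commonly written as Vt = Vt0 + γ(√(2φF + VSB) − √(2φF)), where VSB is the source-to-body voltage. That formula comes from standard MOSFET textbooks rather than from anything above, and the parameter values in this sketch are hypothetical.

```python
import math


def threshold_voltage(v_sb: float, v_t0: float = 0.5,
                      gamma: float = 0.4, phi_f: float = 0.35) -> float:
    """Threshold voltage (V) of an NMOS with source-to-body bias v_sb.

    v_t0 (zero-bias threshold), gamma (body-effect coefficient, V^0.5) and
    phi_f (Fermi potential, V) are illustrative example values.
    """
    if 2 * phi_f + v_sb < 0:
        raise ValueError("model is not valid for this much forward body bias")
    return v_t0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


print(threshold_voltage(0.0))  # body shorted to source: no shift
print(threshold_voltage(1.0))  # reverse body bias raises the threshold
```

This is why shorting the body to the source is the simple way to eliminate the effect: with VSB at zero, the extra term vanishes.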
So now that we understand (maybe) how transistors work, the real question is how to implement logic using these gates. After all, it’s not immediately clear how controlling the flow of current translates into the sheer amount of possible instructions seen in code. While there are multiple methods to do this, this article will only cover the most popular method. This is known as complementary metal-oxide-semiconductor, or CMOS. The reason why this method of implementing logic is so popular is because of its power characteristics. While other methods have significant amounts of current draw regardless of state, CMOS only requires a significant amount of power while switching. We’ll go over why this is later.
Before I go over CMOS, I'll do a quick introduction to Boolean logic for those that are unfamiliar with the subject. In short, it's possible to take apart almost any statement and turn it into a series of logical operations. These operations are AND, OR, and NOT. While there are more operations than those three, every possible logical operation is possible through the combination of the three. This makes it possible to do math, store input and output, and all the other things we see in computing devices today. There's definitely much more to it, although that's best left for another day.
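As a small illustration of combining the three basic operations, the sketch below builds XOR out of AND, OR, and NOT, then uses it to add two one-bit numbers (a "half adder"). The function names are my own.

```python
def AND(a: bool, b: bool) -> bool:
    return a and b


def OR(a: bool, b: bool) -> bool:
    return a or b


def NOT(a: bool) -> bool:
    return not a


def xor(a: bool, b: bool) -> bool:
    """True when exactly one input is true: (a OR b) AND NOT (a AND b)."""
    return AND(OR(a, b), NOT(AND(a, b)))


def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two one-bit values, returning (sum bit, carry bit)."""
    return xor(a, b), AND(a, b)


for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b))
```

Chaining adders like this one is how multi-bit arithmetic is built up from the same three operations.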
Now that we've done an extremely basic introduction of Boolean logic, CMOS is rather simple in its rules for implementing such logic. Through these rules, it’s possible to implement every possible logic gate, and we’ll go over the simplest example for CMOS, the NOT gate. CMOS is purely composed of p-type and n-type MOSFETs, with no need for resistors that would generate waste heat. There are only two rules that must be followed to be electrically considered a CMOS circuit:
1. All PMOS transistors must have an input from either the voltage source or another PMOS transistor.
2. All NMOS transistors must have an input from either ground or another NMOS transistor.
Using these two rules it is possible to build all the other gates. For example, the NOT gate simply requires one NMOS and one PMOS. The PMOS is connected to the voltage source and the NMOS to the ground. The gate for both is controlled by a single input, and the output current is also connected together. An example of this circuit diagram can be seen below. I've also included a link for a java applet that simulates this circuit here.
NOT Gate / OpenStax CNX
If the input voltage is high, the NMOS turns on and the PMOS is off. The result is that the output wire is pulled to the ground voltage, which is 0V. This would be measured as a low voltage. If the input voltage on the gate is low, then the PMOS will switch on. The result is that the output wire has a voltage close to Vdd, which is relatively high. If the truth table is written out, we can see that this matches exactly with the truth table for the NOT operation.
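The behavior just described can be modeled at the switch level, treating each transistor as a simple on/off switch. This is only a sketch of the logic; it ignores voltages, timing, and everything analog, and the function names are hypothetical.

```python
def nmos_conducts(gate_high: bool) -> bool:
    """An NMOS switch turns on when its gate is high."""
    return gate_high


def pmos_conducts(gate_high: bool) -> bool:
    """A PMOS switch turns on when its gate is low."""
    return not gate_high


def cmos_inverter(input_high: bool) -> bool:
    """Return True if the output is pulled to Vdd, False if pulled to ground."""
    pull_up = pmos_conducts(input_high)    # PMOS sits between Vdd and the output
    pull_down = nmos_conducts(input_high)  # NMOS sits between the output and ground
    # Exactly one of the two conducts, so there is no steady path from Vdd to ground.
    assert pull_up != pull_down
    return pull_up


for input_high in (False, True):
    print(input_high, cmos_inverter(input_high))
```

The check that exactly one transistor conducts is also the reason for the power characteristics mentioned earlier: in a steady state there is no direct path from the voltage source to ground.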
Of course, there are multiple other operators. While it might be worth going over for those interested in majoring in electrical or computer engineering, to keep things (relatively) simple we’ll avoid talking through how those circuits work.
Through billions of transistors arranged in the complementary fashion just described, entire CPUs are made. Of course, these aren’t individual pieces. These billions of transistors are on a single package no bigger than the size of a fingernail. For reference, A5X was one of the largest mobile SoCs ever shipped, and its die area is 163 square millimeters. Apple’s A6 SoC is only around 97 square millimeters, or smaller than a square with side length of a centimeter. The question now is how to squeeze all of these transistors into such a small area. To answer that, we must look at the manufacturing process.
To make a computer chip, it all starts with the Czochralski process. The first step of this process is to take extremely pure silicon and melt it in a crucible that is often made of quartz. Doping material can also be added at this stage, to change the properties of the final crystal. Once this is done, a single seed crystal is dipped into the molten silicon, then carefully pulled up with a specific rotation rate. This produces a piece of monocrystalline silicon that is then sliced into wafers. These wafers can be up to 300mm in diameter at present and around 0.75mm thick, and they are polished to ensure that the surface is as regular and flat as possible.
Photolithography Etching Process / Cmglee / CC BY SA
Scanning Stepper Middle Exposure / Everyguy
Once this is done, the wafer is prepared for photolithography. An oxide layer on top of the silicon wafer is grown, and then the entire wafer is cleaned to remove contaminants. Once this is done, an adhesion promoter is applied to ensure that the photoresist will stick properly to the wafer. The photoresist is then applied by dispensing a solution of photoresist on to the wafer. The wafer is then spun at extremely high speeds for around half a minute to a minute. Once this is done, the wafer is then baked on a hot plate to get rid of the remaining solvent. In preparation for the exposure, a reticle/photomask for one layer of the process is loaded, and aligned with the wafer. In order to increase resolution, an exposure slit is used to optimize for a smaller exposure area on the reticle/projection lens system, and aberration is reduced.
Once all of this preparation is done, the exposure process begins. Intense UV light (currently 193nm) is used to change the exposed photoresist to allow the developer to strip away the exposed area. As a quick aside, the fact that UV light is used to develop the regions to etch away means that only long wavelength light can be used in clean rooms, which gives the clean room a characteristic yellow lighting. Once this is done, the wafer is baked again. This process is done again in order to properly develop the photoresist.
Once the wafer is ready, developer is added. This strips away the photoresist from the exposed regions. The exposed oxide is then etched away. While this process can be done with a liquid agent, modern dry-etch processes ionize a gas in vacuum using an RF cavity that is then shot at the exposed oxide to avoid etching past the exposed portion of the oxide. Once this etching process is complete, the photoresist is removed either through plasma ashing or by washing it off with a resist stripper.
To summarize everything I just said, the process is effectively cleaning the wafer, applying photoresist, exposing the photoresist, developing the photoresist, etching the exposed oxide, then removing the remaining photoresist.
CMOS Fabrication Process / Cmglee / CC BY SA
A modern wafer will undergo this process around 50 times or so before creating the final finished chip. You might want to know how all of this etching actually creates transistors, so we’ll once again go over the simplest case, the CMOS inverter. The first lithography pass is used to mark out the area so that we can deposit a well of n-doped silicon that the PMOS will use. Then, the oxide is grown again and a layer of polysilicon is deposited.
Another lithography pass is done to etch away parts of the oxide, then most of the polysilicon. This leaves a small piece in the center of the exposed substrate composed of silicon dioxide, then polysilicon. If this sounds familiar, it’s because this is the structure of the gate. Once this is done, ion implantation is used to create the sources and drains. The best description I can give of ion implantation is taking an ion and accelerating it to high speeds to embed itself into the targeted area, which dopes the substrate. Once this is done, a layer of nitride is added to prevent further oxide growth, which is then etched again.
Yet we’re still not done with how the chip is made. We just finished going over what happens in front-end-of-line (FEOL) processing. Now it’s time to go over what happens at back-end-of-line (BEOL) processing. Once the nitride layer is finished, a layer of metal is deposited over the entire system. This layer is then etched again to finish the transistor fabrication process. The result is that all the correct components for source, drain, gate, and body are implanted with metal connectors for input and output for our hypothetical CMOS inverter.
In a real chip, as many as 12 layers are added in this process, which means repeating the metal deposition step 12 times. This step is where all of the transistors are wired together, along with interlayer connections (vias), capacitors (in DRAM), dielectric isolation, and chip to package connectors. Once BEOL processing is complete, the chip is packaged and ready to be used.
CMOS Chip Structure / Cepheiden / CC BY SA
Of course, this entire production process isn’t perfect. Along the way, the wafer is tested multiple times to ensure that there are no defects from a previous step. If there are too many defects on a wafer, the entire wafer must be thrown away to avoid wasting time and money on further processing. After the FEOL processing is complete, the chip is tested and binned using a wafer prober. After the entire chip is packaged, the chip is tested again to ensure that the entire package is fully functional. The packaging and final testing stages are also known as the back end of chip fabrication.
To review everything we’ve just gone over, we started with the physics of semiconductors. Then we moved on to the physics of transistors. After that, we went over how to make logic with these transistors. Finally, we went over how to actually manufacture these transistors. This would be a good place to stop, but complacency is a terrible reason to do so.
The question now is how to make things faster with less power. To do this, we have to figure out how to make the feature size smaller, in order to pack more transistors closer together. To put things in perspective, 43 years ago in 1971 with the Intel 4004 we had a feature size of 10,000 nanometers. That’s around 455 times as large as the 22nm feature size of what we see in Intel’s Haswell CPUs. Now it’s time to find out how this was achieved.
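The arithmetic behind that comparison is simple enough to check directly. The sketch below also squares the linear ratio to estimate how many more features fit in the same area, which assumes density scales perfectly with feature size; real chips don't scale that cleanly.

```python
def linear_shrink(old_nm: float, new_nm: float) -> float:
    """How many times smaller the new feature size is than the old one."""
    return old_nm / new_nm


def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized gain in features per unit area (assumes perfect 2D scaling)."""
    return linear_shrink(old_nm, new_nm) ** 2


print(round(linear_shrink(10_000, 22)))       # Intel 4004 vs. 22nm
print(round(ideal_density_gain(10_000, 22)))  # idealized area density gain
```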
While many other technologies are tangentially related to shrinking feature size, one of the primary ways within the past four decades or so has been through smaller wavelengths of light. Shorter wavelengths mean higher resolution, much like how the electron microscope’s shorter de Broglie wavelength increased resolution over light microscopes. Thus, photolithography can increase in resolution by using light sources that generate shorter wavelengths.
Lithography Wavelength vs Resolution / Guiding light / CC BY SA
This progression has happened steadily over the years, starting with mercury lamps that produced UV light of around 400 nanometers. Once this was no longer sufficient, lasers became necessary in order to drive higher resolution and rate of production. The first was the krypton fluoride laser that generated a wavelength of 248nm, then argon fluoride to generate 193nm. Unfortunately, this is near the limit of what can be realistically used in an environment that contains air, as even 193nm is attenuated significantly. As seen in the photo below, in order to go lower for EUV and similar wavelengths, the lithography process must be done in a vacuum as otherwise air will absorb almost all of the energy emitted.
Photon Energy vs. Resolution / Guiding light / CC BY SA
Immersion Lithography / Renesas
So the inevitable question is what could be done next. While different foundries adopt technologies at different times, one way to push resolution further is immersion lithography, which was done around the 65nm node to 32nm node. This is relatively simple, as what this effectively does is increase the numerical aperture of the optical system because the light from the source can be refracted better than before. This is done by immersing the wafer and projection lens in extremely pure water.
Of course, this is far from a simple task in practice. The deionized water must also have no gases present that could cause air bubbles between the lens and wafer, with extremely consistent temperature and pressure. Otherwise, the actual index of refraction in the water will change unexpectedly and cause defects in the lithography process. The 193nm light used in current lithography processes can also ionize the water and in turn cause reactions with the photoresist.
Unfortunately, with processes like EUV lithography it’s no longer possible to use this method to drive higher resolution and smaller feature size because the liquid used will generally absorb all of the energy emitted. However, it’s still possible to drive higher resolution with 193i technology by using fluids with a higher index of refraction, which is an area of exploration for further resolution increases.
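The effect of immersion can be estimated with the Rayleigh criterion, CD = k1 · λ / NA, where NA = n · sin(θ). I'm bringing in that relation from standard optics references, and the k1 factor and lens half-angle below are hypothetical example values. The refractive index of water at 193nm is taken as roughly 1.44.

```python
def numerical_aperture(n_medium: float, sin_half_angle: float) -> float:
    """NA of the projection system for a medium of refractive index n_medium."""
    return n_medium * sin_half_angle


def min_feature_nm(wavelength_nm: float, na: float, k1: float = 0.3) -> float:
    """Smallest printable feature per the Rayleigh criterion (k1 is process-dependent)."""
    return k1 * wavelength_nm / na


SIN_HALF_ANGLE = 0.93  # hypothetical lens geometry, kept the same for both cases

dry = min_feature_nm(193, numerical_aperture(1.00, SIN_HALF_ANGLE))  # air
wet = min_feature_nm(193, numerical_aperture(1.44, SIN_HALF_ANGLE))  # water, n ~ 1.44
print(dry, wet)
```

With the same wavelength and lens geometry, the printable feature shrinks by the index of the fluid, which is why higher-index fluids are attractive.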
Multiple Patterning / SPIE
Another technique that can be used with EUV lithography is multiple patterning. While it’s “one” technique, the ways to implement multiple patterning are numerous. All have the same goal though. In essence, if a theoretical system can only provide sufficient resolution to draw two lines 64nm apart, it’s possible to double the resolution by printing another two lines 64nm apart by doing a second exposure. The result is four lines that are 32nm apart. There are a few ways to achieve this, which are known as litho-etch, litho-etch (LELE), litho-freeze, litho-etch (LFLE), and self-aligned double patterning (SADP).
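The 64nm to 32nm example can be sketched as two interleaved sets of line positions. The `overlay_error_nm` parameter is my own addition to show what a misaligned second exposure does to the spacing; all names here are hypothetical.

```python
def line_positions(pitch_nm: float, count: int, offset_nm: float = 0.0) -> list[float]:
    """Positions of `count` lines spaced pitch_nm apart, starting at offset_nm."""
    return [offset_nm + i * pitch_nm for i in range(count)]


def double_pattern(pitch_nm: float, lines_per_exposure: int,
                   overlay_error_nm: float = 0.0) -> list[float]:
    """Combine two exposures, the second shifted by half a pitch (plus any error)."""
    first = line_positions(pitch_nm, lines_per_exposure)
    second = line_positions(pitch_nm, lines_per_exposure,
                            offset_nm=pitch_nm / 2 + overlay_error_nm)
    return sorted(first + second)


def gaps(positions: list[float]) -> list[float]:
    """Spacing between each pair of neighboring lines."""
    return [b - a for a, b in zip(positions, positions[1:])]


print(gaps(double_pattern(64, 2)))                      # perfectly aligned
print(gaps(double_pattern(64, 2, overlay_error_nm=3)))  # misaligned second pass
```

With perfect alignment every gap is 32nm, but even a small overlay error makes the gaps alternate between too wide and too narrow.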
LELE is rather simple in implementation and relies upon two separate photoresist layers. In the first pass, lithography is done on a hard mask, then developed. It’s important to use a hard mask, because in the second pass another layer of photoresist is applied, then exposed and developed. If there wasn’t a hard mask, the first pass would simply disappear when the second layer of photoresist is applied. LFLE is simply a modification of the LELE technique, where the hard mask is eliminated. After the initial lithography process, the photoresist is frozen by coating it with a chemical agent that is then baked and developed away, making it so that the first resist layer is separable from the second layer. Once this is done, a standard lithography pass is done to complete the process.
LELE & SADP Patterning / SPIE
SADP is a very different way of doing things, but the end result is the same. The first step is doing a lithography/etch of dummy patterns that become the actual lines that are intended to be etched on the final pattern. Once this is done, a hard mask or similar material is deposited over the dummy patterns. After this, the hard mask is etched to expose the sidewalls that line the dummy patterns. Once all of this is completed, the dummy pattern is developed away and the exposed oxide is etched as usual. The result is that the lines are twice as close as before, and only one lithography pass was needed.
Unlike LELE and LFLE, there’s no need to be concerned about alignment because after the dummy pattern is set up, there’s no second exposure. This may be the reason why Intel's 22nm FinFET uses this process. Because the sidewall spacers are often created with hard mask materials, the resulting lines are also cleaner. This fact will become important, especially when discussing EUV and similar next generation lithography techniques.
While these techniques may sound like the perfect way to increase resolution, ultimately multiple patterning becomes increasingly expensive and difficult, especially because even a small misalignment between the two patterns can result in a wasted wafer. Multiple patterning also causes design restrictions that wouldn't occur with a true resolution increase in the lithography process because certain patterns become impossible with even order or odd order patterning processes.
Phase Shifting Mask / Stanley H. Chan
While we’re still talking about (relatively) low hanging fruit, I want to cover two other methods that are used to enhance resolution. The first is phase shifting masks. Rather than focusing upon wafer-level changes, this improves the reticle/mask itself. In short, this exploits the wave nature of light. There are two types of phase-shifting masks, and the first is an alternating phase-shift mask. This alters the thickness of the mask in some regions, which induces a phase-shift on the light waves that pass through it. As a result, there is interference with light from unmodified regions, which means that higher contrast can be achieved between the exposed and unexposed regions. The other type is the attenuated phase-shift mask, which only lets small amounts of light pass through that can interfere with the light coming from transparent regions.
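Here is a toy model of the interference that an alternating phase-shift mask relies on. It treats the light from two neighboring openings as two equal-amplitude waves meeting at the midpoint between them, which is a heavy simplification of real imaging optics. The thickness formula t = λ / (2(n − 1)) for a half-wave shift is standard optics, and the refractive index of about 1.56 for the mask material at 193nm is an approximate assumed value.

```python
import cmath
import math


def pi_shift_thickness_nm(wavelength_nm: float, n_mask: float) -> float:
    """Extra mask thickness that delays light by half a wavelength relative to air."""
    return wavelength_nm / (2 * (n_mask - 1))


def midpoint_intensity(phase_shift_rad: float) -> float:
    """Relative intensity where two unit-amplitude waves overlap."""
    field = 1 + cmath.exp(1j * phase_shift_rad)
    return abs(field) ** 2


print(pi_shift_thickness_nm(193, 1.56))  # assumed index for the mask material
print(midpoint_intensity(0.0))           # same phase: the waves add up
print(midpoint_intensity(math.pi))       # opposite phase: the waves cancel
```

The dark midpoint in the phase-shifted case is what produces the higher contrast between the two features.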
Source Notes: Intended pattern in blue, OPC-corrected in green, final pattern in red.
Optical Proximity Correction / LithoGuy
The second is optical proximity correction, or OPC. One of the imperfections in the lithography process that we haven’t talked about until now is that what is drawn on the photomask/reticle is not translated exactly on to the photoresist. In reality, line widths vary greatly depending upon how dense the pattern around the line is; lines don’t end where they do on the mask, and the ends of lines are much thicker than the middle. OPC compensates for all of these effects and computes the photomask needed to achieve a layout close to the intended design.
Summarizing things again, there are multiple techniques used to increase resolution to fabricate smaller transistors. By using lower wavelengths of light, multiple patterning techniques, and computational lithography techniques like phase shifting masks and optical proximity correction, we’ve managed to make it all the way down to 22 nanometer feature sizes. With the launch of Intel's Core M, we’ve made another jump down to 14 nanometers using the same light source that we did at 90 nanometers.
Unfortunately, it’s not as easy as simply driving resolution higher. Improving lithography is only one aspect, and as transistors get smaller and smaller previously insignificant issues become incredibly important ones. One of the biggest issues is leakage current.
I alluded to this earlier with the discussion of band structure, but electrons more closely resemble the models of quantum mechanics than those of classical mechanics. This leads to effects such as quantum tunneling, where electrons can pass through insulating layers as if there were nothing there to stop them.
While it was safe to ignore these effects at larger process nodes, the smaller we go the worse the problems get. There are five short channel effects that can be discussed, but the primary effect of interest is drain-induced barrier lowering, or DIBL. An example of the effect that DIBL has can be seen in the image below. What DIBL means is that because the channel length is so short, the voltage applied to the drain can affect the source because the drain itself acts as a capacitor due to the separation of charge.
Source
This capacitance from the drain means that the gate becomes less effective at controlling the flow of current through the channel because the drain is competing with the gate. As a result of this effect, there is a shift in the threshold voltage because the drain has made it easier for the inversion layer to be generated. When discussing threshold voltage, this generally refers to the point where current through the channel begins to increase exponentially with increases in gate voltage. In addition to this change in threshold voltage, there is a decrease in the subthreshold slope because the drain is causing a level of current to always flow through the channel. This means that more voltage has to be applied to the gate in order to generate the same increase in current flow in the channel.
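A common first-order way to express this is to lower the threshold voltage in proportion to the drain voltage and let the current below threshold change by one decade per fixed step in gate voltage. That simplified model, along with every parameter value below, is an assumption on my part for illustration and not data from a real process.

```python
def effective_threshold(v_ds: float, v_th0: float = 0.4,
                        dibl_v_per_v: float = 0.1) -> float:
    """Threshold voltage after DIBL: drops linearly with drain voltage."""
    return v_th0 - dibl_v_per_v * v_ds


def subthreshold_current(v_gs: float, v_ds: float,
                         i_at_threshold: float = 1e-7,
                         swing_v_per_decade: float = 0.08) -> float:
    """Channel current (A) for v_gs below threshold.

    The current falls by 10x for every swing_v_per_decade volts that the gate
    sits below the (DIBL-shifted) threshold. All defaults are hypothetical.
    """
    v_th = effective_threshold(v_ds)
    return i_at_threshold * 10 ** ((v_gs - v_th) / swing_v_per_decade)


# Leakage with the gate fully off (v_gs = 0) at low and high drain voltage.
low_drain = subthreshold_current(0.0, 0.05)
high_drain = subthreshold_current(0.0, 1.0)
print(low_drain, high_drain, high_drain / low_drain)
```

Even a modest threshold shift multiplies the off-state leakage many times over, because the current depends exponentially on how far the gate is below threshold.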
It's possible to try and alleviate this issue by strongly doping the areas between the source and drain to eliminate the depletion region created. Unfortunately, another effect of this halo doping is that parasitic diode leakage increases, and due to the strong doping there is a greater scattering of electrons and lower electron mobility. In addition, because doping relies on an imperfect process, higher doping levels make the threshold voltage vary from transistor to transistor. This makes the operating voltages higher than necessary to accommodate the variance in doping.
The end result is that performance falls off dramatically. However, there are multiple methods that can stave off these effects. These methods include straining silicon, silicon on insulator (SOI), high-k metal gate (HKMG), and FinFET.
Silicon On Insulator / Advanced Substrate News
One of the first methods used in this area was SOI, which will be familiar to those that followed this site 13 years ago. In short, instead of the standard bulk silicon substrate that we’ve been showing in all of our MOSFET diagrams, an insulating layer is added just beneath the channel. This is an advantage in some ways, but a disadvantage in others.
First, parasitic capacitance is reduced. What we haven’t talked about until now is that by virtue of the depletion zone and charge on the source or drain, there is a separation of charge between the source or drain and the body. A planar capacitor is nothing more than two metal plates with a dielectric between the two, with a charge on one of the plates, so this is a form of capacitance. The issue with this capacitance is that once you add a resistor in series with the capacitor, you have an inherent delay in current. This is a classical RC circuit problem, and this reduces the rate at which the transistor can switch, which reduces performance. This delay is because the rise and fall of current from switching the transistor on and off is slower than if there was no capacitance. By adding a thick silicon oxide layer underneath the transistor, the distance is increased and therefore the capacitance decreases. This means that the time delay gated by RC decreases, which means clock speed increases.
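The RC delay argument can be made concrete with two textbook formulas: the parallel-plate capacitance C = ε0·εr·A / d and the exponential charging curve of an RC circuit. The resistance, area, and separation values below are hypothetical, chosen only to show the trend.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def plate_capacitance(area_m2: float, separation_m: float,
                      rel_permittivity: float) -> float:
    """Capacitance (F) of an ideal parallel-plate capacitor."""
    return EPSILON_0 * rel_permittivity * area_m2 / separation_m


def rc_settle_time(resistance_ohm: float, capacitance_f: float,
                   fraction: float = 0.9) -> float:
    """Time (s) for an RC node to charge to `fraction` of its final voltage."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return -resistance_ohm * capacitance_f * math.log(1 - fraction)


# Hypothetical junction: 1 square micron of area, 1 kOhm of series resistance.
for separation in (10e-9, 20e-9):
    c = plate_capacitance(1e-12, separation, 3.9)
    print(separation, c, rc_settle_time(1e3, c))
```

Doubling the separation halves the capacitance, and the settling time drops by the same factor.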
The other problem is leakage far away from the gate. In recent process nodes, leakage has become a bigger and bigger problem. This is due to the short channel effects that we discussed before. While gate-induced drain leakage (GIDL) current is one issue that arises from smaller process nodes, it's inherently a different problem. While solving GIDL is done with HKMG, channel leakage far from the gate has been a persistent issue in recent process nodes, and it's something that scaling equivalent oxide thickness (EOT) at the gate won't solve. This is because the area far away from the gate cannot have its voltage pulled down by the gate. This goes back to the effect of DIBL that we previously discussed.
One of the first methods used to solve this problem was SOI, and by looking at a diagram of even partially depleted SOI (PD-SOI), it's clear why this is. SOI technology in general reduces the amount of silicon far away from the gate, and this in turn means that there is much less channel leakage. For PD-SOI though, because the bulk of the silicon isn't connected to a terminal, the body "floats". This leads to something called the history effect, which means that the threshold voltage will change depending upon the previous voltages applied to the gate. A floating body can also cause parasitic transistors to activate and cause leakage.
The logical conclusion to fix these issues is fully depleted SOI, which makes the channel thin enough that the body no longer floats, as it doesn't have a region where charge can accumulate one way or another. The gate can still create its inversion layer, but when the inversion layer isn't present the barrier for current is much stronger than before because all of the silicon in the channel is very close to the gate. This means that the leakage current in general is lower.
Unfortunately, SOI technology in general comes with higher cost, and because the transistors are built on an insulating layer, thermal dissipation isn't as effective as it is on bulk processes. While AMD used to use SOI technology, they have since transitioned to bulk processes. FD-SOI is still a viable option, but for the most part SOI technology isn't found in VLSI chips like CPUs, GPUs, and other forms of digital logic. There are certainly niche cases where SOI dominates such as analog RF and radiation-hardened applications, but it seems that the foundry model makes SOI unpopular.
While SOI seems to have lost the popularity that it once had, other methods of improving transistor performance have become much more popular. For example, by putting silicon germanium (SiGe) or silicon carbide (SiC) in the source and drain, the silicon in the channel is stretched past its normal interatomic distance, which reduces the effects of electrostatic forces. In other words, we get strained silicon. This increases the mobility of charge carriers in the channel, thus increasing drive current and overall transistor performance. There’s more than one way to achieve this, but the principle is ultimately the same. This kind of technology can be seen as early as 2003 with Intel's 90nm process.
While straining silicon has been around since the 90nm node, the high-k metal gate has been critical for improving transistor performance at the 45nm node and below. When we first discussed the gate structure of a MOSFET, we talked about how a silicon dioxide layer is grown on top of a silicon substrate, which then has another polysilicon layer deposited on top of the silicon dioxide. Unfortunately, that hasn’t been the structure of the transistor gate for at least the past year for SoCs like the Snapdragon 800 and newer. HKMG has actually been around since 2007 in consumer devices with the launch of Intel’s Penryn CPUs.
The reason why HKMG is so important goes back to quantum mechanics. Those familiar with the equation for capacitance of a planar capacitor will know that reducing the distance between the two plates will increase capacitance. This means the field effect is stronger, which improves drive current and control over the channel. This control over the channel also increases the rate at which transistors switch. Unfortunately, at sufficiently small thicknesses, this all breaks down. Because electrons work probabilistically, we suddenly encounter cases where electrons begin tunneling through the insulator from the gate to the silicon channel. This causes significant leakage current through the gate to ground even when the gate isn’t being switched from one state to another, so we can no longer decrease thickness beyond a certain point.
One way to continue increasing gate capacitance while also decreasing leakage is to use a high-k dielectric, but this introduces a great deal of complexity in the manufacturing process compared to simple chemical vapor deposition of SiO2. The gate material itself can no longer be polysilicon, as it’s close enough to the inversion layer that it starts to become a depletion region. This causes issues in channel formation, so it becomes important to use a metal gate to improve performance. In addition to the poly depletion issue, a phenomenon known as Fermi level pinning occurs at the polysilicon/high-k interface. This effect dramatically raises the threshold voltage and decreases drive current.
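The benefit of a high-k dielectric is often expressed as an equivalent oxide thickness (EOT): the thickness of SiO2 that would give the same capacitance per unit area. The sketch below uses the standard relation EOT = t × (3.9 / k). The dielectric constant of 20 and the 3 nm film thickness are assumed, illustrative values and do not describe any particular process.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m
K_SIO2 = 3.9      # relative permittivity of SiO2


def equivalent_oxide_thickness_nm(physical_thickness_nm, k_dielectric):
    """SiO2 thickness giving the same capacitance per area as the film."""
    return physical_thickness_nm * K_SIO2 / k_dielectric


def capacitance_per_area(thickness_nm, k_dielectric):
    """Parallel-plate capacitance per unit area, in F/m^2."""
    return EPS0 * k_dielectric / (thickness_nm * 1e-9)


K_HIGH_K = 20.0        # assumed high-k value, for illustration only
FILM_THICKNESS = 3.0   # assumed physical thickness in nm

eot = equivalent_oxide_thickness_nm(FILM_THICKNESS, K_HIGH_K)
c_high_k = capacitance_per_area(FILM_THICKNESS, K_HIGH_K)
c_sio2 = capacitance_per_area(eot, K_SIO2)

print(f"Physical thickness: {FILM_THICKNESS} nm, EOT: {eot:.3f} nm")
print(f"Capacitance per area: {c_high_k:.3e} vs {c_sio2:.3e} F/m^2")
```

The physically thicker film suppresses tunneling while behaving electrically like a much thinner SiO2 layer, which is why the two capacitance values printed above agree.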
Unfortunately, once we get to 22 nanometers and below, short channel effects become even more significant. While we managed to decrease leakage and improve performance with HKMG, the gate still doesn’t have sufficient control over the channel because the gate keeps getting smaller relative to the substrate. The answer with the Tri-Gate transistor is to wrap the gate all around the channel that would form, dramatically reducing the amount of silicon far away from the gate. The resultant structure becomes like a fin, and due to the extremely thin channel the extent of the depletion region is determined by the physical structure rather than the applied bias to the terminals of the FET. This makes the transistor channel fully depleted as well, but this is merely an interesting side-effect, not a causative mechanism.
It's important to understand that this is why FinFET and FD-SOI are fully depleted technologies, but the key here is that the channel is now extremely thin so that the area far from the gate is eliminated. This means that the voltage and resultant capacitance of the gate should be able to overpower the effect of the drain's capacitance in all areas of the channel. The impact on performance is enormous.
As a result of this change, the off current is much lower than before as the effect of DIBL is reduced. This means that heavily doping the channel isn't necessary so variance in threshold voltages from one transistor to another is reduced. In addition, because of the larger inversion layer that can be generated and lower doping levels used in the channel, drive current is improved. This also means improved subthreshold swing can be achieved, so switching transistors on and off is even faster, which also improves performance. This is very much similar to the advantages that we see with FD-SOI, but it can be implemented on bulk silicon which reduces variable cost. FinFET and FD-SOI are just different ways of accomplishing the same goal, and share all the advantages of a thin channel.
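Subthreshold swing can be made concrete with the textbook expression for a conventional MOSFET, S = ln(10) · (kT/q) · (1 + C_dep/C_ox), which gives the gate voltage needed to change the current by a factor of ten. The sketch below is only an illustration: the capacitance ratios are hypothetical, and the formula is the standard long-channel approximation rather than a model of any specific FinFET or FD-SOI device.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def subthreshold_swing_mv_per_decade(temp_k, cdep_over_cox):
    """Textbook MOSFET subthreshold swing in mV per decade of current."""
    thermal_voltage = BOLTZMANN * temp_k / ELECTRON_CHARGE
    return 1000.0 * math.log(10) * thermal_voltage * (1.0 + cdep_over_cox)


# Hypothetical capacitance ratios: 0.0 is the ideal limit, where the gate
# fully controls the channel; 0.5 stands in for a weaker gate.
ideal = subthreshold_swing_mv_per_decade(300.0, 0.0)
weaker_gate = subthreshold_swing_mv_per_decade(300.0, 0.5)

print(f"Ideal swing at 300 K: {ideal:.1f} mV/decade")
print(f"With C_dep/C_ox = 0.5: {weaker_gate:.1f} mV/decade")
```

A smaller swing means less voltage is needed to move between the off and on states, so the closer the gate gets to full control of the channel, the closer the swing gets to the room-temperature limit of roughly 60 mV per decade.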
Finally, it’s time to talk about what lies ahead. One thing we haven’t talked about yet is the set of pressing issues of the near future. For example, there are major issues looming with interconnect delay that haven’t been solved yet. As we discussed in BEOL processing, the current system of wiring these metal layers uses copper with a dielectric in between the wires with a tantalum/tantalum nitride cap to support the next layer. However, the smaller the wires get, the worse electron mobility becomes, which raises the resistance of each wire.
This is roughly analogous to the issues that we see in the front end of line, which saw decreasing drive current for the same reasons. Combined with the capacitance produced between wires, there is dramatically increasing RC time delay in the connections made between transistors that hurts performance due to lower peak clock speeds. Ultimately, progress can only be made on this end by using lower k dielectrics and lower resistance materials to drive down both resistance and unwanted capacitance in these circuits.
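The interconnect problem can be sketched with a deliberately simple model: wire resistance R = ρL/(w·h) and sidewall capacitance to a neighboring wire C = ε0·k·L·h/s. All dimensions and the dielectric constants below are hypothetical, and the model holds resistivity constant, so it understates the real penalty described above.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def wire_resistance(resistivity, length, width, height):
    """Resistance of a rectangular wire, in ohms."""
    return resistivity * length / (width * height)


def sidewall_capacitance(k, length, height, spacing):
    """Parallel-plate estimate of capacitance to one neighboring wire."""
    return EPS0 * k * length * height / spacing


def wire_rc(resistivity, k, length, width, height, spacing):
    """RC product of the wire in seconds."""
    r = wire_resistance(resistivity, length, width, height)
    c = sidewall_capacitance(k, length, height, spacing)
    return r * c


RHO_COPPER = 1.7e-8   # approximate bulk copper resistivity, ohm*m
LENGTH = 100e-6       # hypothetical 100 um wire

# Hypothetical "old" geometry, then the same wire with its cross-section
# and spacing shrunk by 0.7x while the length stays fixed.
rc_old = wire_rc(RHO_COPPER, 3.0, LENGTH, 50e-9, 100e-9, 50e-9)
rc_new = wire_rc(RHO_COPPER, 3.0, LENGTH, 35e-9, 70e-9, 35e-9)

# Same shrunken geometry with a lower-k dielectric (assumed k = 2.5).
rc_new_low_k = wire_rc(RHO_COPPER, 2.5, LENGTH, 35e-9, 70e-9, 35e-9)

print(f"RC before shrink: {rc_old:.3e} s")
print(f"RC after shrink:  {rc_new:.3e} s ({rc_new / rc_old:.2f}x)")
print(f"RC with lower k:  {rc_new_low_k:.3e} s")
```

Even in this idealized form, a 0.7x shrink roughly doubles the RC delay of a wire of the same length, and lowering k or resistivity are the only knobs in the model that bring it back down.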
There’s still more to talk about though. While we’ve managed to stretch 193nm immersion lithography further than anyone ever imagined, 10 nanometers is likely to be the end of the road for 193nm. Here, the future is unclear. Realistically, it seems that there are only a few techniques that will be viable at the next level. These include extreme ultraviolet lithography (EUVL), nanoimprint lithography, and electron beam lithography. Let's go over each one to try and understand what challenges are coming and which option may be the best in the long run.
EUV
On the surface, EUV is a relatively straight progression of the current 193i deep-UV (DUV) technology and should deliver the significant jump needed to advance lithography beyond 10 nanometer feature sizes. While this seems easy enough, there are a large number of issues that crop up from such a short wavelength (~13nm) of light.
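The appeal of a shorter wavelength follows from the Rayleigh criterion, which estimates the smallest printable feature as CD = k1 · λ / NA. The sketch below compares the two wavelengths under assumed values: a numerical aperture of 1.35 for 193nm immersion, 0.33 for EUV, a wavelength of 13.5nm, and k1 = 0.25 as an idealized single-exposure limit. It is a first-order estimate that ignores resist and noise effects.

```python
def rayleigh_resolution_nm(k1, wavelength_nm, numerical_aperture):
    """Rayleigh estimate of the minimum feature size, in nm."""
    return k1 * wavelength_nm / numerical_aperture


K1_LIMIT = 0.25  # assumed idealized process factor

# Assumed optics: 193nm water immersion vs. an EUV system.
duv_limit = rayleigh_resolution_nm(K1_LIMIT, 193.0, 1.35)
euv_limit = rayleigh_resolution_nm(K1_LIMIT, 13.5, 0.33)

print(f"193i single-exposure estimate: {duv_limit:.1f} nm")
print(f"EUV single-exposure estimate:  {euv_limit:.1f} nm")
```

Note that the mirror-based optics of EUV have a much lower numerical aperture than immersion lenses, so the gain in resolution is smaller than the ratio of the wavelengths alone would suggest.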
First, the EUV light source itself is no longer a laser in most cases. This means that the energy efficiency of the light source is extremely low compared to current 193nm excimer lasers, and as a result an enormous amount of energy has to be used to generate an incoherent light source that is then filtered to only produce EUV.
Second, EUV is strongly absorbed by almost all matter. This means that the lithography must take place in a vacuum, which rules out techniques such as immersion lithography for further resolution enhancement. By using a strong vacuum for lithography, the number of wafers that can be processed per hour drops dramatically as vacuum chucks can no longer be used to hold the wafer, and the electrostatic chucks used must be heated to a stable temperature with a sacrificial wafer. Also, keeping the wafer from heating during EUV exposure becomes a significant issue because the immersion fluid that once cooled the wafer cannot be used.
In addition, it's impossible to use transmissive lenses because of this same issue. As a result, mirrors must be used to focus and reflect the EUV light onto the target. There are potential issues with EUV damaging the mirrors in the optical system, and in order to reflect the EUV light all of the mirrors and the photomask have to be coated with multiple layers that use interference effects to maximize reflected light. This also means that even minor defects in a photomask can result in an unusable photomask, as seen in the photo below.
Buried Defect on EUV Mask Blank / Guiding light / CC BY SA
Unfortunately, these defects cannot be seen using an electron microscope. In order to see these defects, an EUV microscope must be used. Outside of these challenges, EUV itself intrinsically has multiple issues that reduce resolution in some unexpected ways. First, shot noise becomes a serious issue. What this means is that it's fundamentally impossible to completely control the number of photons that are released into the photoresist. This means that the lines drawn by an EUV system can be unacceptably rough unless enough photons are used to ensure that shot noise is statistically insignificant, but this only increases the power requirements for EUV lithography. To make things worse, increasing the exposure can damage the photoresist through the sheer intensity of the heating effect of EUV.
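Shot noise can be illustrated with Poisson statistics: if a region receives N photons on average, the relative fluctuation is about 1/sqrt(N). Because each EUV photon carries far more energy than a 193nm photon (E = hc/λ), the same dose delivers far fewer photons. The sketch below assumes a dose of 20 mJ/cm² and a wavelength of 13.5nm purely for illustration.

```python
import math

PLANCK = 6.62607015e-34    # J*s
LIGHT_SPEED = 299792458.0  # m/s


def photon_energy_j(wavelength_nm):
    """Energy of a single photon, E = h * c / wavelength."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)


def photons_per_nm2(dose_mj_per_cm2, wavelength_nm):
    """Average number of photons landing on one square nanometer."""
    dose_j_per_nm2 = dose_mj_per_cm2 * 1e-3 / 1e14  # 1 cm^2 = 1e14 nm^2
    return dose_j_per_nm2 / photon_energy_j(wavelength_nm)


def relative_shot_noise(mean_count):
    """Poisson relative fluctuation, sigma / mean = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_count)


DOSE = 20.0  # assumed dose in mJ/cm^2, for illustration only

duv_photons = photons_per_nm2(DOSE, 193.0)
euv_photons = photons_per_nm2(DOSE, 13.5)

print(f"193nm: {duv_photons:.0f} photons/nm^2, "
      f"noise {relative_shot_noise(duv_photons):.1%}")
print(f"EUV:   {euv_photons:.0f} photons/nm^2, "
      f"noise {relative_shot_noise(euv_photons):.1%}")
```

Halving the noise requires four times as many photons, which is why suppressing line roughness translates so directly into higher source power.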
Finally, because EUV is so energetic, it is a form of ionizing radiation. This means that when EUV light is absorbed by a molecule, electrons can be liberated. Unfortunately, this adds yet another source of uncertainty and also decreases resolution because the generated photoelectrons move randomly through the photoresist. In effect, all of these issues reduce the true resolution of EUV to somewhere around the 15-19nm feature size for a single exposure. In order to even get to 10nm and below, double patterning is required to reach the resolution necessary.
In short, the resolution gain from current 193nm technology is relatively small compared with the enormous expense and new design challenges. Based on how many delays have occurred with the introduction of EUV lithography, it may mean that a radically new system is needed to fabricate even smaller ICs. That's where nanoimprint and e-beam lithography come in.
Nanoimprint Lithography
Nano Imprint / UMD
This type of lithography is incredibly simple. A thermoplastic polymer is coated onto the substrate (such as a silicon wafer), and then a mold is pressed down on the wafer to print a pattern. Once this is done, a pattern transfer technique such as plasma etching is used to etch away the resist as necessary to expose the pattern on the wafer.
While this technique is simple and can easily be used as a next generation lithography tool, there are also a number of flaws involved with this process. If the imprinting isn't done in a vacuum, there is a high likelihood that the mold will have air bubbles that alter the pattern in unpredictable ways. In addition, the template is susceptible to wear and tear, which means that resolution is lost as the mold is repeatedly used in imprinting. The imprint mold must also take into account uneven depth of imprinting based on the density of the pattern used and potential stretching of the resist. Finally, making the template itself requires extremely precise lithography, which means that it's limited by current generation lithography techniques in resolution. This isn't a solution by itself, which means it can only be used in conjunction with other methods.
Electron Beam Lithography
Electron Beam Lithography / SEMI
Just as we saw in the progression of microscopes, the early days were mostly focused on improving light microscopy with higher NA lenses and oil immersion, but today some of the highest resolution microscopes use electrons instead of photons. Similarly, electron beam lithography can generate some of the highest resolutions possible out of all the lithography techniques we've discussed in this article. This means that sub-10nm resolution is easily achieved. In fact, it's fully possible to do away with the resist and simply write the pattern directly on the silicon wafer, eliminating the resist as a potential bottleneck for resolution.
Unfortunately, this system is extremely slow. Rather than the 100+ wafers per hour of current photolithography techniques, electron beam lithography is often limited to fewer than ten wafers per hour. Using a single electron beam to write an entire 300mm wafer would take around 22 years. While thousands of beams can be simultaneously writing to the wafer to speed up the process, the electrons begin to affect the trajectory of other electrons in other beams. This requires complex modeling to compensate for such effects.
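A back-of-the-envelope estimate shows why a single beam is so slow: the write time is roughly the total charge that must be delivered (dose × area) divided by the beam current. The dose of 100 µC/cm² and beam current of 0.1 nA below are hypothetical values, and the sketch assumes the entire wafer area is exposed with no stage or settling overhead, so it shows the order of magnitude only and is not a reconstruction of how the figure above was derived.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wafer_area_cm2(diameter_mm):
    """Area of a circular wafer in square centimeters."""
    radius_cm = diameter_mm / 20.0
    return math.pi * radius_cm ** 2


def write_time_s(area_cm2, dose_uc_per_cm2, beam_current_na, beams=1):
    """Exposure time: total charge divided by total beam current.

    Assumes ideal parallel scaling, ignoring beam-to-beam interactions.
    """
    total_charge_c = area_cm2 * dose_uc_per_cm2 * 1e-6
    return total_charge_c / (beam_current_na * 1e-9 * beams)


AREA = wafer_area_cm2(300.0)
DOSE = 100.0     # hypothetical resist dose in uC/cm^2
CURRENT = 0.1    # hypothetical beam current in nA

single_beam_years = write_time_s(AREA, DOSE, CURRENT) / SECONDS_PER_YEAR
multi_beam_hours = write_time_s(AREA, DOSE, CURRENT, beams=10000) / 3600

print(f"Single beam: {single_beam_years:.1f} years per wafer")
print(f"10,000 ideal beams: {multi_beam_hours:.1f} hours per wafer")
```

With these assumed inputs, a single beam takes on the order of decades per wafer, and even thousands of perfectly independent beams leave throughput far below that of optical lithography.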
As with EUV lithography, shot noise becomes a significant issue despite a much easier ability to control the dose. This is because even small variations in the number of electrons can significantly affect the roughness of the lines drawn by the electron beam. There is also a strong need to choose a balanced energy for the electron, as excessively energetic electrons can cause significant secondary electron generation, but too little energy means the electrons are easily scattered. Both reduce resolution and can result in unacceptable defects for smaller process nodes.
While it's impossible to predict what the future will hold, it's relatively easy to see what trends will take place in the near future to keep Moore's law alive. For example, instead of using dual or tri gate technologies (FinFET), it's reasonable to expect that gate-all-around (GAA) will become the next step in the evolution of transistor shapes. However, it's currently not clear when this would reach mass production, if ever.
Currently, experimental GAAFETs have only existed for around eight years. For reference, FinFETs were first made in 1999. It took around a decade and a half for any such 3D transistor to reach mass production. By continuing to scale to higher k dielectrics for the gate, lower k dielectrics between interconnects, lower resistance metals for interconnects, and even better strain engineering, we will continue to see the scaling of CMOS technology.
CNFET / Joerg Appenzeller
Unfortunately, all of these can only go so far. Fundamentally, there will be a point where silicon-based transistors cannot scale any further. Gate oxides, channel lengths, and other critical dimensions can only shrink so much before either resistance is too high or a myriad of other effects render smaller sizes infeasible.
The next step is almost impossible to predict. Perhaps graphene will take the place of silicon, but graphene currently is impossible to mass-produce and is a semi-metal, which means it lacks the band gap necessary for a semiconductor. While it's been shown that semi-metal transistor logic is possible, it's currently in the very early stages and Boolean logic may be impossible with graphene. Phosphorene has promise as a semiconductor replacement for silicon, but it's similarly impossible to mass produce. Phosphorene-based FETs are still in the exploratory stages, with no actual transistor created yet.
TFET Lateral Structure / Jteherani / CC BY SA
Outside of material changes, the working mechanism of the transistor itself may change. One promising candidate right now is the tunnel field effect transistor (TFET), which relies on band to band tunneling rather than the traditional inversion layer generation for current flow. This is similar to leakage that occurs from halo doping, which results from the conduction band of the channel material aligning with the valence band of the source or drain material. As seen in the diagram above, this type of transistor has an undoped body and the source/drain are of opposing types. The gate structure is unchanged from previous MOSFETs. In practice, such a transistor structure has a much higher rate of current increase per unit of voltage.
It's been a long road, but let's quickly go over the topics covered in this article. We started with a description of semiconductor physics, then moved to the basics of MOSFETs and CMOS. Once we understood how MOSFETs work in CMOS to create logic, we moved on to the actual fabrication process of these transistors in a chip.
After all of this, we discussed how companies have increased the resolution of the fabrication process to make ever smaller transistors, and we continued by looking at how companies have increased transistor performance despite significant engineering challenges. Then we briefly covered what the future may hold for improving device performance and continuing to improve the lithography process to continue making smaller transistors.
But there is far more to be done, as literally everything we write about at AnandTech depends upon ever faster, smaller, and more efficient transistors packed as tightly as possible. Without this continued innovation, the PCs, smartphones, and wearables that we see today would be impossible to make. However, continuing the scaling that we have seen within the past decades will require more ingenuity and resources than ever before to keep pushing the limits of what's possible.
Normally, we would end things here, but this time I'd like to end by thanking everyone that has helped make this article possible. It has taken weeks of research and asking questions to get to this point, and I'm sure that without help it would have taken months. Out of the many that have helped, I'd like to specifically thank Chenming Hu, a professor in the graduate school at UC Berkeley and the lead researcher in FinFET and UTB-SOI/FD-SOI, for taking the time to help clarify the reasons for SOI and FinFET. I'd also like to thank Gerd Grau, a doctoral candidate in the graduate school at UC Berkeley, and Intel's TMG for answering all kinds of questions about solid state physics in general.