2015-06-03

Perform a small test for me. Close your eyes, and spend 15 seconds considering the state of the laptop market and what devices interest you, are available, or on the horizon. Done? Let me hazard a guess – Apple’s offerings loomed large over $800, with $1500+ gaming laptops on the periphery. At $300 we’re more in tablet-first space with a mix of cheap clamshell rubbish. In the middle, from $400 to $700, sits a good but not always great mixture of 2-in-1s (like the Surface) and clamshells (like the ASUS UX305), divided mostly on price and features, but 95% of them contain Intel. Today’s launch of Carrizo by AMD is hoping to change that perception, particularly in the $400-$700 bracket and at 15W.

Previously when we have mentioned Carrizo, such as when AMD opened the lid on a few important aspects earlier this year, all thoughts pointed towards the new core under the hood, Excavator, and if there would be any desktop presence to speak of. As we recently discussed in the latest desktop APU release from AMD, Excavator is purely a laptop play, and today’s release of Carrizo explains a lot about why that is the case.



As with any major processor release, the usual refrains ring out in the marketing: better performance, lower power, ‘we want to be the best’ and so on. A large part of following AMD’s product launches lies in understanding why they release what they do. But as I mentioned at the top of the page, AMD’s problem in the laptop space is a result of their competitor taking most of the market, particularly in halo products, big name contracts (Apple) and what you actually see on the shelves. AMD’s purpose with the launch of Carrizo, as one would expect, is to turn that around.



AMD’s argument is that the notebook/laptop segment accounts for 52% of the revenue in the consumer computing space excluding tablets, with the biggest market within that being $400-$700. As a result, Carrizo was built to bring competition to this market, and potentially provide premium level performance at a more palatable price. That being said, in my own opinion, the slide provided by AMD is rather telling. The PC industry is severely fragmented and there is no single product segment that stands out more than the others. Even within notebooks, the market ranges listed above are about equal, comparing 4-in-10 for the middle segment against 3-in-10 for each of the others. To borrow an analogy from MediaTek, the ‘Super-Mid’ category, where 80% of smartphone sales sit in the mid-range price band, just doesn’t exist in notebooks. It also makes it quite difficult to produce a single product that scales across that large range, and we end up with extremely focused product launches like Carrizo today. That’s not necessarily a bad thing, as it means the users in that segment end up with a focused, optimized unit.



Carrizo is one of three AMD releases this year, with one being the recent launch of the Kaveri Refresh desktop APUs (codenamed ‘Godavari’), and the other being Carrizo-L. Carrizo-L is not launched yet, but we know that it is designed to be a counterpart to Carrizo in similar power segments using AMD’s latest ‘Cat’ core designs, which offer a cheaper alternative but are designed to be almost plug-and-play with Carrizo as they both use the same FP4 package. We will have more information on Carrizo-L later in the year.

Carrizo’s goals are simple, and should apply to A8, A10 and FX (yes, mobile FX) processors. The big things here are the price, keeping the $400 to $700 range as the mainstay, and the ‘all day unplugged’ performance. Notice the asterisk there, and the small text stating ‘all day defined as >8 hrs idle battery life’. I’m sure a few of our readers snickered at that a little, as did I the first time I read it, but let’s be serious for a moment – this is the position AMD is coming from, and perhaps it is a little concerning that the world’s second largest consumer x86 processor manufacturer can’t get an elbow into the laptop space, given how much devices are now judged on user experience metrics, such as how long a charge lasts, rather than the iterative cycle of processor updates. That speaks volumes regarding strategy and targeting, both of which have been key points in discussing AMD so far this decade. But this slide indicates that AMD is pursuing the path of defining what the bare minimum requirements should be. Part of that issue comes down to the OEMs though – a good processor in a bad design is a bad product after all.

Rather than write about markets and concepts for this entire article, I want to go through some of the architecture changes AMD has brought with Carrizo, but first we come to high level performance. We should note that the numbers here are all provided solely by AMD, as we have not had a chance to test the systems ourselves yet. We have some exciting plans in the works to cover the performance in depth in due course, so stay tuned for that. For now, the above slide indicates some of what AMD hopes to provide with Carrizo.

Carrizo will be a 15W focused part, with the A10 and FX models having an additional 35W mode, although this will be at the discretion of the OEM. This makes life a little difficult, as the name of the processor no longer clearly indicates its performance, and I’d hazard a guess that OEMs will not put ‘15W’ or ‘35W’ in front of their naming regimes. Nevertheless, AMD provided results comparing their 15W APUs to some of Intel's 15W CPUs, and in those results the 15W APUs compare favorably. The Intel parts use HD 5500 graphics, with 24 execution units (the i3 has 23). It is worth noting that Intel also offers 15W SKUs with the higher-frequency 24EU HD 6000, so the HD 5500 AMD is comparing itself against is not Intel's most powerful iGPU. That said (and in all fairness to AMD), this isn't AMD's doing but rather the OEMs'; since none of the OEMs have shipped any 15W HD 6000-equipped products, AMD has not been able to look at HD 6000's performance. Similarly, Intel offers a 28W Iris 6100 range of processors, but these too are not in products that AMD could test. I would imagine that whichever side wins when those devices do appear will be the first to publish benchmark results. But overall the 15W target is AMD’s main focus here.

At AMD’s Tech Day a couple of weeks ago, they did have a few Carrizo laptops on display. These were fairly nondescript clamshells made by an OEM but with the latest spin of the APU inside. All but one of the units were running on integrated graphics alone; the remaining unit also held an R7 M365 GPU, which could be used in dual graphics mode with the 8000-series APU to give the R7 M370DX (D for dual). Unfortunately we weren’t allowed to perform hands-on testing at the time, and the slide above doesn’t match up the percentage increase for DIRT, but Starcraft’s numbers at least show a bump over 30 FPS when moving into dual graphics mode. Dual graphics will become more important when DX12 comes along, as Carrizo will support Asymmetric Rendering, allowing each GPU in the system to be a render target depending on the strength of the GPU, rather than the cobbled-together Crossfire way we do things now.

One of Carrizo’s strengths lies in the video decoder and the path that video takes through the SoC. By minimizing data transfers, increasing the bandwidth of the unified video decoder, and adding onboard HEVC decode IP that makes it the only x86 SoC with a full HEVC decode pipeline, battery life is increased substantially, according to AMD.

We go into more detail on the next pages when discussing the architecture updates, but AMD is stating that their improvements to the whole video pipeline, from loading into memory through to playback, will aid both battery life and the experience. For 10 Mbps 1080p HEVC content, AMD is stating 300 minutes of unplugged playback time on Carrizo compared to 112 minutes with Kaveri. It is worth stating that the systems used a 1366x768 eDP panel running PowerDVD 14, with the system power of the 15W TDP Carrizo-based system at 10.02 W (12.6% CPU utilization), giving 299.4 minutes on a 50 Wh battery, using DDR3-1600 memory. The Kaveri system, also 15W TDP but using DDR3-2133, ran at 32% CPU utilization and averaged 26.72 W, giving 112 minutes on the same battery. That’s a big step, and I would be interested to see if the memory made a difference there.
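The playback claims follow directly from battery capacity divided by average system power, so they are easy to sanity check. Here is a minimal sketch in Python using only the figures AMD supplied:

```python
# Minimal sanity check of AMD's quoted HEVC playback figures: playback time
# is just battery energy divided by average system power. The numbers below
# are the ones AMD provided; nothing else is assumed.

def playback_minutes(battery_wh: float, avg_system_watts: float) -> float:
    """Return playback time in minutes for a given battery and average draw."""
    return battery_wh / avg_system_watts * 60

# Carrizo system: 10.02 W average draw on a 50 Wh battery
print(playback_minutes(50, 10.02))   # ~299.4 minutes, matching AMD's claim

# Kaveri system: 26.72 W average draw on the same battery
print(playback_minutes(50, 26.72))   # ~112.3 minutes
```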

HEVC is still not the normal way of delivering video, as H.264 remains king. AMD performed the same test using the same equipment above with an H.264 clip of Big Buck Bunny, arriving at 8.3 hours of playback for Carrizo against 3.3 hours with Kaveri. This comes down to a number of things, but the UVD now has 4x the bandwidth, allowing 1080p frames to be decoded in a quarter of the time and enabling a race to sleep.

Carrizo is designed to have full HSA 1.0 compliance, assuming the standard doesn’t change between now and users getting devices into their hands. As with the 'HSA ready' Kaveri APUs, the potential of OpenCL to take advantage of the heterogeneous system architecture is something that AMD needs to exploit in order to promote more positive experiences. Needless to say, AMD has been doing this.

One feature AMD presented is a new piece of software called Looking Glass. This uses HSA, enabled through OpenCL, to essentially perform video tagging for faces. It will recognize recurring people in your videos, and tell you where in those videos those people are. One example of this is keeping track of home videos, most of which usually end up un-named: it gives users a quick reference to which videos John or Jill appear in, and then collates them for use in other programs such as video editors.

This has archival potential as well. Although I am not personally much of a video taker, I do see the merits of such a system, and can appreciate that stepping through video frames and running facial recognition algorithms on each is both a compute and memory intensive process that something like HSA can aid.
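To give a feel for the kind of per-frame work involved, below is a minimal sketch of sampling frames from a video and counting detected faces. It uses OpenCV’s stock Haar cascade rather than AMD’s HSA/OpenCL path, and the file name and sampling interval are illustrative assumptions only:

```python
# Hedged sketch of the per-frame face search a tool like Looking Glass must
# perform. Requires the opencv-python package; this is a CPU-only stand-in
# for AMD's HSA-accelerated implementation, not a reproduction of it.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def frames_with_faces(video_path: str, sample_every_n: int = 30):
    """Yield (frame_index, face_count) for sampled frames of a video."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces):
                yield index, len(faces)
        index += 1
    capture.release()

# Example: build a quick index of where faces appear in a (hypothetical) home video
# print(list(frames_with_faces("holiday_2014.mp4")))
```

Even this simplified loop touches every sampled frame and runs a detector over it, which is exactly the compute- and memory-heavy pattern that offloading to the GPU via HSA is meant to accelerate.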

Windows 10 acceleration is also in AMD’s plans, particularly revolving around the HEVC decode pathway in the APU. Anything that involves video playback or streaming is ripe for this, although I would imagine the major players will have both feet in this concrete well before it sets for the mainstream.

Carrizo also comes with security through a Cortex-A5 processor built into the die. This divides out memory and processing to allow a completely secure ARM TrustZone element separate from the main CPU cores and memory. AMD’s focus for this, quite understandably, is business.

At the other end of the spectrum, for consumers, AMD is bundling codes for Batman 3 or DIRT Rally with their APUs to OEMs, although I would imagine it is up to the OEM to pass this on as part of their product.

Despite today being the official launch of Carrizo, AMD has been tight lipped on exactly what SKUs will be in the market, how the segregation of products plays out, or even when OEMs are expected to bring them to market. At this point, Carrizo is more or less a paper launch showing off capabilities but nothing concrete we can yet play with in our hands. That being said, this week is Computex and I imagine some OEMs will be showing off some models (either on the show floor or behind closed doors).

AMD’s biggest problem is still what I mentioned at the start – visibility. There are forums where users discuss where to find laptops with particular versions of AMD’s APUs, or what might be available in which markets. When you’re in a situation where users are struggling to find stock of your product, having to ask others to ship overseas, or having to pre-order with retailers and befriend someone who works for that business, then something is up. Personally, I believe it comes down to the OEM perception of AMD in the laptop space. AMD is still seen as the budget play, prone to temperature issues and unresponsive for anything but basic tasks. That perception can be hard to break, even with an aggressive marketing campaign or by providing samples to OEMs to test themselves – the competition is a known quantity. AMD needs a stable supply of good products in order to inject some knowledge into the ecosystem, and nothing does that better than a design win, but there is no obvious candidate at this point.

Reviewers have the same issues when it comes to sampling. OEMs and companies want us to test their high end halo products, to experience a trickle-down effect, rather than a mid-range product. A bad review of a high-end product could cost some sales, but a bad review of something with more volume could mean a revenue reduction noticeable on the balance sheets. This is something we are hoping to help change, particularly if AMD’s claims about Carrizo hold water.

As part of AMD’s Carrizo launch, several select media were invited to a specialist Technology Day a couple of weeks beforehand for an architecture deep-dive on Carrizo with Joe Macri, Corporate VP and Product CTO, and Sam Naffziger, an AMD Corporate Fellow. I want to cover what they explained, with a few thoughts, on the next few pages.

From a design perspective, Carrizo is the biggest departure for AMD’s APU line since the introduction of Bulldozer cores. While the underlying principle of two INT pipes and a shared FP pipe between dual schedulers is still present, the fundamental designs behind the cores, the caches and the libraries have all changed. Part of this was covered at ISSCC, which we will also revisit here.

On a high level, Carrizo will be made on the 28nm node using a tapered metal stack more akin to a GPU design than a CPU design. The new FP4 package will be used, and this will be shared with Carrizo-L, the new but currently unreleased lower-powered ‘Cat’ core based platform that will play in similar markets for lower cost systems. The two FP4 models are designed to be almost plug-and-play, simplifying designs for OEMs. All Carrizo APUs currently have four Excavator cores, more commonly referred to as a dual module design, and as a result the overall design will have 2MB of L2 cache.

Each Carrizo APU will feature AMD’s Graphics Core Next 1.2 architecture, listed above as 3rd Gen GCN, with up to 512 streaming processors in the top end design. Memory will still be dual channel, at DDR3-2133. As noted in the previous slides where AMD tested on DDR3-1600, probing the memory power draw and seeing what OEMs decide to use is an important aspect we wish to test. In terms of compute, AMD states that Carrizo is designed to meet the full HSA 1.0 specification as released earlier this year. Barring any significant deviations in the specification, AMD expects Carrizo to be certified when the final version is ratified.

Carrizo integrates the southbridge/IO hub into the silicon design of the die itself, rather than as a separate on-package design. This brings the southbridge down from 40nm+ to 28nm, saving power and reducing the long distance wires between the processor and the IO hub. It also allows the CPU to control the voltage and frequency of the southbridge more than before, offering further potential power savings. Carrizo will also support three displays, allowing for potentially interesting combinations when it comes to more office oriented products and docks. TrueAudio is also present, although the number of titles that support it is small and the quality of both audio codecs and laptop speakers leaves a lot to be desired. Hopefully we will see the TrueAudio DSP opened up in an SDK at some point, allowing more than just specific developers to work with it.

External graphics are supported by a PCIe 3.0 x8 interface, and the SoC relies on three main voltage rails, which allows for separate voltage binning of each of the parts. AMD’s Secure Processor, with cryptography acceleration, secure boot and BitLocker support, is also in the mix.

AMD’s take home message in all of this is efficiency. We are being quoted a performance per watt increase of 2.4x, coming from typical power draw savings of 2x and a performance increase of almost 1.5x for 23% less die area, all in one go.

Ultimately this all feeds AMD’s plan to make their APUs 25x more efficient by 2020, and the cumulative bar chart on the right shows how mobile improvements from all sides are being realized. Migrating the southbridge onto the die severely reduces its idle power consumption to almost zero and can help efficiencies elsewhere in the system. The APU general use and memory controllers are the next targets, but the common constant here is the display. Using a low power display might gain battery life in exchange for quality, and there is only so much power you can save at the SoC level. In time, the display will be the main focus of power saving for these devices.

A big part of the reduction in die area comes from the set of high density libraries being used by AMD. Above are three examples provided where >33% gains were made in silicon area. Typically, using a high density library design is a double edged sword – it reduces die area and potentially leaves more room for other things, but the caveat is that it may be more prone to defects in construction, require additional latency, or have a different frequency/voltage profile. AMD assures us that these changes are at least like-for-like, with most of them containing other improvements as well.

It’s worth noting here that AMD has described the high density library project internally as the equivalent of a moonshot: essentially, the developers were part of a ‘skunkworks’ division attempting drastic changes in order to improve performance. The high density library is one successful project to come out of that effort.

With the new libraries, comparing Excavator to Steamroller shows the effect the move has. The power/frequency curve below 20W per module shifts to higher frequency/lower power, whereas losses are observed above 20W. For 15W per module, this means either a 10%+ power reduction at the same frequency or a 5% increase in frequency for the same power. Should AMD release dual thread / single core APUs in the 7.5W region, this is where most of the gains are (as noted in the comments, the dual module designs are at 7.5W per module, meaning that what we should see in devices is already at the peak value for gains, offering benefits such as 25% more frequency or 33% less power). As also seen in the inset, the silicon stack has been adjusted to a more general purpose orientation. I could comment that this makes the CPU and GPU work better together, but I have no way of verifying this. AMD states the change in the silicon stack makes production slightly easier while also helping achieve the higher density Excavator exhibits.

One of the biggest changes in the design is the increase in the L1 data cache, doubling its size from 64 KB to 128 KB while keeping the same efficiency. This is combined with a better prefetch pipeline and better branch prediction to reduce the level of cache misses in the design. The L1 data cache is also now an 8-way associative design, and with the improved prediction it will, when possible, activate only the one segment required and power down the rest. This includes removing extra data from 64-bit word constructions. Together with better clock gating and other minor adjustments, this reduces power consumption by up to 2x. It is worth pointing out that doubling the L1 cache is not always easy – it needs to be close to the branch predictors and prefetch buffers in order to be effective, but it also requires space. Using the high density libraries made this achievable, as well as allowing the lower level cache to be prioritized. Another element is latency, which normally has to increase when a cache grows in size, although AMD did not elaborate on how this was handled.
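To illustrate the ‘activate only the segment required’ idea, here is a minimal toy model of a set-associative cache with way prediction. The geometry, replacement policy and prediction heuristic are illustrative assumptions, not AMD’s implementation; the point is simply that a correct prediction means only one of the eight ways needs to be powered and compared:

```python
# Hedged sketch of way prediction in an 8-way set-associative cache. The
# model only counts how many ways are probed per access; sizes and policies
# below are illustrative, not Carrizo's real cache geometry.

WAYS = 8
SETS = 128          # illustrative set count
LINE_BYTES = 64

def split_address(addr: int):
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, set_index, offset

class WayPredictedCache:
    def __init__(self):
        self.tags = [[None] * WAYS for _ in range(SETS)]
        self.predicted_way = [0] * SETS      # last way that hit in each set
        self.ways_activated = 0              # proxy for tag-array energy

    def access(self, addr: int) -> bool:
        tag, set_index, _ = split_address(addr)
        guess = self.predicted_way[set_index]
        self.ways_activated += 1             # probe only the predicted way first
        if self.tags[set_index][guess] == tag:
            return True                      # hit with a single way powered
        self.ways_activated += WAYS - 1      # misprediction: probe the rest
        for way in range(WAYS):
            if self.tags[set_index][way] == tag:
                self.predicted_way[set_index] = way
                return True
        victim = guess                       # trivial replacement policy
        self.tags[set_index][victim] = tag
        self.predicted_way[set_index] = victim
        return False
```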

As listed above, the branch prediction benefits come about through a 50% increase in the BTB size. This allows the buffer to store more historic records of previous branches, increasing the likelihood of a prefetch if similar work is in flight. If this requires floating point data, the FP port can initiate the quicker flush required to loop data back for the next command. New instruction support also arrives, with AVX2 being something a number of high end software packages will be interested in using in the future.
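As a rough illustration of what a larger BTB buys, here is a minimal sketch of a branch target buffer modeled as an LRU-managed table; the capacity and replacement policy are illustrative assumptions only:

```python
# Hedged sketch: a bigger branch target buffer keeps predicted targets for
# more branches resident, so fewer front-end fetches stall on a miss.
from collections import OrderedDict

class BranchTargetBuffer:
    def __init__(self, entries: int):
        self.entries = entries
        self.table = OrderedDict()   # branch PC -> predicted target, LRU order

    def lookup(self, branch_pc: int):
        target = self.table.get(branch_pc)
        if target is not None:
            self.table.move_to_end(branch_pc)   # keep hot branches resident
        return target                            # None means a BTB miss

    def update(self, branch_pc: int, target: int):
        self.table[branch_pc] = target
        self.table.move_to_end(branch_pc)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)       # evict least recently used

# A 50% larger BTB simply means 'entries' grows by half, so a longer tail of
# branch addresses survives before old entries are evicted.
```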

These changes, according to AMD, amount to a 4-15% higher IPC for Excavator in Carrizo compared to Steamroller in Kaveri. This is perhaps a little more than we would normally expect from a generational increase (4-8% is more typical), and AMD likes to stress that it comes in addition to lower power consumption and a reduced die area. As a result, at the same power Carrizo can have both an IPC advantage and a frequency advantage.
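Because performance scales roughly as IPC multiplied by frequency, the two advantages compound. A quick sketch of the arithmetic, where only the 4-15% IPC range comes from AMD’s material and the 5% frequency bump at the same power is taken from the library discussion earlier:

```python
# Hedged sketch of how an IPC gain and an iso-power frequency gain compound.
# Performance is approximated as IPC * frequency; real workloads will vary.

def relative_performance(ipc_gain: float, freq_gain: float) -> float:
    return (1 + ipc_gain) * (1 + freq_gain)

for ipc in (0.04, 0.15):
    for freq in (0.0, 0.05):          # same clock vs. a ~5% bump at iso-power
        print(f"IPC +{ipc:.0%}, freq +{freq:.0%}: "
              f"{relative_performance(ipc, freq):.2f}x vs. Steamroller")
```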

As a result, AMD states that for the same power, Cinebench single threaded results will go up 40% and multithreaded results up 55%. The benefits shrink the further up the power band you go, however, as the high density libraries perform slightly worse at higher power than Kaveri’s.

When it comes to power, Carrizo features two or three technologies worth discussing. The first is the use of low power states and the different frequency domains within the SoC. Previous designs had relatively few power planes, which left fewer opportunities for the SoC to power down areas not in use. Carrizo has ten power planes that can be controlled at run-time, allowing for what can be described as a dynamic race to sleep. This is bundled with access to the S0i3 power state, giving sub-50mW SoC power draw when in sleep and wake-up times under a second.

This is also combined with automated voltage/frequency sensors, of which each Excavator core has 10. These sensors take into consideration the instructions being processed, the temperature of the SoC, the quality of power delivery, and the voltage and frequency at that point in order to relay information about how the system should adjust for the optimal power or performance point.

AMD states that this gives them the ability to adjust the frequency/power curve on a per-module basis further again to the right, providing another reduction in power or increase in frequency as required.

Next up for discussion is the voltage adaptive operation that was introduced back in Kaveri. I want to mention it here again because when it was first announced, I thought I understood it well enough to write about it. Having since come across another explanation of the feature by David Kanter, the reason for it finally clicked. I’m not going to steal his thunder, and I suggest you read his coverage for the full detail, but the concept is this:

When a processor does work, it draws power. The system has to be in a position to provide that power, and it acts to restabilize the supply while the processor is performing work. The work being done will cause the voltage across the processor to drop, which we classically call voltage droop. As long as the droop does not cause the system to go below the minimum voltage required for operation, all is good. Planning for voltage droop works if the supply of power is consistent, although that cannot always be guaranteed – the CPU manufacturer does not have control over the quality of the motherboard, the power supply or the power conversion at hand. This causes ripple in the quality of the power, and the CPU has to be able to cope with it, as these ripples, combined with a processor doing work, could cause the voltage to drop below the threshold.

The easiest way to cope is to set the processor’s nominal voltage higher, so it can withstand a bigger drop. This doesn’t work well in mobile, as more voltage results in a bigger power draw and a worse experience. There are other potential solutions, which Kanter outlines in his piece.

AMD’s way of tackling the problem is to get the processor to respond directly. When the voltage drops below a threshold value, the system will reduce the frequency and the voltage of the processor by around 5%, causing the work being done to slow down and not drain as much. At AMD’s Tech Day, we were told this happens as quickly as 3 cycles from detection, or in under a nanosecond. When the voltage drop is normalized (i.e. the power delivery is back at a more tolerable level), the frequency is cranked back up and work continues at the normal rate.

Obviously the level of the threshold and the size of the frequency drop will determine how much time is spent in this lower frequency state. We were told that with the settings used in Carrizo, the CPU hits this state less than 1% of the time, but it accounts for a sizeable chunk of the overall average power reduction for a 3.5 GHz processor. This may sound odd, but it makes sense when you consider that the top 5% of the frequency range costs more power than any other 5%. By removing that extreme power draw, for a minimal performance loss (5% frequency loss for under 1% of the time), it saves enough power to be worthwhile.
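The reason the top of the curve is so expensive is that dynamic power scales roughly with voltage squared times frequency, and the last few percent of frequency need a disproportionate voltage bump. A small sketch with illustrative voltage/frequency pairs (assumptions, not Carrizo’s real operating points) shows the shape of the saving:

```python
# Hedged sketch of why trimming ~5% off the top of the V/f curve saves an
# outsized amount of power: dynamic power ~ C * V^2 * f. The operating
# points below are illustrative assumptions only.

def dynamic_power(voltage: float, freq_ghz: float, capacitance: float = 1.0) -> float:
    return capacitance * voltage ** 2 * freq_ghz

full_speed = dynamic_power(voltage=1.30, freq_ghz=3.5)    # top of the curve
droop_mode = dynamic_power(voltage=1.23, freq_ghz=3.325)  # ~5% lower V and f

print(f"power at full speed : {full_speed:.2f} (arbitrary units)")
print(f"power in droop mode : {droop_mode:.2f}")
print(f"saving while active : {1 - droop_mode / full_speed:.1%}")   # ~15%

# Even though this state is entered <1% of the time, being able to ride out
# droop events this way lets the nominal voltage be set lower overall, which
# is where the average power saving AMD describes comes from.
```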

A typical consumer user experience revolves a lot around video, and AMD identified in Carrizo a big opportunity to decrease power consumption and increase performance in a couple of different ways. First up is adjusting the path by which data is moved around the system, particularly as not a lot of video matches the native resolution of the screen or is scaled 1:1.

When a video requires scaling, either because it is made full screen and scaled up or because it is a higher resolution video being scaled down, that scaling is typically performed by the GPU. The data leaves the decoder (either hardware or software), enters system memory, moves into the graphics memory, is processed by the GPU, moves back out to memory, and is then transferred to the display. This requires multiple read/write commands to memory, requires the GPU to be active but underutilized, and it happens for every frame. AMD’s solution is to provide some simple scaling IP in the display engine itself, allowing scaled video to go straight from the decoder to the display engine, leaving the GPU in a low power state.
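A back-of-the-envelope sketch shows why cutting memory passes matters; the frame size, bit depth and number of passes per frame below are illustrative assumptions, not figures from AMD:

```python
# Hedged estimate of per-second memory traffic for video playback under two
# paths. Assumes 1080p30 frames at 4 bytes per pixel; the pass counts are
# illustrative stand-ins for the legacy and display-engine-scaling paths.

def traffic_mb_per_second(width: int, height: int, fps: int, passes: int,
                          bytes_per_pixel: int = 4) -> float:
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * passes / 1e6

# Legacy path: decode write, GPU read for scaling, GPU write, display read
print(traffic_mb_per_second(1920, 1080, 30, passes=4))   # ~995 MB/s

# Display-engine scaling: decode write, display read
print(traffic_mb_per_second(1920, 1080, 30, passes=2))   # ~498 MB/s
```

Every pass avoided is DRAM and GPU activity that never happens, which is where the power saving on the next slide comes from.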

The video playback paths at the bottom of this slide show the explanation graphically, and AMD is quoting a drop from 4.8W down to 1.9W in power consumption for these tasks. Note that the 4.8W value is for Kaveri, so there are other enhancements in there as well, but the overall picture is a positive one and AMD quotes 500 mW of APU power savings.

The Unified Video Decoder (UVD) has been built to support the above codecs, with HEVC decode on die as well as native 4K H.264 decode. I’ll come back to the 4K element in a second, but what is perhaps missing from this list is VP9, the codec used by Google for YouTube. YouTube is still the number one source for video content on the web, and with Google transitioning more to VP9, as well as AMD’s competition advertising it as a perk on their latest hardware, it is perhaps surprising that AMD left it out. I did ask about this, and was told that they picked HEVC over VP9 as they believe it will be the more important codec going forward, particularly when you consider that the majority of the popular streaming services (Netflix, Hulu, Amazon) will be using HEVC for their high definition titles.

Back to the 4K equation: this is possible because AMD has increased the decode bandwidth of the UVD from 1080p to 4K. This affords two opportunities – 4K video on the fly, or 1080p video decoded in a quarter of the time, allowing both the UVD and DRAM to race to sleep. Despite the 75% reduction in active time, because the UVD does not use that much power to begin with, this results in only around 30 minutes of extra video playback time, but it is welcome and contributes to that often marketed ‘video playback’ number.
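The race-to-sleep arithmetic is straightforward duty-cycle math. The per-block power numbers below are illustrative assumptions only, chosen to show why a 75% cut in active time yields a real but modest system-level saving:

```python
# Hedged sketch of race-to-sleep: finishing each frame in a quarter of the
# time cuts the decoder's duty cycle, not its active power. The watt figures
# here are illustrative assumptions, not measured UVD numbers.

def average_watts(active_watts: float, idle_watts: float, duty_cycle: float) -> float:
    return active_watts * duty_cycle + idle_watts * (1 - duty_cycle)

old_uvd = average_watts(active_watts=0.8, idle_watts=0.05, duty_cycle=0.60)
new_uvd = average_watts(active_watts=0.8, idle_watts=0.05, duty_cycle=0.15)
print(f"UVD average power, 1x bandwidth: {old_uvd:.2f} W")
print(f"UVD average power, 4x bandwidth: {new_uvd:.2f} W")

# The block-level saving is a few hundred mW against a ~10 W system draw,
# which is consistent with AMD quoting only a modest gain in playback time.
```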

The big upgrade in graphics for Carrizo is that the maximum number of compute units for a 15W mobile APU moves up from six (384 SPs) to eight (512 SPs), affording a 33% potential improvement. This means that the high end A10 Carrizo mobile APUs will align with the A10 Kaveri desktop APUs, although the desktop APUs will use 6x the power. Carrizo also moves to AMD’s third generation of Graphics Core Next, meaning GCN 1.2 and similar to Tonga based retail graphics cards (the R9 285).

This gives DirectX 12 support, but one of AMD’s aims with Carrizo is full HSA 1.0 support. Earlier this year when AMD first released proper Carrizo details, we were told that Carrizo will support the full HSA 1.0 draft as it currently stands, since it has not yet been ratified, and that AMD will not push back the launch of Carrizo to wait for that to happen. So there is a chance that Carrizo will not be certified as a fully HSA 1.0 compliant APU, but very few people are predicting major changes to the specification before ratification that would require hardware adjustments.

The difference between Kaveri’s ‘HSA Ready’ and Carrizo’s ‘HSA Final’ nomenclature comes down to one main feature – context switching. Kaveri can do everything Carrizo can do, apart from this. Context switching allows the HSA device to switch between pieces of work asynchronously while it waits on another part to finish. I would imagine that if Kaveri came across work that required this, it would sit there idle waiting for the work to finish before continuing, which means that Carrizo would be faster in this regard.

One of the key parts of HSA is pointer translation, allowing both the CPU and GPU to access the same memory despite their different interpretations of how the memory in the system is configured. One of the features in Carrizo is the use of address translation caches (ATCs) inside the GPU, essentially keeping a record of which address points to which data; when an address is in a lower level cache, that data can be accessed more quickly. These ATC L1/L2 caches will be inside the compute units themselves as well as the GPU memory controller, with an overriding ATC L2 beyond the regular L2 per compute unit.
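Conceptually, an ATC behaves like a TLB: a translation that hits a small local cache avoids a walk of the full page tables. Here is a minimal sketch of that behavior; the page size, capacity, eviction policy and page-table shape are illustrative assumptions, not Carrizo’s real parameters:

```python
# Hedged sketch of an address translation cache (ATC). Hits return quickly;
# misses pay for a (simulated) page-table walk before caching the mapping.

PAGE_SIZE = 4096

class AddressTranslationCache:
    def __init__(self, capacity: int, page_table: dict):
        self.capacity = capacity
        self.page_table = page_table        # virtual page number -> physical page number
        self.cache = {}
        self.hits = 0
        self.walks = 0

    def translate(self, virtual_addr: int) -> int:
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
        else:
            self.walks += 1                 # costly page-table walk on a miss
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))   # simple FIFO eviction
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn] * PAGE_SIZE + offset

# Illustrative usage with a fabricated page table
page_table = {vpn: vpn + 4096 for vpn in range(65536)}
atc = AddressTranslationCache(capacity=64, page_table=page_table)
physical = atc.translate(0x00ABC123)
```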

Use of GCN 1.2 means that AMD can use their latest color compression algorithms with little effort – it takes a little more die area to implement (of which Excavator leaves more to play with than Kaveri), but it affords performance improvements, particularly in gaming. The texture data is stored losslessly to maintain visual fidelity, and it moves between graphics cores in this compressed state.

In yet another effort to squeeze power out of the system, the GPU will have its own dedicated voltage plane as part of the system, rather than a separate voltage island requiring its own power delivery mechanism as before. AMD’s latest numbers on the improvements here date back to June 2013 and come from internal simulations, rather than an actual direct comparison.

With all the performance metrics rolled in, AMD is quoting a 65% performance improvement at 15W compared to Kaveri. The adjustment in design allows higher frequencies for the same power, which combines with the additional compute units and other enhancements for the overall score. At 35W the gain is less pronounced and more akin to a regular generational improvement, paling in comparison to the 15W numbers.

One of the final pieces in the puzzle is AMD’s Secure Processor, which they have previously called the PSP. The concept of the security processor has evolved over time, but the premise of a locked down area to perform sensitive work that is both hidden and cryptographically sealed appeals to a particular element of the population, particularly when it comes to business.

AMD’s PSP is based around a single 32-bit ARM Cortex-A5, with its own isolated ROM and SRAM but has access to system memory and resources. It contains logic to deal with the x86 POST process but also features a cryptographic co-processor.

ARM has been promoting TrustZone for a couple of years now, and AMD has been tinkering with their Secure Processor proposition for almost as long, although relatively few explanations beyond ‘it is there’ have come from AMD.

Sometimes a name can inspire change. Carrizo isn’t one of those names, and the words ‘AMD’s notebook processor’ have not instilled much hope in the past, much to AMD’s chagrin no doubt. Despite this, we come away from Carrizo with a significantly positive impression, because this feels like more than just another Bulldozer-based update.

If you can say in a sentence ‘more performance, less power and less die area’, it almost sounds like the holy trifecta of goals a processor designer can only hope to accomplish. Normally a processor engineer is all about performance, so it takes an adjustment in thinking to focus more on power, but AMD is promising this with Carrizo. Part of this will be down to the effectiveness of the high density libraries (which, according to the slides, should also mean less power or more performance for less die area), but also the implementation of the higher bandwidth decoder, the new video playback pathway, and the optimization of power through the frequency planes. Doubling the L1 data cache for no loss in latency will have a definite impact on IPC, as will the better prefetch and branch prediction.

Technically, on paper, all the blocks in play look exciting, and every little margin can help AMD build a better APU. It merely requires validation of the results we have been presented with, along with a killer device to carry them, something which AMD has lacked in the past and which reviewers have had trouble getting their hands on. We are in discussions with AMD to get sufficient tools to independently test a number of these claims, and to see if AMD’s Carrizo has potential.
