2016-09-05

Geekbench 4 has just been released, and it has been making waves in the tech industry. It shows substantial improvements over Geekbench 3, and has seen a highly favourable reception.

The new version of Geekbench has resulted in a considerable shakeup of where devices rank in relation to each other. We’ve seen Qualcomm Snapdragon 820 chips fall, and ARM-based designs rise, leading to questions as to what changes were made to the benchmark to cause this shift.

We had an opportunity to sit down with the CEO of Primate Labs, John Poole, to interview him about the launch of Geekbench 4, and about the mobile ecosystem as a whole. In part 1 we talk about thermal constraints, how ARMv8 shocked the industry, how Primate Labs hopes to help people who are mildly tech savvy, the way the laptop industry is changing in response to the smartphone industry, and the difficulties in configuring and benchmarking multi-core systems.



Steven Zimmerman: We kind of already jumped into it a little bit. Geekbench 4 just came out which is very exciting. We’ve actually got a couple articles coming up on it. We’re trying to do some comparisons of how it’s changed since Geekbench 3, stuff like that.

John Poole: I know some of the folks at AnandTech had access to a beta release, because I know they use our stuff for some of their reviews, and I think it was Andrei who was sort of, you know, “Oh finally, a benchmark that puts the chips in their proper place”.

Steven: Yeah, he specifically said that he was “very happy” about the changes, and about how it gets closer to what he feels is right for the relative placement of the different cores, which is always interesting to see, because it’s hard to judge how they actually perform.

John: Especially with mobile moving as quickly as it is. I kind of touched on this in the blog post that I wrote when we launched earlier this week: in August 2013, ARMv8 wasn’t even a thing. It was going through the process of being standardized, but no one had it. I remember we had a discussion internally about that, and we were saying “That’s a year off, easily”. And then I remember I was actually at IDF in San Francisco, and Intel was up. They were doing their presentation of “Here’s our upcoming 64-bit x86 core for mobile, isn’t this exciting”, and I remember seeing people getting up and leaving, and I thought “What the heck is going on?” I opened my laptop and Apple basically was like “Yep. 64-bit. It’s here. You can buy it next week.” I know a lot of the other ARM licensees have sort of spent time scrambling to catch up, and I think we’re now at the point where Qualcomm has their own custom core, Samsung now has the M1, there are a lot of exciting developments there. I think people are finally catching up. But when it came to us, to developing the benchmark, in August 2013 I think the iPhone 5 or the Samsung Galaxy S4 were the fastest phones available, and they’re a quarter of the performance of what you can get now. It’s insane how much stuff has changed in those three years. So we had a very specific set of design goals. We were targeting things where you could still run Geekbench on, not necessarily the original iPhone, but you could run it on the 3G, you could run it on a Samsung Galaxy S3 or an S2, something like that. Now, I mean, we kind of work on some of those older phones, but it’s going to take you 20 minutes to run the benchmark.

Steven: Not to mention the fact that very few people have them at this point, so it’s almost a wasted effort.

John: It’s interesting because you have a lot of folks such as yourselves who will always go after the new shiny, and it’s always sort of the latest and greatest. We have a lot of our users who will hold onto a phone for 3, 4, 5 years, and one of the big questions in their mind is “What’s my ROI? What’s my benefit to going out and buying the new Note 7 or the new HTC 10 or something like that?”.

Steven: Kind of like “At what point does it become worth it.”

John: Exactly. And they still expect that, and a lot of our support requests over the first two weeks of a launch are sort of about managing those expectations of “I want to be able to run this on my original iPhone” or “I want to be able to run this on my Galaxy Nexus or Nexus S”.

Steven: I had an S2 until last year actually. CyanogenMod and all that fun stuff.

John: Oh wow. Exactly, a lot of people still want to be able to run benchmarks on older hardware because, while there are absolutely people who use it to see who’s the fastest, a lot of other people (personal users and business users) use it to see “When we upgrade, what sort of bang for the buck are we getting?”, and they might be running on a 5 to 8 year old system, or a 3 or 4 or 5 year old phone.

Steven: Especially now, with a lot of people feeling that computers aren’t making quite the same jumps that we were seeing in previous years, or that phones are making currently. So people are hanging on to Ivy Bridge and Sandy Bridge systems.

John: Oh, I’ve still got a Sandy Bridge system at home. That’s what I use for gaming. It’s an i7-2600k, and I know Apple, at their iPad Pro launch, said that they “feel bad for all those people running five year old computers”, and I sort of was like “It’s not so bad”. I mean, yeah, ok, I’ve upgraded the video card, but that’s about it.

Steven: Yeah, I’ve got a 3570k and a 7970. Same deal.

John: Exactly. It works. On the desktop side there’s been that plateau, and of course the interesting question is: is that because Intel doesn’t really have any competition and they’ve focused all their energy onto mobile, or is it because we’re just starting to hit limits?

Steven: And that alone is going to be interesting to see over the next little bit, with AMD’s Zen coming up.

John: I’m optimistic about Zen. I know some people have been tempering my enthusiasm a bit, it’s like “It’s just one benchmark”, but I don’t even think AMD has to be as fast as Intel. I think they can just be competitive. Because the problem’s been that with Bulldozer and Steamroller and Piledriver, it was an interesting gamble at the time, and I remember in 2011 when those came out thinking “OK, this is a really interesting approach, let’s see if it pays off”, and it didn’t. So, the fact that they’re going to this new architecture; people are talking about how there aren’t as many cores and they’re down on clock speed, and it’s like “Yeah, but they’re in spitting distance”, and that’s impressive in and of itself.

Steven: Because that can be enough to drive competition. Especially if they’re undercutting them. It can make a huge difference.

John: Exactly. The thing I’m really curious to see is if the desktop has just hit a natural plateau, or if the investment is in mobile because that’s the new hotness, that’s where all the growth is.

Steven: There’s also how it’s affecting the desktop and laptop side as well. I mean, you’re even seeing fanless chips in laptops now.

John: Yeah, the idea of buying a laptop without a fan in it seemed insane, but now it’s getting to the point where the M series chips are good enough to put in a laptop. The thing is, with Geekbench 3 we had the different workload sizes, so we’d be running the full suite on desktops regardless of whether they were fanless or not, and a smaller suite on phones, and I know there were some folks at Intel who were sort of like “Oh, this isn’t fair, yada yada”, but on the vast majority of M3s it was fine. It was just on the odd ones, where someone hadn’t spent the time on the chassis design to dissipate that heat, that they started hitting trouble. It’s sort of offloading things: before you could just stick a fan on it and you’d be fine, and now it’s sort of an “OK, now we have to look at the whole computer” kind of thing.

Steven: And that kind of leads into how you guys made some switches with how Geekbench operates a little bit. Specifically, I’m referring to how it used to be that you guys would just blast through and have it work for a certain amount of time, and I think now you guys are doing almost some time to completion style tests.

“I know, I hate to pick on Qualcomm, but the 810 was something where there were a lot of thermal issues, real or perceived.”

John: Yeah, so we’ve changed up a few things in 4. In both version 3 and version 4 we’ve always had a fixed amount of work. So we’d go through and we’d say “AES: encrypt 32 megabytes”, and we’d time how long that takes. And we’d run that multiple times, just so we have a good sample set, and then we’d report a number out of that.

We’ve always done that. The difference is that in 3, on a mobile phone, there might be a smaller buffer rather than a larger one, but that’s been the same thing. One of the things we’ve done in 4 is we’ve started pausing in between the workloads, mostly because of thermally constrained devices like these fanless laptops, smartphones, and that sort of thing. I know, I hate to pick on Qualcomm, but the 810 was something where there were a lot of thermal issues, real or perceived.
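As a rough illustration of the fixed-work approach John describes, where a fixed amount of work is timed, repeated to build a sample set, and separated by pauses between workloads, a minimal harness might look like the sketch below. The workload, buffer size, and pause length are hypothetical stand-ins, not Geekbench’s actual implementation.

```python
import hashlib
import time

# Illustrative parameters only -- not Geekbench's actual values.
RUNS_PER_WORKLOAD = 4     # repeated runs to build a sample set
COOLDOWN_SECONDS = 2      # pause between workloads so run order doesn't bias scores

def hash_32mb():
    # Stand-in for a fixed-work task like "encrypt 32 MB with AES";
    # the standard library has no AES, so a fixed-buffer hash is used instead.
    hashlib.sha256(bytes(32 * 1024 * 1024)).digest()

def time_workload(workload):
    """Run a fixed amount of work several times and return the mean runtime."""
    samples = []
    for _ in range(RUNS_PER_WORKLOAD):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)

def run_suite(workloads):
    """Time each workload, pausing between them to let the SoC cool down."""
    results = {}
    for name, workload in workloads.items():
        results[name] = time_workload(workload)
        time.sleep(COOLDOWN_SECONDS)
    return results

if __name__ == "__main__":
    print(run_suite({"hash_32mb": hash_32mb}))
```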

Steven: Well, we were seeing devices where after the third run you would have half the score of the first run.

John: And it would drop. And so the question is, and this is something we’ve wrestled with internally, do we want a benchmark that reflects this thermal throttling issue, or do we want a benchmark that represents “This is what the processor can do”? A lot of workloads on phones are bursty. If you’re not playing a game, it’s going to be something like: you open up Facebook, you upload that picture, you scroll through, you see how many likes you have, and then you close your phone, and you’re contented with your life. It’s a lot of bursty stuff like that. Checking email or loading a web page. And you’re going to spend a lot of time where the CPU is just going to be idle. The other thing that we saw was that a lot of people, when they’re using our data in reviews, AnandTech being a perfect example, will break us down workload by workload. They’re not going to look at an overall score. So let’s say we run all these benchmarks in order: if the 20th workload is being penalized because it’s the 20th one, as opposed to “this processor is just bad at floating point” versus “this processor just got hot because there’s no pause in there”, that can distort a lot of things. I know AnandTech has gone to some crazy lengths to compensate for that. They run benchmarks in freezers, and they’ll put things in ziplock bags and put them in water, just to mitigate these factors.

Steven: I remember for the Nexus 5 they went all out in that regard.

John: Yeah, it was crazy. I remember we actually found a thermal issue with the iPhone 5, and we were at the point where we were running it on ice packs. We just basically had a little bed set up and put it in there just to make sure that it wasn’t a thermal throttling issue. But anyway, we made the decision of “Let’s insert these gaps, let’s give the processor a little bit of time to cool down” because, not that we want to encourage people pushing those thermal envelopes to the point where they’re unrealistic, but at the same time the 20th workload should not be penalized, because again it’s the 20th workload.

““Let’s insert these gaps, let’s give the processor a little bit of time to cool down” … the 20th workload should not be penalized, because again it’s the 20th workload. The run order and the score should be independent of one another.”

The run order and the score should be independent of one another. This isn’t a perfect solution. I mean, if you’re in a 35 degree Celsius room, or, to pick on something in Toronto, if you’re on Line 2 in the subway, and you’re benchmarking your phone, ambient will still have an influence and all that, but we can’t fix that. This is at least a workaround to try and account for some of these issues you might see in a phone that, day-in day-out, may not have a big impact on performance. We’d like to do something where we can measure thermal performance over time, and we have a prototype of this thermal test, kicked around since version 3, that actually shows you where you start hitting problems if you’re running a continuous load.
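The idea behind such a thermal test can be sketched very simply: hammer the device with a continuous fixed workload and watch per-iteration runtimes climb as the SoC heats up and throttles. The snippet below is a hypothetical illustration of that idea, not Primate Labs’ prototype; the workload and duration are arbitrary.

```python
import hashlib
import time

def fixed_workload():
    # Arbitrary fixed amount of work run back to back with no cooldown.
    hashlib.sha256(bytes(8 * 1024 * 1024)).digest()

def sustained_run(duration_seconds=300):
    """Return (seconds since start, iteration runtime) pairs over a continuous load.

    Plotting the second column against the first shows where thermal
    throttling starts to bite: iteration runtimes climb as the SoC heats up.
    """
    samples = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        t0 = time.perf_counter()
        fixed_workload()
        samples.append((t0 - start, time.perf_counter() - t0))
    return samples

if __name__ == "__main__":
    for elapsed, runtime in sustained_run(duration_seconds=30):
        print(f"{elapsed:7.2f}s  {runtime * 1000:6.1f} ms")
```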

Steven: I believe I remember seeing it on the Ars Technica article for the S810.

John: Yeah, Ars Technica has been working with us on that, just to sort of solidify it. Unfortunately, it’s still not at the point where we feel comfortable releasing it. Part of it is just that it’s a complicated tool. One of the things we really want to do is make Geekbench as accessible as possible. Going off on yet another tangent, a lot of people in the past have complained that Geekbench “is a toy, it’s this real small benchmark, everybody should be using SPEC”; well, no one’s going to want to use SPEC outside of, again, the AnandTechs of the world, the analysts of the world who have the time and energy to spend three days trying to get SPEC working on their phones.

Steven: It’s a little crazy.

John: Yeah. We want to be a tool that, first of all, provides good data. I know there are other benchmarks out there, and I won’t name names because that’s not the game I want to get into, but there are benchmarks out there that have very straightforward and simplistic workloads, and how representative they are of anything is hard to say. But then the problem becomes, and we wrestled with this in version 3: okay, how do we fit big, meaningful workloads onto the phone? It’s challenging because the phones themselves are small. Another issue we run into is that we don’t want a benchmark that’s going to take 20-30 minutes to run. That’s the sort of thing we play with, because at the end of the day I want to create a tool. It’s great to have websites use us, and it’s great to have not just the PCWorlds of the world use us, but also the AnandTechs of the world, where they’re looking to really do these technical deep dives, and hopefully Geekbench can give them good information. At the end of the day I’d like to be able to create a tool where someone who’s moderately technical, like someone who does IT in their spare time, or a reasonably savvy consumer who is asking “Which phone should I buy?” or even “Is my current phone working?”, can get an answer in two or three minutes.

Steven: That’s also a fairly important angle. Seeing if there are any issues with the current processor that you’re using; seeing if it performs up to how it should be performing.

John: I remember when the first iPhone came out, I thought “Oh, we should benchmark it out of curiosity. Once we provide a score, what’s the point? Why are we going to keep doing this?” But after we launched Geekbench for mobile, what we found was a lot of people coming back to us saying “Oh, this is great, I just got my phone and I wanted to make sure it was working properly.” Or other people who come to us and say “My phone wasn’t working properly and you showed me, and I was able to go into the store and say I should be getting 10,000 points, I’m getting 1,000 points, what’s going on?” We’ve talked to people who use Geekbench scores to show people “There’s something wrong here.” It’s not just “What’s the fastest phone?”, but it’s also “Is my phone fast?”, “Is my phone performing properly?” Which I think is really important. When people talk about these big complicated benchmark suites like SPEC: they’re great for doing deep dives into architectures, but not so great at helping regular people solve the sort of problems they’ll have with their phone.

Steven: You’ve seen CPUs evolve in mobile for many years now, some companies like Qualcomm have grown, some like Texas Instruments have fallen by the wayside, and some new players have made some pretty big strides. What are the biggest surprises you’ve seen over the last little bit?

John: One of those we already touched on: 64-bit was a huge surprise. But in terms of the general market in and of itself, big.LITTLE has been a weird thing for me, because I never fully under… and I mean, we’ve talked to people, and one of the things we keep asking is “What’s the point of big.LITTLE?” Again, to point to Intel, because they’ve complained about this quite loudly: is big.LITTLE just a marketing thing? Are people putting more cores on a phone just so they can say “Hey, it’s an eight core phone; it’s a ten core phone”?

Steven: Well, we’re seeing that with the MediaTek X25, where they put A72 cores on and they almost don’t use them.

John: We’ve talked to people who maybe are not going for quite that radical an approach, and one of the big things that keeps coming up is that putting the chips on there is easy. Getting the operating system, the scheduler in the operating system, to use these chips effectively, that’s the hard part. And I think we’re finally starting to see this. If you look at how Android’s been progressing, and Google themselves have done some great work, the Nexus 5X and 6P are both big.LITTLE designs, and they’re really some of the nicest big.LITTLE phones out there, because they’ve done a lot of work to make sure the scheduler does exactly the right thing on these chips. And we’re seeing now that you’ve got two design camps.

“Getting the operating system, the scheduler in the operating system, to use these chips effectively, that’s the hard part.”

Apple’s gone with these big, complicated cores with low clock speeds, but they only put two of them on one chip. And basically they’ve made it very clear: “We think single-core performance is the most important thing out there, screw everything else.” They’ve put in an extra core for some concurrency stuff, because there are times the second core is super helpful in their phone, but really, “Single core, that’s where it’s at, we’re going to push that number up as high as we can.” You see other companies like Samsung and Qualcomm say “Well, maybe more cores are useful, we’ll do smaller, lower power cores”, and especially with the big.LITTLE designs, you can actually use those little cores as a “the phone’s not really doing anything that important, so you can just keep it on one of these cores” kind of thing. And Nvidia did these low-power process nodes when they did their 4-PLUS-1 design in the Tegra 3.

Steven: They did some interesting things, like they had A15 cores as their little cores at one point.

John: Yeah, but they also did it on a completely different manufacturing process. My understanding was they did it in a completely different way so that it would sip as little power as possible. Now we’re seeing Android take advantage of these eight core designs; there are certain tasks you can do on your phone where, if you hook it up to instrumentation, you can see all 8 cores light up, and then they all turn right back off. It’s different design approaches. I know seeing 10 cores seems a little crazy, but we’ve done things where we’ve taken a six core phone and started turning off cores, and even if you turn off two of the little cores, the phone starts to get a little janky. As soon as you turn off all the little cores, it just sloooows right down. They’re not that powerful, but they’re able to keep stuff off the big cores.
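The core-disabling experiment John mentions can be reproduced on a rooted Android or Linux device through the standard CPU hotplug interface in sysfs. The sketch below shows the idea; the core indices are hypothetical, since cluster numbering varies by SoC.

```python
from pathlib import Path

# Toggling cores via /sys/devices/system/cpu/cpuN/online requires root, and
# on many kernels cpu0 exposes no "online" node and cannot be taken offline.

CPU_ROOT = Path("/sys/devices/system/cpu")

def set_core_online(cpu_index: int, online: bool) -> None:
    """Toggle a single core on or off via its sysfs hotplug node."""
    node = CPU_ROOT / f"cpu{cpu_index}" / "online"
    node.write_text("1" if online else "0")

def offline_cores(indices=(4, 5)) -> None:
    # Example: take two (hypothetically "little") cores offline, then observe
    # UI smoothness or rerun a benchmark to see what the remaining cores carry.
    for cpu in indices:
        set_core_online(cpu, False)

def online_all(max_index: int = 7) -> None:
    # Bring everything back once the experiment is done.
    for cpu in range(1, max_index + 1):
        set_core_online(cpu, True)
```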

Steven: Little background tasks.

John: Exactly. Exactly. So, other surprising things… I’m surprised Nvidia isn’t a bigger player in mobile.

Steven: Well, a big part of that was that they were having some issues with the cellular radios, and the same with Intel and even Samsung, though they’re working on fixing it. Not having the right LTE radios integrated makes a huge difference for whether other companies will actually buy your chip.

John: Right. I know for us radios are like “Pfft, radio shmedio”, but even some of the stuff that they did in Denver was really interesting. I mean, look at the Nexus 9. It’s a tablet, and it’s disappointing that it didn’t see more traction. Part of that was build quality, but it’s a really neat chip. It’s a chip that, whenever we play around with it here (we’ve got a couple in the lab), I’m impressed with how fast it is. And again, they went with the Apple philosophy of “Two big cores, isn’t this exciting”.

Steven: And the Shield tablets are great dev devices. They were the first tablets to support Vulkan, and the Shield TV was the first TV box to support Vulkan, and a whole bunch of stuff.

John: And the TV box is great because that’s an actively cooled device. So, for us, when we’re really trying to nail down “Is this a small one, two, or three percent improvement that we made to our algorithm?”, having something that’s actively cooled really cuts down on the guesswork there.

Steven: Right, and they’re also targeting cars and stuff like that, which doesn’t help their TDP all that much.

John: Yeah, exactly. I mean, I’m not surprised TI is out. I’m a little surprised that Intel seems to have kind of thrown in the towel.

Steven: Well, that’s the weird thing. They’ve almost tried to enter the ARM game now. They announced that they’re going to be fabbing third-party ARM chips. Potentially for LG, I believe.

John: Yeah, I think LG was one of the first announcements. They’ve talked about opening up that fab; I’ve heard rumors and such for ages, and thought it would get to the point where they’re like “We’ve got more capacity than we need.”

Steven: Well, they’ve always had more capacity, I just never thought they’d actually do it.

John: Yeah, exactly. I guess someone out there just figured “Well, maybe we should start using this.” And again, they’re dealing with the issues of doing their process shrinks, where they’ve gone from Tick-Tock to I can’t remember what the actual…

Steven: I think it’s architecture, process, optimization, although not in that order.

John: Yeah, something like that. I call it Tick-Tock-Tweak. But we’re seeing that with Kaby Lake. They just came out with Kaby Lake earlier this week, but it’s six SKUs and that’s it.

Steven: That’s just how it’s been for the past couple years. It takes a little bit for everything to come out.

John: Exactly, but I’m used to them coming out with a big expensive desktop chip, and them saying “Here’s your i7 that’s unlocked. The 7700k.”

Steven: Well, we’re seeing what looks like it may be a 7700k in a laptop. I think it was Acer. They announced one laptop that’s tiny tiny tiny, and then they also announced “The biggest laptop on the market.” It’s a full desktop with a screen. $5000. It has two 1080s.

John: That’s insane. We’ve got some Alienwares, and that’s about as thick as I’d want for a laptop. I think for me the big surprise was Apple coming out when they did that 64-bit launch. Watching how people have caught up, I’m really excited to see Samsung designing their own cores, because I think the more diversity in the ecosystem the better, because if everybody was just using, sort of, “Oh, we’ve got an A57, we’ve got an A72”, where’s your value added?

“To see Qualcomm and Samsung still trying to get these really high power, high performance chips is encouraging”

And I understand for companies like MediaTek that makes sense because they’re sort of “Let’s make these chips as cheaply as possible and just pump them out there”, but to see Qualcomm and Samsung still trying to get these really high power high performance chips is encouraging.

That’s it for part one. Stay tuned, as there is a lot more in-depth information, along with plenty of entertaining tidbits, coming in part two of our fireside chat. We hope you enjoyed the chat and learned a thing or two!
