Apple, ARM, and what it means.

Saturday, August 1, 2020 1:49 PM

For the past 15 years, Apple’s Macs have run on Intel processors, using the same x86 architecture as competing PCs. Meanwhile, Apple has used ARM processors for its mobile devices ever since the first iPhone. Apple has often seemed stymied by Intel’s slower pace of innovation, compared to their large strides in mobile power. That’s why, for years now, people have been discussing the possibility of Apple moving to ARM processors for their desktops and laptops.

This reinvention has read like something between an open question and a sure thing, ever since Apple’s modern phone processors started giving Intel’s x86 offerings a solid competitor. And now it has really happened! Apple will use custom-built ARM processors to power their MacBooks, and, at some point, probably their iMacs and Mac Pro. But a lot of people seem confused as to what this really means.

I know my way around some deep processor weeds, so this has been interesting to me. There’s a ton of uninformed garbage on the internet, as usual, so I’ll offer my attempt to clarify things!

x86? ARM? Huh?

This will be Apple’s third major Instruction Set Architecture (or ISA) transition in their history. They moved from 68k to PowerPC in the mid-1990s, then from PowerPC to Intel in 2006—and now to ARM. This makes them pretty much the only consumer hardware company to regularly change their processor architecture, and certainly the largest to pull it off consistently.

The instruction set is, at nearly the lowest level, the “language” that your processor speaks. It defines the opcodes that execute on the very guts of the processor, and there have been quite a few of them throughout the history of computing. Adding two numbers, loading a value from memory, deciding which instruction to execute next, talking to external hardware devices: all of this is defined by the ISA.
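
To make that a bit more concrete, here’s a minimal sketch of a made-up instruction set and a tiny interpreter for it. Everything here (the LOAD, ADD, JNZ, and HALT opcodes, the four registers, the run function) is invented for illustration and has nothing to do with real ARM or x86 encodings. It only shows the idea that an ISA is a fixed vocabulary of operations the hardware agrees to carry out.

```python
# Toy ISA, invented for illustration; it is NOT ARM or x86.
from typing import List, Tuple

Instruction = Tuple[str, int, int]  # (opcode, operand_a, operand_b)


def run(program: List[Instruction], memory: List[int]) -> List[int]:
    """Execute a toy program and return the final register values."""
    regs = [0, 0, 0, 0]  # four general-purpose registers
    pc = 0               # program counter: index of the next instruction
    while pc < len(program):
        op, a, b = program[pc]
        pc += 1
        if op == "LOAD":    # load a value from memory: regs[a] = memory[b]
            regs[a] = memory[b]
        elif op == "ADD":   # add two numbers: regs[a] = regs[a] + regs[b]
            regs[a] += regs[b]
        elif op == "JNZ":   # decide what runs next: jump to b if regs[a] != 0
            if regs[a] != 0:
                pc = b
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


program = [
    ("LOAD", 0, 0),  # r0 = memory[0]
    ("LOAD", 1, 1),  # r1 = memory[1]
    ("ADD", 0, 1),   # r0 = r0 + r1
    ("HALT", 0, 0),
]
print(run(program, memory=[2, 40])[0])  # prints 42
```

A program built for this toy ISA would mean nothing to a processor that defines ADD differently or doesn’t have a JNZ at all, which is the whole problem with moving software between architectures.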

In the modern world of consumer computing, there are only two that matter, though: ARM and x86 (sorry, RISC-V, you’re just not there yet). Up until recently, the division was simple. x86 processors (Intel and AMD) ran desktops, laptops, and servers. ARM processors ran phones, tablets, printers, and just about everything else.

Except for a few loons (like me) who insisted against all reason on making little ARM computers function as desktops, this was how things worked. But not anymore!

Intel and AMD: The power of x86

For decades now, the king of the performance hill has been the x86 ISA—primarily on chips from Intel and AMD (within a rounding error—there have been other implementations). Pentium. Core. Athlon. Xeon. These are the powerhouses. They run office applications. They run games. They run the giant datacenters that run the internet. If it’s fast and powerful, it’s x86. Intel is the king, though AMD has made a proper nuisance of themselves on a regular basis (and, in fact, the 64-bit extensions to x86 you’re probably running were developed by AMD and licensed by Intel).

From power-sipping Atom processors in netbooks to massive Xeons and Opterons in data centers, x86 had you covered. It’s been the default for so long in desktops and servers that most people don’t even think about it.

But the x86 ISA is over 40 years old (introduced in 1978), and it’s accumulated a lot of cruft and baggage over the years. x86 instructions are maddeningly complex, with all sorts of weird corners and nasty gotchas, and they’ve only gotten worse with time. MMX. SSE. VMX. TSX. SGX. TXT. SMM. STM. You either have no idea what those are, or you’re shuddering a bit internally.

All of it adds up. Go read over the Intel SDM for fun, if you’re bored. It’s enjoyable, I promise—but it’s also pretty darn complicated.

ARM: Small, power-efficient, and cheap

Historically, ARM chips have been the cheaper, lower-power alternative to x86. They’re a set of compromises: fairly small, cheap to make, and power-efficient. They’re not focused on blazing-fast performance, but they’re usually good enough, and as a result they’ve ended up in just about everything that’s not a desktop or laptop.

In the last year or two, ARM chips started firing solid cannon blasts across the bow of the Intel Xeon chips in the datacenter. If you don’t care about raw single-threaded performance, but care an awful lot about total throughput and power efficiency, ARM server chips are properly impressive. For less money than a Xeon, you can have more computing power on fewer watts. Not bad!

ARM chips have made their way into a few laptops and desktops, but they’ve not been a threat to the flagship performance of Intel.

Except… there’s Apple. Who, on occasion, is totally insane in interesting ways.

Apple’s ARM: Big, power-efficient, and fast

Most companies just license ARM’s reference cores (A72, A53, etc) and attach their desired peripherals to them. The resulting ARM chips run just about as fast as any other company’s ARM chips with the same cores and clock rate.

But a few companies have the proper licenses and engineering staff to build totally custom ARM cores. Apple is one of these companies. Starting with the A6 processor in the iPhone 5, they used custom processor implementations. These were ARM processors, sure, but they weren’t the reference ARM processor implementations. The first few of these were fairly impressive at the time, but were clearly phone chips, meant to save power and run cool. And then Apple got crazy.

If you own an iPhone 11, it’s (probably) faster than anything else in your house, in terms of single-threaded tasks—on a mere few watts! Apple has quietly been iterating on their custom ARM cores, and has created something properly impressive—with barely anyone noticing in the meantime. Yeah, the review scores got more impressive each year. Yeah, modern Android devices were stomped by the iPhone 8 on anything that’s not massively parallel—but they’re phones. They can’t threaten desktops and laptops, can they?

They can—and they are.

A laptop has far more thermal capacity and power to offer one of Apple’s chips than a phone. The iPhone 11’s battery is 3.1Ah @ 3.7V, working out to 11.47Wh. The new 16″ MacBook Pro has a 100Wh battery, making it about 8.7 times larger. Some of that goes to the screen, but there’s a good bit of power to throw at computing. And if you plug a computer in, well, power isn’t a problem!
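
If you want to check the battery arithmetic yourself, here’s a quick sketch. The capacity and voltage figures are the ones quoted above; the watt_hours helper is just a name I made up for the multiplication.

```python
def watt_hours(amp_hours: float, volts: float) -> float:
    """Energy capacity of a battery: amp-hours times nominal voltage."""
    return amp_hours * volts


iphone_wh = watt_hours(3.1, 3.7)  # iPhone 11 figures quoted in the text
macbook_wh = 100.0                # advertised 16-inch MacBook Pro capacity
ratio = macbook_wh / iphone_wh

print(f"iPhone 11: {iphone_wh:.2f} Wh")          # iPhone 11: 11.47 Wh
print(f"MacBook Pro is {ratio:.1f}x larger")     # MacBook Pro is 8.7x larger
```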

This is interesting, of course—but it’s not all. There’s also how Intel has found themselves stuck.

Intel: Stuck on 14nm with a Broken Microarchitecture

If you’ve kept up with Intel over the last few years, none of this will be a surprise. If you haven’t, you might only notice that they seem to produce strikingly similar processors every few years, or sometimes months. Here’s why.

Intel has been stuck on their 14nm process for about 4 years (and their 7nm plans have slipped yet again). Their microarchitecture is fundamentally broken in terms of security. And worse, they can’t fully understand their own chips anymore.

Yes, Intel has been iterating on 14nm, and releasing 14nm+, 14nm++, 14nmXP, 14nmBBQ, and other improvements (okay, I made those last two up)—but everyone else has leapfrogged them. Intel used to have an unquestioned lead over the rest of the tech industry in building their chips, and it showed. Intel chips (post Netburst) used less power and gave more performance than anything else out there. They were solid chips, they performed well, and they were worth the money in most cases. But they’ve lost their lead.

Not only have they lost their lead, their previous chips have been getting slower with age. You might have heard of Meltdown and Spectre, but those are only the camel’s nose. There has been a laundry list of other issues. Most of these recently discovered flaws have fixes—some microcode updates, some software recompilation, some kernel workarounds. But these workarounds slow the processor down. Your Haswell processor, today, runs code more slowly than it did when it came out. Whoops.

Finally, Intel can’t reason about their chips anymore. They seem unable to fully understand the chips, and I don’t know why. I do know that quite regularly, some microarchitectural vulnerability comes out that rips open their “secure” SGX enclaves—again. In the worst of them, L1 Terminal Fault/Foreshadow, the fix is simple—flush the L1 cache on enclave transition. The hit on processing speed isn’t a dealbreaker, but the point is that Intel didn’t know it was a problem until they were told.

So, why would you want to be chained to Intel anymore? There just aren’t any good reasons. AMD has caught up, but Apple wouldn’t move from one third-party vendor of x86 to another. They’re pulling it all in-house—which should be very, very interesting.

Apple’s ARM advantages

Custom-built ARM chips give Apple better control over the stack, which should translate, rather nicely, to a better user experience (at least for most users).

Apple will be able to deliver the same performance on less power—and, in the bargain, be able to integrate more of their “stuff” on the same chip (for even more power savings). There won’t be a need for a separate T2 security processor when they can just build it into their main processor die. They won’t be stuck dealing with “Whatever integrated GPU abomination Intel has dumped in this generation”—they can design the chip for their needs, with their hardware acceleration, for the stuff they care about. (Yes, I know Intel isn’t as bad as they used to be, but I sincerely doubt Apple is happy with their integrated GPUs).

Fewer chips also means a smaller board—which generally means less power consumption or better performance. That’s more important than ever, because Apple is literally sitting on the limit of how much battery they can have in a laptop. They claim their laptops have a 100Wh battery—with a disclaimer.

Why, you might ask, would a 100Wh battery warrant a disclaimer saying that it’s actually 0.2Wh less?

There are a lot of shipping (and carry-on luggage) regulations that use 100Wh as a limit—and they’re usually phrased, “less than 100Wh.” Apple is being explicitly clear that their laptops are less than 100Wh—even if they advertise them as being equal to 100Wh. USPS shipping, UPS, TSA—all of these have something to say about 100Wh batteries. You can ship batteries over 100Wh, but you have to essentially declare them as hazardous materials; it’s not a very customer-friendly scene.

So Apple literally can’t put any more battery in their laptops if they expect to ship them inexpensively or let people travel with them easily. The only way to increase runtime is to reduce the power they use. Apple’s ARM chips will almost certainly use less power than Intel chips for the same work. Or Apple might, in keeping with tradition, use their custom chips to offer a smaller, lighter laptop with the same battery life—something nobody else can touch.
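
The trade-off is simple division, sketched below. The 99.8Wh figure comes from the disclaimer mentioned above; the two average power draws are hypothetical numbers I picked purely to show the shape of the relationship, not measurements of any real laptop or chip.

```python
def runtime_hours(battery_wh: float, average_draw_w: float) -> float:
    """Runtime is capacity divided by average power draw."""
    return battery_wh / average_draw_w


BATTERY_WH = 99.8  # 100 Wh minus the 0.2 Wh from the disclaimer

# HYPOTHETICAL average whole-system draws, for illustration only.
for draw_w in (10.0, 7.0):
    hours = runtime_hours(BATTERY_WH, draw_w)
    print(f"{draw_w:.0f} W average draw -> {hours:.1f} hours")
```

With the battery pinned at the regulatory ceiling, the numerator can’t grow, so every extra hour has to come out of the denominator.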

Either way, this bold move sets them apart from the rest of the industry, who will be stuck on Intel for a while longer. Microsoft’s ARM ambitions have gone roughly nowhere, and Linux-only hardware isn’t a big seller outside Chromebooks.

Apple also gains a lot from being able to put the same “stuff” in their iOS devices as in their laptops. Think machine learning accelerators, neural network engines, speech recognition hardware—the gap between what an iPhone can do and what a MacBook Pro can do ought to narrow significantly. We might even see a return of laptops with built-in cellular modems!

x86_64 on ARM emulation

There are a few sticking points about Apple switching to ARM. First and foremost: current Macs are based on x86—so, what happens to your existing software? Switching to ARM means x86 software won’t work without some translation. It turns out, Apple has a massive amount of experience doing exactly this sort of thing. They made 68k binaries run fine on PowerPC, and they made PowerPC binaries run acceptably on x86. And now they have Rosetta 2 to provide integrated emulation.

Emulating is hard, especially if you want to do it fast. There’s a lot of work in the Just In Time (JIT) space for things like JavaScript, but with Apple’s emulation, I’d expect a JIT engine to be used infrequently. The real performance will come from binary translation: treating the x86 binary as source code that gets turned into an ARM binary. This lets you figure out when some of x86’s weirder features (like the flags register being updated by nearly every arithmetic instruction) are needed, and when they can be ignored. One could also, with a bit of work, figure out where ARM’s memory model, which is rather relaxed compared to x86’s, will cause problems, and build in barriers. It’s not easy, but it’s certainly the type of thing Apple has done before, and it sounds like Rosetta 2 will continue this trend.
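
Here’s a rough sketch of the flags idea, and only a sketch: I have no idea how Rosetta 2 actually does it. It walks a block of x86-style instructions backwards and marks which flag updates some later instruction actually reads. The Insn class, the flags_needed function, and the choice to treat the flags register as a single unit are all my own simplifications (real x86 instructions update individual flags, and some only update a subset).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Insn:
    """A toy x86-style instruction; only its effect on flags is modeled."""
    text: str
    writes_flags: bool = False
    reads_flags: bool = False


def flags_needed(block: List[Insn], live_at_exit: bool = True) -> List[bool]:
    """For each instruction, report whether its flag update must be kept.

    Backward liveness pass: a flag write matters only if something reads
    the flags before the next write. live_at_exit=True is the conservative
    assumption that code after this block may still read the flags.
    """
    needed = [False] * len(block)
    live = live_at_exit
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.writes_flags:
            needed[i] = live  # keep the update only if someone reads it later
            live = False      # this write overwrites anything earlier
        if insn.reads_flags:
            live = True       # earlier writes are now observable
    return needed


block = [
    Insn("add eax, ebx", writes_flags=True),
    Insn("sub ecx, 1", writes_flags=True),
    Insn("cmp eax, 10", writes_flags=True),
    Insn("jl loop", reads_flags=True),
]
for insn, keep in zip(block, flags_needed(block)):
    print(f"{insn.text:<14} flags {'kept' if keep else 'dropped'}")
```

In this block only the cmp needs its flags computed, because the add and sub results are overwritten before anything looks at them. A translator that knows this can emit plain ARM adds and subtracts for the first two instructions instead of the flag-setting variants.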

What would be very interesting, though (and I think Apple might pull this off) would be “going native” at the library calls. If you’re recompiling a program, you know when the code is going to call into an Apple library function. If they got a bit creative, emulated software could jump from the emulated application code to native library code. This would gain a lot of performance back, because all the things Apple’s libraries do (which is a lot) would be at native performance!

Virtualization: Goodbye native Windows performance

The next question, for at least a subset of users, is: “But what about my virtual machines?”

Well, what about them? ARM supports virtualization, and if all you want to do is run a Linux VM, just run an ARM Linux VM!

But if you need x86 applications (say, Windows 10)? This is where it will be interesting to watch. I fully expect solutions out there—VirtualPC, years ago, ran x86 Windows on PowerPC hardware. I just don’t know what sort of performance you can get out of full system emulation on ARM. Normally, things have gone the other way: emulate a slow ARM system on a fast x86 machine. The performance isn’t great, but it’s good enough. Going the other way, though? Nobody has really tried, because we’ve never had really fast ARM chips to mess with. Of course, if Apple’s chips are 20-30% faster than the comparable Intel chips, you can spend some time on emulation and maybe come out even or ahead.
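
As a back-of-the-envelope check on that last point, here’s a small sketch. It assumes emulation can be summarized as a single efficiency factor (the fraction of the host’s native speed that emulated code achieves), which is a big simplification; the function names are mine, and the 20-30% figures are the hypothetical ones from the paragraph above.

```python
def emulated_speed(native_speedup: float, emulation_efficiency: float) -> float:
    """Speed of emulated code relative to the original x86 machine (1.0 = even)."""
    return native_speedup * emulation_efficiency


def break_even_efficiency(native_speedup: float) -> float:
    """Emulation efficiency at which emulated code matches the x86 machine."""
    return 1.0 / native_speedup


for speedup in (1.2, 1.3):  # host chip 20% and 30% faster than the Intel part
    needed = break_even_efficiency(speedup)
    print(f"{speedup:.1f}x host needs {needed:.0%} emulation efficiency to break even")
```

Under this simple model, a 20-30% faster chip only breaks even if the emulator keeps roughly 77-83% of native speed, so the emulation has to be quite good before “even or ahead” is on the table.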

If you’re hoping to run some weird x86 utility (Mikrotik Winbox comes to mind as the sort of thing I’d run), I’m sure there will be good enough solutions. Maybe this will light a fire under Microsoft to fix their ARM version of Windows and emulator.

But for x86 games? Probably not. Sorry.

The ARM64 software ecosystem: Yay!

Me, though? I’m really excited about something that almost nobody else cares about.

The aarch64 software ecosystem is going to get fixed, and fast!

I’ve been playing with ARM desktops for a while now: Raspberry Pi 3, Raspberry Pi 3B+, Raspberry Pi 4, and the Jetson Nano. The swell of interest in, and use of, Raspberry Pis, ARM-based Chromebooks, and similar devices has made living with 32-bit ARM not just tolerable, but day-to-day livable.

The 64-bit stuff, though? Software builds just randomly fail. Clearly, there is very little development and polishing on it. You end up with things like Signal not building on 64-bit ARM, because some ancient dependency nested inside the Node dependencies doesn’t know what aarch64 means.
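
To show the flavor of that failure, here’s a hypothetical sketch; it is not the actual code from Signal or any of its dependencies. A build script keeps a table of architectures it knows about, and anything missing from the table blows up. The ARCH_ALIASES table, the normalize_arch function, and the short names it maps to are all made up for illustration. The only real API used is Python’s platform.machine(), which reports strings like x86_64, aarch64, or arm64 depending on the OS.

```python
import platform

# Hypothetical alias table. An "ancient dependency" would typically be
# missing the aarch64 and arm64 entries and fail on 64-bit ARM.
ARCH_ALIASES = {
    "x86_64": "x64",
    "amd64": "x64",
    "aarch64": "arm64",  # what Linux calls 64-bit ARM
    "arm64": "arm64",    # what Apple platforms call it
    "armv7l": "arm",
    "armv6l": "arm",
}


def normalize_arch(machine: str) -> str:
    """Map an OS-reported machine string to a short build-target name."""
    key = machine.lower()
    if key not in ARCH_ALIASES:
        raise ValueError(f"unsupported architecture: {machine}")
    return ARCH_ALIASES[key]


def host_arch() -> str:
    """Build-target name for the machine this script is running on."""
    return normalize_arch(platform.machine())


print(normalize_arch("aarch64"))  # prints arm64
```

Delete the two 64-bit ARM lines from the table and you get the “doesn’t know what aarch64 means” error on an otherwise perfectly capable machine. Fixes for this sort of thing are usually a line or two, but somebody has to own the hardware to notice.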

With Apple moving into the high-end 64-bit ARM space, I expect all of this to be fixed—and fast! Which is great news for light ARM systems, even if you don’t personally own a MacBook.

Intel’s future

Where does this leave Intel?

It depends a bit on just how good Apple is, and how soon Intel can get back on the tracks. If Intel iterates quickly, gets their sub-10nm processes working, and fixes their underlying microarchitecture properly (instead of using universities as their R&D/QA labs), we should be in for a good decade of back and forth. Apple’s team is excellent, Intel’s team… well, used to be excellent, at least, and they’ve certainly both got resources to throw at it. As we crash headlong into the brick wall at the end of Moore’s Law, getting creative (ideally in ways that don’t leak secrets) is going to be important.

Or, we could end up with something like the state of phones. Anandtech’s iPhone 11 review benchmarks the latest flagships in web performance. From the top, it goes Apple, Apple’s previous generation, Apple’s generation before that, a big gap, and then the rest of the pack. And, yes, they do mention that the iPhone 11 is a desktop grade chip in terms of SPEC benchmark performance.

Could the laptop/desktop market look this bad? Probably not. But it’s a possibility. It really depends on what Apple’s chip designers had waiting in the wings for when someone finally said to them, “Here are 35 watts to play with.”

But between Apple moving away from Intel, and data centers working out that ARM options are cheaper to buy, cheaper to run, and faster in the bargain? Intel might have a very hard hill to climb—and they may not make it back up.