2015-06-28





Sophie Wilson was made a Fellow of the Royal Society in 2013, for having made “a substantial contribution to the improvement of natural knowledge”.

By 2014, over 50 billion ARM processor cores had been shipped since the first ARM chip was created by Sophie Wilson in the mid-1980s. Ten billion of those were produced in 2013, so by the time you read this, the figure is probably coming up on 60 billion. This meteoric rise from a mere 10 billion ever shipped in 2008 mirrors the rise of mobile computing. Nearly 60% of mobile devices, and 95% of smartphones, contain an ARM-based chip. You’ve probably got one in your pocket right now. I certainly have. So where did they start out?

Sophie Wilson was born in Leeds in 1957, and studied maths at Cambridge. In 1978, during the big microprocessor boom (see the BASIC article in LV005), she was working with Hermann Hauser to solve a problem for a fruit machine manufacturer. Someone had developed a hack which used a cigarette lighter to shock (literally!) the new electronic machines into disgorging cash. Wilson created a radio receiver to detect the cigarette lighter spark, solving that problem; whereupon Hauser challenged her to create a working PC by the end of the summer. Wilson succeeded, and six months later, Hauser’s company, now relaunched as Acorn Computers, started offering the Acorn System One, with a princely 512B of RAM, for £70. Everything was built in-house: logic circuits, assemblers, BASIC interpreters – the lot. By mid-1981, the UK PC market was dominated by the ZX81 (by Clive Sinclair, and available in WHSmith shops) and the Acorn Atom (more expensive, and only available as a kit from Acorn).

In 1981, Wilson improved and extended Acorn’s version of BASIC for the Acorn Proton, the machine that became the BBC Micro; that BASIC in turn developed into BBC BASIC. The Proton was built in a week, after Chris Curry, co-founder of Acorn, promised the BBC that they would have a machine to demonstrate within the week. They made it – just.

Wilson ported the OS across to the Proton’s raw hardware, and installed BASIC, in the two hours between the hardware working and the BBC arriving for the demo.

However, what we’re looking at in this article is ARM, the Acorn RISC Machine, one of the first RISC processors, which later became one of the most successful IP cores of the 1990s and 2000s, in particular for use in mobile devices.



The first ARMv1 in an evaluation system.

Creating ARM

The ARM chip was a specific instance of a RISC processor. Reduced Instruction Set Computing (RISC) originated at IBM. It meant that instead of the increasingly complex instructions that processors were using in the early 1980s, a RISC processor would use a limited set of simple instructions. However, IBM hadn’t really got anywhere with the idea – they’d created a RISC processor after months of work simulating instructions on a mainframe, but it was a commercial flop. Meanwhile, working on the BBC machines, Acorn were becoming frustrated by the limitations of the BBC’s microprocessor. The main problem was the memory interface: how fast a chip could fetch, and thus execute, transactions. Wilson found it frustratingly slow, and it was restricting what they could do with their secondary processors.

After reading one of the first papers about RISC, Wilson and Acorn started investigating their options. A visit to the huge facilities at National Semiconductor in Israel was depressing; Acorn couldn’t afford anything like that. Then they visited the much smaller but very successful Western Design Center in Arizona, which consisted of only a couple of bungalows and a small team of engineers and students. Reassured that you didn’t need a huge operation to design processors, Wilson got stuck into designing the ARM instruction set back at her desk at Acorn (and in the local pub over lunches with colleagues!). Steve Furber was then responsible for turning Wilson’s instruction set into something that could be produced at a factory. Eighteen months later, they had the first working ARM.

It’s odd that what is now the major selling point of ARM processors, their low power consumption, was only a side effect. What Acorn were interested in was low cost, and low cost meant plastic. Plastic is a good insulator, which is bad news on a high-power chip as the heat takes longer to dissipate and your chances of frying the chip increase. So that in turn meant keeping the ARM power consumption under 1W.

However, when they got the first test chips back and plugged them into a development board, the chip worked – but seemed to be consuming no power at all. It turned out that there was a fault in the board, and the power supply line wasn’t working. The chip was, as Wilson explains, “running on leakage from the logic circuits”. The chip consumed an incredibly low 0.1 watts. Wilson’s ARM, it turned out, was a particularly efficient version of RISC.

Wilson rewrote BBC BASIC in ARM assembler very efficiently, but the first complete ARM computer was the Acorn Archimedes in 1987. It and its successors were among the most powerful home computers at the time. Of more long-term importance, Apple had realised that the ARM processor needed only a small amount of chip real estate – making it possible to squeeze further processing power onto the same chip. Apple invested heavily in ARM for the Newton (the first ever tablet, which flopped); but the investment paid off later in the iPhone, iPod, and iPad.

Acorn Archimedes setup in 1987

RISC

The basic idea behind Reduced Instruction Set Computing (RISC) is that you can get better performance (compared to a complex, specialised instruction set) out of a simplified instruction set running on a microprocessor which needs as few cycles per instruction as possible. The ‘reduced’ refers not necessarily to the number of instructions, but to the amount of work that an instruction does – ideally, each instruction should execute in a single clock cycle (often approached by using a technique called pipelining). A precise definition is hard to pin down, but two common RISC traits are a small, highly optimised set of instructions; and load/store architecture, where memory must be accessed through specific instructions, rather than as part of other instructions.

RISC is generally easier to make power-efficient than, say, x86, because RISC instructions are a fixed length (four bytes in the original 32-bit ARM instruction set), whereas x86 instructions vary in length. That means that the chip doesn’t have to expend any processor power working out where one instruction ends and the next begins. So (put very simply) a RISC instruction takes less energy to handle, and can be understood by a smaller chip.
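To make the fixed-length point concrete, here is a minimal Python sketch (not part of the original article). It assumes classic 32-bit ARM code stored as little-endian four-byte words. The function name `split_fixed_width` and the sample bytes are invented for illustration. Because every instruction is the same size, instruction n always starts at byte offset 4 * n, so nothing has to be decoded just to find the next instruction.

```python
import struct

WORD_BYTES = 4  # assumption: classic 32-bit ARM, one instruction per 4-byte word


def split_fixed_width(code: bytes) -> list[int]:
    """Split a block of machine code into 32-bit instruction words.

    No instruction has to be examined to find where the next one starts:
    instruction n always begins at byte offset n * WORD_BYTES.
    """
    if len(code) % WORD_BYTES != 0:
        raise ValueError("code length must be a multiple of 4 bytes")
    count = len(code) // WORD_BYTES
    # "<" means little-endian, "I" means unsigned 32-bit integer
    return list(struct.unpack(f"<{count}I", code))


# Three arbitrary placeholder words, not real ARM instructions.
sample = struct.pack("<3I", 0x11111111, 0x22222222, 0x33333333)
words = split_fixed_width(sample)
print([hex(w) for w in words])
```

A variable-length instruction set cannot be split this way: the decoder has to look at each instruction before it knows where the following one begins.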

The two projects most associated with RISC are Stanford’s, which emerged into the commercial world as the MIPS architecture, and Berkeley’s RISC, which eventually became SPARC. IBM’s efforts (after their initial commercial flop) eventually led to the Power Architecture. And of course ARM has been incredibly successful, as have other RISC architectures.

ARM architecture and instruction set

When Wilson and the other Acorn folk were designing ARM, they weren’t dedicated to sticking exactly to the model set by Berkeley RISC. They kept the load/store architecture, the fixed-length instructions, and the three-address instruction format (destination, operand 1, operand 2). They rejected register windows, branch delay slots, and universal single-cycle instructions (most ARM instructions are single-cycle, but not all of them). ARM also initially lacked multiply and co-processor support. It had a 32-bit data bus, 26-bit (later 32-bit) address space, and 27 32-bit registers.

Since ARMv4T, ARMs have a second instruction set: the 16-bit Thumb set. This increases compiled code density by reducing the available functionality. The shorter opcodes also improve performance, especially on embedded hardware with limited memory bandwidth. If you’re interested in the details of the registers (37 of them), processor modes, exception handling, and so on of current ARM chips, there’s a great lecture online at http://www.ee.ncu.edu.tw/~jfli/soc/lecture/ARM_Instr_Set.pdf from Jin-Fu Li, National Central University, Taiwan. You can also get extensive documentation for various chips from the ARM website.

I wasn’t able to find an instruction set for ARM v1, but 1987 documentation for ARM v3 should have largely the same instructions (with a larger address space). They divide into five basic groups:

Data manipulation (ADD, AND, MOV, SUB, CMP etc).

Load and store (LDR to load a register and STR to save one).

Multiple load and store (LDM, STM).

Branch – conveniently jump between instructions.

Software interrupt (SWI, but there are many different expressions that can be passed to it to determine what it does, including keyboard output and input).

Let’s take a look at some ARM assembler code. This example from an ARM handbook multiplies a value by 6:

ADD Ra,Ra,Ra,LSL #1 ; multiply by 3
MOV Ra,Ra,LSL #1 ; and then by 2.

ADD takes three arguments: one destination and two operands. So

ADD Ra,Rb,Rc

means

Ra := Rb + Rc

(where Rn is register n). However, the line here seems to have a third operand, LSL #1. In fact, the second operand isn’t Ra, but

Ra,LSL #1

LSL #n means Logical Shift Left n places, which effectively multiplies the number stored in Ra by 2^n. (Similarly, if using logical shift right (LSR),

Ra,LSR #n

divides Ra by 2^n, discarding any remainder.) So here, Ra,LSL #1 multiplies Ra by 2^1 = 2. Thus,

ADD Ra,Ra,Ra,LSL #1

means

Ra := Ra + (Ra * 2)

ie

Ra := Ra * 3

To add a constant (an immediate value), you could write it like this:

ADD Ra, Ra, #1

This would add 1 (the literal value 1) to Ra, and store the result back into Ra – acting as an increment line. MOV transfers its operand to the destination register:

MOV destination, operand

So here:

MOV Ra,Ra,LSL #1

means that Ra,LSL #1, that is, Ra * 2^1 = Ra * 2, is transferred into the Ra register. So this line just multiplies Ra by 2. Since the previous line multiplied Ra by 3, the total effect is to multiply the contents of Ra by 6 and store the result back in the Ra register.
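The shift-and-add arithmetic above can be checked with a short Python sketch. This is an illustrative model, not ARM code: the helper names `lsl`, `lsr` and `times_six` are made up here, and values are masked to 32 bits on the assumption that a register simply wraps around on overflow.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def lsl(value: int, n: int) -> int:
    """Logical shift left by n places, truncated to 32 bits."""
    return (value << n) & MASK32


def lsr(value: int, n: int) -> int:
    """Logical shift right by n places (value treated as unsigned)."""
    return (value & MASK32) >> n


def times_six(ra: int) -> int:
    """Model the two-instruction multiply-by-6 sequence."""
    ra = (ra + lsl(ra, 1)) & MASK32  # ADD Ra,Ra,Ra,LSL #1  gives Ra := Ra * 3
    ra = lsl(ra, 1)                  # MOV Ra,Ra,LSL #1     gives Ra := Ra * 2
    return ra


print(times_six(7))  # 42
print(lsr(40, 3))    # 5, i.e. 40 divided by 2^3
```

The result matches ordinary multiplication by 6 for as long as the answer fits in 32 bits; beyond that, the model wraps around in the same way as the masked register.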

You may have noticed that multiplying by 8 would have been rather easier:

MOV Ra,Ra,LSL #3

And, of course, there are many ways to achieve the same result. The left-hand operand must always be a single register, but the right-hand operand can, as here, contain other operations. This versatility is helpful when maximising code efficiency.

Here’s a slightly more complicated example. I’ll use the code from the Grace Hopper article from LV002, which instructed UNIVAC to add a series of numbers stored in memory addresses 100–999. Memory in UNIVAC was a series of registers from 0–999, whereas memory in ARMv1 used a 26-bit address value, with a 4-byte (32-bit) word length. This means that ARM word addresses start at 0 and go up in 4s: 0, 4, 8, … 64M. I’ve translated UNIVAC addresses 100–999 as ARM memory addresses &1000–&1E0C (the & prefix marks a hexadecimal number). A semicolon denotes that the rest of the line is a comment. This is theoretical code, not tested, but should give you an idea of how ARM assembler works.

MOV R0,#0 ; Zero the running total
MOV R1,#0 ; Zero the register that holds the next value
MOV R2,#&1000 ; Store the start address &1000 in R2
.LOOP ADD R0,R0,R1 ; Label the loop; R0 := R0 + R1
LDR R1,[R2],#4 ; Load the word at the address in R2, then add 4 to R2
TEQ R2,#&1E10 ; Test whether we have passed the last address
BNE LOOP ; Carry on unless we’re done
ADD R0,R0,R1 ; Add the last value loaded, which the loop has not yet added
SWI WriteI+R0 ; Output the running total with SWI (pseudo-code)

Let’s look at that in more detail:

MOV R0,#0 – this loads the literal value 0 into R0. The next two lines work similarly, initialising R1 and R2.

LOOP – this is a label for the first line of the loop.

ADD R0,R0,R1 – as above, R0 := R0 + R1. Note that the first time around, this translates as R0 := 0 + 0, ie nothing happens.

LDR R1,[R2],#4 – Load contents of address held in R2 into R1, then increment R2 by 1 word. Note that this requires the numbers you’re adding to be single-word length. The first time through the loop, this will load the contents of memory address &1000 into R1 (so the next time through the loop, the ADD line will add it to R0), and increment the memory address stored in R2 ready for the next time through the loop.

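TEQ R2,#&1E10 – TEQ tests whether its two operands are equal, here the value of R2 and the address &1E10 (the address after the final memory address we want). The Z result flag is set to 1 if they are equal, 0 if not. (A real assembler would reject &1E10 as an immediate operand, because ARM immediates must fit an 8-bit value rotated by an even number of bits; tested code would first build the end address in a spare register and compare against that.)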

BNE LOOP – B is the simple branch instruction, and sends us back to the LOOP label. The conditional suffix NE stands for Not Equal. If Z is not set, the BNE instruction will branch. If Z is set, the NE condition is false and the branch is not taken, so execution falls through to the next instruction. The opposite of this is EQ: BEQ would branch if Z is set, and not if not. This instruction ends the loop once we’ve passed the final memory address, ie we have run out of numbers to add.

SWI WriteI+R0 – SWI offers a call-out to other instructions, and the instructions available will depend on the details of the architecture. Input/output are usually available, and this pseudocode outputs R0.
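As a cross-check on the address arithmetic and the shape of the loop, here is a rough Python model of the same summation. It is a sketch under assumptions of my own rather than a translation of tested ARM code: memory is represented as a dictionary keyed by byte address, the names `arm_address` and `sum_words` are hypothetical, and each value is added as soon as it is loaded rather than on the following pass.

```python
BASE = 0x1000           # ARM address standing in for UNIVAC address 100
WORD = 4                # bytes per 32-bit word
FIRST, LAST = 100, 999  # UNIVAC address range being summed


def arm_address(univac_address: int) -> int:
    """Map a UNIVAC address (100-999) to the ARM byte address used above."""
    return BASE + WORD * (univac_address - FIRST)


def sum_words(memory: dict[int, int]) -> int:
    """Walk a pointer through memory one word at a time, summing as it goes."""
    end = arm_address(LAST) + WORD  # &1E10, one word past the last value
    total = 0                       # plays the role of R0, the running total
    pointer = BASE                  # plays the role of R2, the next address
    while True:
        value = memory[pointer]     # the load half of LDR R1,[R2],#4
        pointer += WORD             # the post-increment half of that LDR
        total = (total + value) & 0xFFFFFFFF  # ADD R0,R0,R1, wrapped to 32 bits
        if pointer == end:          # TEQ R2,#&1E10 sets Z when they are equal
            break                   # BNE LOOP is not taken once Z is set
    return total


# Hypothetical test data: the values 1..900 stored in consecutive words.
memory = {arm_address(u): u - FIRST + 1 for u in range(FIRST, LAST + 1)}
print(hex(arm_address(LAST)))  # 0x1e0c
print(sum_words(memory))       # 405450
```

The 900 UNIVAC addresses occupy 900 * 4 = 3600 (&E10) bytes, which is why the last word sits at &1E0C and the loop stops when the pointer reaches &1E10.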

If you want to delve further into ARM Assembly language programming, I strongly recommend the web-based version of Pete Cockerell’s 1987 book, ARM Assembly Language Programming, at www.peter-cockerell.net/aalp/html/frames.html. This covers specifically ARMv3, but I found it to be a useful reference for the basics of ARM programming (and an interesting document!). An ARM quick reference card is available from ARM at http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf.

RISC OS

RISC OS 3 – an OS that lives on in a version for the Raspberry Pi.

Acorn’s other big achievement was RISC OS. After some financial problems, in 1985 Olivetti took a controlling stake in Acorn, but the company continued to operate independently. During this time, Acorn was developing RISC OS for the Archimedes, and released it in 1987 as Arthur 1.20. The original aim was to develop something similar to the functionality of the BBC Micro/Master OS, while waiting for the more complicated ARX system to be ready for release. However, Arthur’s small size, constant delays of the ARX project, and the realisation that Arthur could be extended to provide a window manager and desktop environment, meant that ARX was eventually dropped and Arthur/RISC OS became Acorn’s main OS. Arthur had a primitive GUI, but could only run one application at a time, and most work was done via the command line.

Arthur 2 became RISC OS 2 and was released in 1989. The GUI was now the main way of interacting with the OS, and it had added some co-operative multitasking. Graphics and sound were also a big improvement. (For comparison, Apple’s colour UI OS, System 7, was released in 1991.) Further developments were made in the RISC OS 3.x versions, including a bunch of useful built-in applications and improved font support.

Acorn released the new RiscPC in 1994, with a 16-million-colour display and the ability to handle up to 256MB of memory (rather than the 16MB of previous machines). RISC OS 3.5 was released to handle these improvements but otherwise was pretty similar to previous releases. Further updates were similarly hardware-driven.

In 1999, following further financial problems, Acorn was renamed as Element 14 Ltd, after which it was bought out. ARM Ltd had been spun off in 1990, and was doing very well, so this move allowed Acorn shareholders to cash out their much more lucrative ARM stock. Element 14 carried on with DSL technology, and a new company, RISCOS Ltd, licensed RISC OS from its eventual new owners. RISC OS 4 was released shortly after, and RISC OS 6 in 2006. RISC OS remains under development. (RISC OS 5 is a separate fork by Castle Technology.) If you fancy giving it a go, you can buy a RISC OS emulator USB stick for Windows, Mac, or Linux, from www.riscosopen.org, or RISC OS is also available for the Raspberry Pi.

Meanwhile, Sophie Wilson is still working for Broadcom (who bought out Element 14) and was the chief architect of their Firepath processor. She was awarded the Fellow Award by the Computer History Museum, California, in 2012, was elected as a Fellow of the Royal Society in 2013, and is considered one of the most important women in tech history. Think of her the next time you check your phone.

IP cores

A semiconductor IP (intellectual property) core is a chunk of chip or logic design that is the intellectual property of a particular party, usually a company. The chunks can be used as building blocks for larger chip or logic designs. They may be used only by that company or may be licensed out. The ability to license designs like this means that chip makers can use a standard set of processors and internal functions, and then focus on specific features or innovations of their particular chip. This has sped up development significantly since it became common in the 1990s. IP cores can be soft cores, described in a ‘high level’ hardware description language (and thus modifiable by the chip maker), or hard cores, described as a physical description (and thus not modifiable). ARM architectures are soft designs and are licensed and used in a huge range of systems. A major advantage of being an IP core company is that you don’t have to pay for the (very expensive) kit to fabricate your own chips.

From Linux Voice issue 8.
