ARM Memory. Owen Kaser, CS2253. Mostly corresponds to book Chapter 5.

SLIDE 1

ARM Memory

Owen Kaser, CS2253

Mostly corresponds to book Chapter 5.

SLIDE 2

Overview

Loads and Stores
Memory Maps
Register-Indirect Addressing
Post- and Pre-indexed Addressing

SLIDE 3

16 Registers is Not Enough

So far, the only places discussed for data are the ARM's CPU registers.

Most interesting programs need more data.

We need memory outside the CPU for our bulk data storage.

Also, memory can contain pre-computed tables (e.g., of trig functions) that are never altered.

For your toaster's software, the machine code can be set at the factory. Fancy toaster: you can “flash” your toaster with improved software.

SLIDE 4

Loads and Stores

Recall that ARM is a “load/store” architecture. Cannot directly do calculations on values in memory. Have to load them into a CPU register to use them as inputs.

Similarly, calculations put results into registers. Then you can use a store instruction to put them into memory.

Loads and stores need to specify where in memory things should go. This will be a numeric “memory address”.

(Memory) addressing modes are small built-in calculations the CPU can do, to compute the memory address.

Simple case: value in, say, R3 is to be used as the address.

SLIDE 5

System Memory Maps

A system built around an ARM7TDMI processor uses 32-bit values as memory addresses. Each address would correspond to a byte (oops, octet).

The overall “memory address space” ranges from 0 to 0xFFFFFFFF.

But the overall memory address space is further subdivided (boundaries are often small multiples of powers of 2).

RAM, ROM, flash, and I/O devices can be given their own subdivisions.

More on I/O devices later in the course. For now, just realize that some memory addresses accept stores, and some ignore them.

SLIDE 6

Ex. Memory Map

(extracts from book Table 5.1)

    Start       End         Description
    0x00000000  0x0003FFFF  On-chip flash
    0x00040000  0x00FFFFFF  reserved
    0x01000000  0x1FFFFFFF  ROM
    0x20000000  0x20007FFF  (Static) RAM
    …..
    0x4000C000  0x4000CFFF  UART 0 (a “serial port”) device
    …..
    0xE0001000  0xE0001FFF  “data watchpoint and trace” (DWT) facility
    ….
    0xE0004000  0xFFFFFFFF  reserved
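To make the idea of a subdivided address space concrete, here is a minimal Python sketch that looks up which region an address falls in. It uses only the rows extracted above; the names `MEMORY_MAP` and `region_of` are made up for this illustration, and addresses in the elided parts of the table simply return `None`.

```python
# Rows copied from the extract above: (start, end, description), end inclusive.
MEMORY_MAP = [
    (0x00000000, 0x0003FFFF, "On-chip flash"),
    (0x00040000, 0x00FFFFFF, "reserved"),
    (0x01000000, 0x1FFFFFFF, "ROM"),
    (0x20000000, 0x20007FFF, "(Static) RAM"),
    (0x4000C000, 0x4000CFFF, "UART 0"),
    (0xE0001000, 0xE0001FFF, "DWT"),
    (0xE0004000, 0xFFFFFFFF, "reserved"),
]


def region_of(address):
    """Return the description of the region holding address.

    Returns None when the address lies in a part of the table
    that the extract leaves out.
    """
    for start, end, description in MEMORY_MAP:
        if start <= address <= end:
            return description
    return None


def region_size(start, end):
    """Number of byte addresses in an inclusive range."""
    return end - start + 1


print(region_of(0x20000010))
print(hex(region_size(0x20000000, 0x20007FFF)))
```

The RAM row covers 0x8000 addresses, a power of two, which fits the remark that region boundaries tend to sit at small multiples of powers of 2.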

SLIDE 7

For Simplicity....

Let's only mess with addresses in a range that corresponds to RAM memory.

Then, loads and stores both make sense.

SLIDE 8

Register-Indirect Addressing Mode

Let's suppose you want to load the byte at address 0x00005000 into register R3.

8-bit value into a 32-bit container. If we want the 8-bit value to be zero-extended, use the LDRB instruction.

If you want it sign-extended, use LDRSB.

Simplest case: a register stores the address of some data you care about. Let's go for R1.

Assembler:

    MOV  R1, #0x00005000  ; address to R1
    LDRB R3, [R1]         ; memory value to R3
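The difference between zero-extension and sign-extension is easiest to see on a byte whose top bit is set. The following Python sketch models memory as a `bytearray` and registers as plain integers; the helper names `ldrb` and `ldrsb` are just labels for this illustration, not a real API.

```python
def ldrb(memory, address):
    """Model LDRB: fetch one byte and zero-extend it to 32 bits."""
    return memory[address]


def ldrsb(memory, address):
    """Model LDRSB: fetch one byte and sign-extend it to 32 bits."""
    value = memory[address]
    if value & 0x80:             # top bit of the byte set: negative
        value |= 0xFFFFFF00      # copy the sign bit into bits 8..31
    return value


memory = bytearray(0x6000)
memory[0x5000] = 0xF0            # -16 if read as a signed byte

r1 = 0x5000                      # MOV  R1, #0x00005000
print(hex(ldrb(memory, r1)))     # LDRB  R3, [R1]
print(hex(ldrsb(memory, r1)))    # LDRSB R3, [R1]
```

For a byte below 0x80 the two loads give the same register value.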

SLIDE 9

Looping Through Memory

Let's suppose you want to wipe clear (to 0) the contents of all memory locations from 0x00005000 to 0x00005FFF.

A loop will work nicely.

        MOV  R1, #0x00005000  ; starting location
        MOV  R2, #0x00006000  ; when to stop
        MOV  R3, #0
    LP  STRB R3, [R1]         ; wipe clear current location's value
        ADD  R1, R1, #1       ; advance to next location
        TEQ  R1, R2           ; has R1 hit the stopping location?
        BNE  LP
        ….

SLIDE 10

Speeding It Up

If the area to be cleared is properly aligned (starts on a multiple of 4) and is the right size (a multiple of 4) we can clear out 4 consecutive addresses with one STR (store word) instruction.

Recall that a 32-bit word is stored across 4 addresses: A, A+1, A+2, A+3.

SLIDE 11

Faster Code

        MOV  R1, #0x00005000  ; starting location
        MOV  R2, #0x00006000  ; when to stop
        MOV  R3, #0           ; 4 bytes of zeros
    LP  STR  R3, [R1]         ; wipe clear current location's value AND the next 3 locations' values
        ADD  R1, R1, #4       ; advance to location of next group of 4 bytes
        TEQ  R1, R2           ; has R1 hit the stopping location?
        BNE  LP

Loop runs only ¼ as many times now.
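The ¼ figure can be checked with a small Python model of the two loops. Memory is a `bytearray`, the loop shape mirrors `LP ... BNE LP` (body first, test at the end), and the function names are invented for this sketch.

```python
def clear_bytes(memory, start, stop):
    """Byte-at-a-time loop: STRB, ADD #1, TEQ, BNE."""
    r1 = start
    iterations = 0
    while True:
        memory[r1] = 0                  # STRB R3, [R1]
        r1 += 1                         # ADD  R1, R1, #1
        iterations += 1
        if r1 == stop:                  # TEQ R1, R2 / BNE LP
            break
    return iterations


def clear_words(memory, start, stop):
    """Word-at-a-time loop: STR, ADD #4, TEQ, BNE.

    Assumes start and stop are both multiples of 4.
    """
    r1 = start
    iterations = 0
    while True:
        memory[r1:r1 + 4] = bytes(4)    # STR R3, [R1]
        r1 += 4                         # ADD R1, R1, #4
        iterations += 1
        if r1 == stop:
            break
    return iterations


mem_a = bytearray(b"\xAA" * 0x6000)
mem_b = bytearray(b"\xAA" * 0x6000)
byte_iterations = clear_bytes(mem_a, 0x5000, 0x6000)
word_iterations = clear_words(mem_b, 0x5000, 0x6000)
print(byte_iterations, word_iterations)
```

Both versions leave the same memory contents; only the iteration count differs. Note that the `TEQ`-style equality test would never succeed if the region size were not a multiple of 4, which is why the alignment conditions matter.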

SLIDE 12

Even Faster

The pattern of “use a register to provide a memory address, then update the register in preparation for the next loop” is extremely common.

ARM designers created an addressing mode that does BOTH of these operations in a single instruction: “post-indexed”.

    STR R3, [R1], #4

is equivalent to

    STR R3, [R1]
    ADD R1, R1, #4

SLIDE 13

Textbook Figure 5.2

SLIDE 14

Even Faster Code

        MOV  R1, #0x00005000  ; starting location
        MOV  R2, #0x00006000  ; when to stop
        MOV  R3, #0           ; 4 bytes of zeros
    LP  STR  R3, [R1], #4     ; wipe 4, then advance “pointer” R1 (no separate ADD needed)
        TEQ  R1, R2           ; has R1 hit the stopping location?
        BNE  LP
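As a rough check on the post-indexed loop, the Python sketch below models `STR R3, [R1], #4` as a single step that stores and then bumps the base register, so the loop body needs no separate `ADD`. Registers are a dictionary, memory is a `bytearray`, and little-endian byte order is an assumption of this model.

```python
def str_post_indexed(regs, memory, rd, rn, offset):
    """Model STR Rd, [Rn], #offset: store at Rn, then Rn = Rn + offset."""
    address = regs[rn]
    memory[address:address + 4] = regs[rd].to_bytes(4, "little")
    regs[rn] = (address + offset) & 0xFFFFFFFF


def clear_region(memory, start, stop):
    """Clear [start, stop) a word at a time using post-indexed stores."""
    regs = {"R1": start, "R2": stop, "R3": 0}
    iterations = 0
    while True:
        str_post_indexed(regs, memory, "R3", "R1", 4)   # STR R3, [R1], #4
        iterations += 1
        if regs["R1"] == regs["R2"]:                    # TEQ R1, R2 / BNE LP
            break
    return iterations, regs["R1"]


memory = bytearray(b"\xAA" * 0x6000)
iterations, final_r1 = clear_region(memory, 0x5000, 0x6000)
print(iterations, hex(final_r1))
```

If an `ADD R1, R1, #4` were also kept in the loop, R1 would advance by 8 per pass and every other word would be left untouched.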

SLIDE 15

Java Pre- vs Post-Increment

Can draw a parallel to Java's ++ operators.

Recall, v = M[p++] in Java
– it uses the current value of p to index M
– then it increments p (post-increment).

Versus v = M[++p] in Java
– it first increments p (pre-increment)
– then the new value of p is used to index into M

SLIDE 16

Post-Indexed Addressing

In ARM, post-indexed addressing takes a base register. (Should not be R15.)

Uses that base register's value to go to memory.

Then updates the base register's value by a little computation:

– adding/subtracting a constant (earlier example)
– adding/subtracting a register, which is allowed to be modified by the barrel shifter: it can be shifted/rotated by a constant amount (in loads and stores, unlike data-processing instructions, the shift amount cannot come from a register)

Usefulness of the fanciest of these seems doubtful:

    LDR R1, [R2], R4, ROR #3 ; is this useful???

SLIDE 17

Useful? Example

Java, for an int array M, variable x:

    j = 0;
    while (….) { sum += M[j]; j += x; }

ARM: suppose x in R2, start of M in R1.

In loop body:

    LDR R3, [R1], R2, LSL #2
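Here is a small Python model of that loop body, to show why the shift by 2 is there: the Java index steps by x elements, and each int is 4 bytes, so the pointer must step by x×4 bytes. The array contents, the loop count, and the function name `strided_sum` are all made up for the illustration; little-endian storage is assumed.

```python
import struct


def strided_sum(memory, base, x, count):
    """Sum every x-th 32-bit int starting at base, for count loads."""
    r1 = base                  # start of M
    r2 = x                     # stride, in elements
    total = 0
    for _ in range(count):
        # LDR R3, [R1], R2, LSL #2 : load from R1, then R1 += R2 * 4
        (r3,) = struct.unpack_from("<i", memory, r1)
        r1 += r2 << 2
        total += r3            # sum += M[j]
    return total


values = [10, 20, 30, 40, 50, 60]
memory = bytearray(struct.pack("<6i", *values))
print(strided_sum(memory, 0, 2, 3))   # M[0] + M[2] + M[4]
```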

SLIDE 18

Pre-Indexed Addressing

There are two flavours of pre-indexed addressing.

Both do a little computation and use the computed effective address to go to memory. In one, the base register is updated. Other flavour does not update.

In assembly language, the ! symbol means to update the base register. Don't use R15 as the base register with !

Ok to use R15, without ! The value of R15 is 8 bytes beyond the start of the current instruction's machine code. [Details of why are a bit advanced.]

SLIDE 19

Rationale for the “little computations”

PC-relative addressing for constants

Getting a field of an object, given the start of the object.

Indexing into array of objects, selecting a field (if the object size is a power of two)

(Selected largely by analyzing what compilers for HLLs would find useful, I think...rather than focussing on assembly language programmers)

SLIDE 20

Pre-indexed Figure (Textbook)

Instruction is STR r0, [r1, #12]

Add ! to update r1 when finished:

    STR r0, [r1, #12]! ; r1 ← 0x20C
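The two pre-indexed flavours can be compared side by side in a short Python model. The starting value r1 = 0x200 is an assumption (it is what makes the effective address come out as 0x20C), and the stored value in r0 is arbitrary; little-endian storage is also assumed.

```python
def str_pre_indexed(regs, memory, rd, rn, offset, writeback=False):
    """Model STR Rd, [Rn, #offset] and, with writeback, STR Rd, [Rn, #offset]!"""
    address = (regs[rn] + offset) & 0xFFFFFFFF    # effective address first
    memory[address:address + 4] = regs[rd].to_bytes(4, "little")
    if writeback:                                 # the ! suffix
        regs[rn] = address
    return address


memory = bytearray(0x400)

regs = {"r0": 0xCAFEF00D, "r1": 0x200}            # assumed starting values
plain_address = str_pre_indexed(regs, memory, "r0", "r1", 12)
r1_after_plain = regs["r1"]

regs = {"r0": 0xCAFEF00D, "r1": 0x200}
bang_address = str_pre_indexed(regs, memory, "r0", "r1", 12, writeback=True)
r1_after_bang = regs["r1"]

print(hex(plain_address), hex(r1_after_plain))
print(hex(bang_address), hex(r1_after_bang))
```

Both forms store to the same address; they differ only in whether r1 keeps its old value afterwards.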

SLIDE 21

Some Pre-indexed Examples

MOV R1, #0x12345678 fails. Constant is not a rotation of an 8-bit value.

Instead, initialize a memory location with your constant. Then use PC-relative addressing to load it.

            LDR R1, myConst     ; pseudo-op
            … 1000 bytes later...
    myConst DCD 0x12345678

The LDR instruction is actually something like

            LDR R1, [PC, #996]  ; PC was already 8 ahead

996 is close enough to PC. Must be within 4 KiB.
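The offset arithmetic can be sketched in Python. The addresses below are invented, and the sketch assumes that 1000 bytes of other material sit between the 4-byte LDR and the constant, so the constant is 1004 bytes past the start of the LDR; that is one reading that produces the 996 above. The range check uses the 12-bit offset field of LDR, i.e. at most 4095 in either direction.

```python
PC_AHEAD = 8          # PC reads as the instruction's address plus 8
MAX_OFFSET = 4095     # 12-bit immediate offset field in LDR/STR


def pc_relative_offset(instruction_address, target_address):
    """Offset the assembler would place in LDR Rx, [PC, #offset]."""
    offset = target_address - (instruction_address + PC_AHEAD)
    if abs(offset) > MAX_OFFSET:
        raise ValueError("constant is too far from the LDR instruction")
    return offset


ldr_address = 0x8000                      # hypothetical address of the LDR
my_const_address = ldr_address + 4 + 1000 # hypothetical address of myConst
print(pc_relative_offset(ldr_address, my_const_address))
```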

SLIDE 22

Ex: Field Access for an Object

In HLLs, the fields of an object occupy consecutive memory addresses (possibly with padding).

Let's suppose that an object starts at 1000. There are two 32-bit fields, then a 16-bit halfword field that we want to load into R2.

Let's suppose that R1 contains the starting address of the object.

Use LDRH R2, [R1, #8] ; immediate offset is 8

(Desired field starts 8 bytes later: gotta skip over first two words.)

(Minor point: LDRH requires the immediate offset to be within ±255.)
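The offset of 8 can be confirmed with Python's `struct` module, laying out two 32-bit fields followed by a 16-bit field with no padding. The field values are arbitrary, `ldrh` is just a name for this model of a zero-extending halfword load, and little-endian storage is assumed.

```python
import struct

OBJECT_START = 1000
memory = bytearray(2048)

# Two 32-bit fields, then the 16-bit field we want ("<" means no padding).
struct.pack_into("<IIH", memory, OBJECT_START, 0x11111111, 0x22222222, 0xBEEF)


def ldrh(memory, base, offset):
    """Model LDRH Rd, [Rn, #offset]: load a halfword, zero-extended."""
    (value,) = struct.unpack_from("<H", memory, base + offset)
    return value


r1 = OBJECT_START                 # R1 holds the object's starting address
r2 = ldrh(memory, r1, 8)          # LDRH R2, [R1, #8]
print(hex(r2))
print(struct.calcsize("<II"))     # bytes to skip before the halfword
```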

SLIDE 23

Ex: Array Access

Suppose R1 contains the starting address of an array.

Suppose the array's elements are 4 bytes each.

To load the w-th array element, we want address R1 + 4*w.

Suppose value w is in R2.

    LDR R5, [R1, R2, LSL #2]

loads desired value.

SLIDE 24

No ADR Pseudo-op

The Crossware assembler does not seem to support ADR, which is used to put an address into a register (that you will then use as a base register). For instance, summing values in array…

                MOV  R0, #0            ; accumulate answer
                ADR  R1, MyArr         ; Keil pseudo-op
                ADR  R2, AfterMyArr    ; past last valid address
    LP          LDR  R3, [R1], #4
                ADD  R0, R0, R3
                TEQ  R1, R2
                BNE  LP
                …..
    MyArr       DCD  34, 23, 56, 78, 12345566, ……...
    AfterMyArr  DCB  0

SLIDE 25

Instead of ADR

Instead of ADR, you should be able to do the following:

                MOV  R0, #0            ; accumulate answer
                LDR  R1, =MyArr
                LDR  R2, =AfterMyArr   ; past last valid address
    LP          LDR  R3, [R1], #4
                ADD  R0, R0, R3
                TEQ  R1, R2
                BNE  LP
                …..
    MyArr       DCD  34, 23, 56, 78, 12345566, ……...
    AfterMyArr  DCD  0                 ; wasted word, could avoid...

SLIDE 26

LDR As Pseudoinstruction

LDR Rx, =value works for any 32-bit value (address or constant).

It sets aside space in a “constant pool”, preinitialized to value. This constant pool is (by default) at the end of the current AREA.

Then it generates machine code for a PC-relative LDR into Rx from this preinitialized location.

Like a convenient DCD and LDR Rx, [PC, #something]

See textbook Chapter 6.

SLIDE 27

Machine-Code Formats LDR/STR/LDRB/STRB

From reference manual:

SLIDE 28

Meaning of Some Bits (Ref Man)

SLIDE 29

Exercise/Example

Determine machine code for

LDR R3, [R1], #4 and also STRB R3, [R1, R2, LSR #5]!
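One way to check a hand-worked answer is a small encoder. The Python sketch below assumes the standard ARM single-data-transfer layout (condition in bits 31..28, then 01, then the I, P, U, B, W, L bits, then Rn, Rd, and a 12-bit offset field), with the "always" condition 1110 and W = 0 for post-indexed forms. The function and parameter names are invented here, and the layout should be compared against the reference manual figures before relying on it.

```python
SHIFT_CODES = {"LSL": 0, "LSR": 1, "ASR": 2, "ROR": 3}


def encode_transfer(load, byte, rd, rn, pre, writeback, up=True,
                    imm=None, rm=None, shift="LSL", shift_imm=0, cond=0xE):
    """Encode LDR/STR/LDRB/STRB with an immediate or shifted-register offset."""
    if (imm is None) == (rm is None):
        raise ValueError("give exactly one of imm or rm")
    if imm is not None:
        if not 0 <= imm <= 0xFFF:
            raise ValueError("immediate offset must fit in 12 bits")
        i_bit, offset = 0, imm                    # I = 0: immediate offset
    else:
        if not 0 <= shift_imm <= 31:
            raise ValueError("shift amount must fit in 5 bits")
        i_bit = 1                                 # I = 1: register offset
        offset = (shift_imm << 7) | (SHIFT_CODES[shift] << 5) | rm
    return ((cond << 28) | (0b01 << 26) | (i_bit << 25)
            | (int(pre) << 24) | (int(up) << 23) | (int(byte) << 22)
            | (int(writeback) << 21) | (int(load) << 20)
            | (rn << 16) | (rd << 12) | offset)


# LDR R3, [R1], #4 : load, word, post-indexed, add immediate 4
ldr_code = encode_transfer(load=True, byte=False, rd=3, rn=1,
                           pre=False, writeback=False, imm=4)

# STRB R3, [R1, R2, LSR #5]! : store, byte, pre-indexed with writeback
strb_code = encode_transfer(load=False, byte=True, rd=3, rn=1,
                            pre=True, writeback=True,
                            rm=2, shift="LSR", shift_imm=5)

print(f"{ldr_code:08X}")
print(f"{strb_code:08X}")
```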

SLIDE 30

Load and Store Multiple

There are instructions LDM and STM that load or store a number of registers.

With LDM, a bit vector in the machine code indicates which registers to load. They are loaded from consecutive addresses.

STM works similarly.

They are especially useful in storing things on the runtime stack, and will be looked at when we cover that topic.