ARM Memory Owen Kaser, CS2253 Mostly corresponds to book Chapter 5. - - PowerPoint PPT Presentation


slide-1
SLIDE 1

ARM Memory

Owen Kaser, CS2253

Mostly corresponds to book Chapter 5.

slide-2
SLIDE 2

Overview

  • Loads and Stores
  • Memory Maps
  • Register-Indirect Addressing
  • Post- and Pre-indexed Addressing
slide-3
SLIDE 3

16 Registers is Not Enough

  • So far, the only places discussed for data are the ARM's CPU registers.
  • Most interesting programs need more data.
  • We need memory outside the CPU for our bulk data storage.
  • Also, memory can contain pre-computed tables (e.g., of trig functions) that are never altered.
  • For your toaster's software, the machine code can be set at the factory. Fancy toaster: you can “flash” your toaster with improved software.

slide-4
SLIDE 4

Loads and Stores

  • Recall that ARM is a “load/store” architecture. Cannot directly do calculations on values in memory. Have to load them into a CPU register to use them as inputs.
  • Similarly, calculations put results into registers. Then you can use a store instruction to put them into memory.
  • Loads and stores need to specify where in memory things should go. This will be a numeric “memory address”.
  • (Memory) addressing modes are small built-in calculations the CPU can do, to compute the memory address.
  • Simple case: value in, say, R3 is to be used as the address.

slide-5
SLIDE 5

System Memory Maps

  • A system built around an ARM7TDMI processor uses 32-bit values as memory addresses. Each address would correspond to a byte (oops, octet).
  • The overall “memory address space” ranges from 0 to 0xFFFFFFFF.
  • But the overall memory address space is further subdivided (boundaries are often small multiples of powers of 2).
  • RAM, ROM, flash, and I/O devices can be given their own subdivisions.
  • More on I/O devices later in the course. For now, just realize that some memory addresses accept stores, and some ignore them.

slide-6
SLIDE 6
Ex. Memory Map (extracts from book Table 5.1)

Start        End          Description
0x00000000   0x0003FFFF   On-chip flash
0x00040000   0x00FFFFFF   reserved
0x01000000   0x1FFFFFFF   ROM
0x20000000   0x20007FFF   (Static) RAM
…..
0x4000C000   0x4000CFFF   UART 0 (a “serial port”) device
…..
0xE0001000   0xE0001FFF   “data watchpoint and trace” (DWT) facility
….
0xE0004000   0xFFFFFFFF   reserved
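A memory map like this is just a lookup from an address to the region that contains it. Here is a minimal sketch in Python (a model for illustration, not ARM code). The table entries are copied from the extract above, the function name region_of is made up, and the rows hidden behind the “…..” gaps are simply reported as not listed.

```python
# Regions copied from the extract of Table 5.1 above; the elided rows
# are deliberately NOT filled in.
MEMORY_MAP = [
    (0x00000000, 0x0003FFFF, "On-chip flash"),
    (0x00040000, 0x00FFFFFF, "reserved"),
    (0x01000000, 0x1FFFFFFF, "ROM"),
    (0x20000000, 0x20007FFF, "(Static) RAM"),
    (0x4000C000, 0x4000CFFF, "UART 0"),
    (0xE0001000, 0xE0001FFF, "DWT"),
    (0xE0004000, 0xFFFFFFFF, "reserved"),
]


def region_of(address):
    """Return the description of the region containing a 32-bit address."""
    for start, end, description in MEMORY_MAP:
        if start <= address <= end:
            return description
    return "not listed in the extract"


print(region_of(0x20000010))  # (Static) RAM
print(region_of(0x4000C004))  # UART 0
print(region_of(0x30000000))  # not listed in the extract
```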

slide-7
SLIDE 7

For Simplicity....

  • Let's only mess with addresses in a range that corresponds to RAM.
  • Then, loads and stores both make sense.
slide-8
SLIDE 8

Register-Indirect Addressing Mode

  • Let's suppose you want to load the byte at address 0x00005000 into register R3.
  • This puts an 8-bit value into a 32-bit container. If we want the 8-bit value to be zero-extended, use the LDRB instruction.
  • If you want it sign-extended, use LDRSB.
  • Simplest case: a register stores the address of some data you care about. Let's go for R1.
  • Assembler:

        MOV  R1, #0x00005000 ; address to R1
        LDRB R3, [R1]        ; memory value to R3
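The difference between LDRB and LDRSB is only in how the upper 24 bits of the register get filled. A small Python model (not ARM code; the function names are made up for illustration) shows both results for the same byte:

```python
def zero_extend_byte(b):
    """What LDRB leaves in the register: upper 24 bits are 0."""
    return b & 0xFF


def sign_extend_byte(b):
    """What LDRSB leaves in the register: upper 24 bits copy bit 7."""
    b &= 0xFF
    return (b | 0xFFFFFF00) if (b & 0x80) else b


print(hex(zero_extend_byte(0x9C)))  # 0x9c
print(hex(sign_extend_byte(0x9C)))  # 0xffffff9c
print(hex(sign_extend_byte(0x1C)))  # 0x1c
```

For bytes below 0x80 the two instructions give the same register value.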

slide-9
SLIDE 9

Looping Through Memory

  • Let's suppose you want to wipe clear (to 0) the contents of all memory locations from 0x00005000 to 0x00005FFF.
  • A loop will work nicely.

        MOV  R1, #0x00005000 ; starting location
        MOV  R2, #0x00006000 ; when to stop
        MOV  R3, #0
LP      STRB R3, [R1]        ; wipe clear current location's value
        ADD  R1, R1, #1      ; advance to next location
        TEQ  R1, R2          ; has R1 hit the stopping location?
        BNE  LP
        ….

slide-10
SLIDE 10

Speeding It Up

  • If the area to be cleared is properly aligned (starts on a multiple of 4) and is the right size (a multiple of 4), we can clear out 4 consecutive addresses with one STR (store word) instruction.
  • Recall that a 32-bit word is stored across 4 addresses: A, A+1, A+2, A+3.

slide-11
SLIDE 11

Faster Code

        MOV  R1, #0x00005000 ; starting location
        MOV  R2, #0x00006000 ; when to stop
        MOV  R3, #0          ; 4 bytes of zeros
LP      STR  R3, [R1]        ; wipe clear current location's value AND the next 3 locations' values
        ADD  R1, R1, #4      ; advance to location of next group of 4 bytes
        TEQ  R1, R2          ; has R1 hit the stopping location?
        BNE  LP

  • Loop runs only ¼ as many times now.
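The ¼ figure can be checked by modelling both loops in Python (a rough model, not ARM code; clear_region and its parameters are made-up names). Like the assembly, it tests for the stopping address only after each store, so it assumes the region size is a multiple of the step.

```python
def clear_region(memory, start, stop, step):
    """Zero memory[start:stop], `step` bytes per iteration.

    Returns how many times the loop body ran.
    """
    address = start              # plays the role of R1
    iterations = 0
    while True:
        for k in range(step):    # STRB clears 1 byte, STR clears 4
            memory[address + k] = 0
        address += step          # ADD R1, R1, #step
        iterations += 1
        if address == stop:      # TEQ R1, R2 / BNE LP
            break
    return iterations


memory = bytearray(b"\xAA" * 0x6000)
print(clear_region(memory, 0x5000, 0x6000, 1))  # 4096

memory[0x5000:0x6000] = b"\xAA" * 0x1000        # refill, then clear by words
print(clear_region(memory, 0x5000, 0x6000, 4))  # 1024
```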
slide-12
SLIDE 12

Even Faster

  • The pattern of “use a register to provide a memory address, then update the register in preparation for the next loop” is extremely common.
  • ARM designers created an addressing mode that does BOTH of these operations in a single instruction: “post-indexed”.
  • STR R3, [R1], #4 is equivalent to

        STR R3, [R1]
        ADD R1, R1, #4

slide-13
SLIDE 13

Textbook Figure 5.2

slide-14
SLIDE 14

Even Faster Code

        MOV  R1, #0x00005000 ; starting location
        MOV  R2, #0x00006000 ; when to stop
        MOV  R3, #0          ; 4 bytes of zeros
LP      STR  R3, [R1], #4    ; wipe 4, then advance “pointer” R1 (no separate ADD needed)
        TEQ  R1, R2          ; has R1 hit the stopping location?
        BNE  LP

slide-15
SLIDE 15

Java Pre- vs Post-Increment

  • Can draw a parallel to Java's ++ operators.
  • Recall, v = M[p++] in Java
      – it uses the current value of p to index M
      – then it increments p (post-increment).
  • Versus v = M[++p] in Java
      – it first increments p (pre-increment)
      – then the new value of p is used to index into M

slide-16
SLIDE 16

Post-Indexed Addressing

  • In ARM, post-indexed addressing takes a base register. (Should not be R15.)
  • Uses that base register's value to go to memory
  • Then updates the base register's value by a little computation
      – adding/subtracting a constant (earlier example)
      – adding/subtracting a register

  • which is allowed to be modified by the barrel shifter
  • can be shifted/rotated by a constant amount
  • (unlike in data-processing instructions, the shift amount in a load/store cannot come from a register)
  • Usefulness of fanciest of these seems doubtful
  • LDR R1, [R2], R3, ROR #4 ; is this useful???
slide-17
SLIDE 17

Useful? Example

  • Java, for an int array M, variable x:

        j = 0;
        while (….) { sum += M[j]; j += x; }

  • ARM: suppose x in R2, start of M in R1
  • In loop body: LDR R3, [R1], R2, LSL #2
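To see why the shifted register is useful here, the loop can be modelled in Python (not ARM code). This is a sketch under assumptions: memory is a little-endian bytearray, strided_sum is a made-up name, and because the loop condition is elided in the slide, a hypothetical count parameter stands in for it.

```python
import struct


def strided_sum(memory, base, x, count):
    """Model of the loop body LDR R3, [R1], R2, LSL #2 plus an ADD."""
    r1 = base                                          # address of M[0]
    total = 0
    for _ in range(count):                             # stand-in for while (….)
        r3 = struct.unpack_from("<I", memory, r1)[0]   # load word at [R1]
        r1 += x << 2                                   # then R1 += R2 * 4
        total += r3                                    # sum += M[j]
    return total


memory = bytearray(32)
struct.pack_into("<8I", memory, 0, 0, 1, 2, 3, 4, 5, 6, 7)  # M = {0..7}
print(strided_sum(memory, 0, 2, 4))  # 12  (M[0] + M[2] + M[4] + M[6])
```

The LSL #2 converts an element count (x) into a byte count (4x), since each int occupies 4 addresses.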
slide-18
SLIDE 18

Pre-Indexed Addressing

  • There are two flavours of pre-indexed addressing. Both do a little computation and use the computed effective address to go to memory. In one, the base register is updated. The other flavour does not update it.
  • In assembly language, the ! symbol means to update the base register. Don't use R15 as the base register with !
  • OK to use R15 without the ! symbol. The value of R15 is 8 bytes beyond the start of the current instruction's machine code. [Details of why are a bit advanced.]
slide-19
SLIDE 19

Rationale for the “little computations”

  • PC-relative addressing for constants
  • Getting a field of an object, given the start of the object.
  • Indexing into array of objects, selecting a field (if the object size is a power of two)
  • (Selected largely by analyzing what compilers for HLLs would find useful, I think...rather than focussing on assembly language programmers)

slide-20
SLIDE 20

Pre-indexed Figure (Textbook)

  • Instruction is STR r0, [r1, #12]
  • Add ! to update r1 when finished:

        STR r0, [r1, #12]! ; r1 ← 0x20c
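The three immediate-offset forms differ only in which address is used for the access and what the base register holds afterwards. The sketch below models that in Python (not ARM code). The function name and mode labels are made up, and the starting value r1 = 0x200 is an assumption chosen so that the update lands on 0x20c.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def effective_address(base, offset, mode):
    """Return (address used for the access, new value of the base register)."""
    if mode == "offset":    # STR r0, [r1, #12]   base left alone
        return (base + offset) & MASK, base
    if mode == "pre":       # STR r0, [r1, #12]!  base updated to the address
        address = (base + offset) & MASK
        return address, address
    if mode == "post":      # STR r0, [r1], #12   use base, then update it
        return base, (base + offset) & MASK
    raise ValueError(mode)


for mode in ("offset", "pre", "post"):
    address, new_base = effective_address(0x200, 12, mode)
    print(mode, hex(address), hex(new_base))
# offset 0x20c 0x200
# pre 0x20c 0x20c
# post 0x200 0x20c
```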

slide-21
SLIDE 21

Some Pre-indexed Examples

  • MOV R1, #0x12345678 fails. Constant is not a rotation of an 8-bit value.
  • Instead, initialize a memory location with your constant. Then use PC-relative addressing to load it.
  • LDR R1, myConst ; pseudo-op

        … 1000 bytes later...
myConst DCD 0x12345678

  • The LDR instruction is actually something like

        LDR R1, [PC, #996] ; PC was already 8 ahead

  • 996 is close enough to PC. Must be within 4 KiB.
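Both checks on this slide are mechanical, so they can be sketched in Python (not ARM code; the function names are made up). The first tests whether a constant is an 8-bit value rotated right by an even amount, which is the rule for ARM data-processing immediates. The second computes a PC-relative offset; the addresses in the usage lines are hypothetical, chosen only so that the offset comes out as 996.

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 0x100:
            return True
    return False


def pc_relative_offset(instruction_address, target_address):
    """Offset for LDR Rx, [PC, #offset]; PC reads as instruction address + 8."""
    return target_address - (instruction_address + 8)


print(is_arm_immediate(0x00005000))        # True  (0x50 rotated)
print(is_arm_immediate(0x12345678))        # False (needs a memory constant)
print(pc_relative_offset(0x8000, 0x83EC))  # 996
```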
slide-22
SLIDE 22

Ex: Field Access for an Object

  • In HLLs, the fields of an object occupy consecutive memory addresses (possibly with padding)
  • Let's suppose that an object starts at 1000. There are two 32-bit fields, then a 16-bit halfword field that we want to load into R2.
  • Let's suppose that R1 contains the starting address of the object.
  • Use LDRH R2, [R1, #8] ; immediate offset is 8 (Desired field starts 8 bytes later: gotta skip over first two words.)
  • (Minor point: LDRH requires the immediate offset to be within ±255)
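The same layout can be modelled with Python's struct module (a sketch, not ARM code). Little-endian byte order is assumed, the field values are invented, and ldrh is a made-up helper that mimics LDRH R2, [R1, #8].

```python
import struct

memory = bytearray(2048)
base = 1000                     # the object starts at address 1000 (R1)

# Two 32-bit fields, then the 16-bit halfword field; values are made up.
struct.pack_into("<IIH", memory, base, 0x11111111, 0x22222222, 0xBEEF)


def ldrh(memory, base, offset):
    """Model of LDRH Rd, [Rn, #offset]: load a zero-extended halfword."""
    return struct.unpack_from("<H", memory, base + offset)[0]


print(hex(ldrh(memory, base, 8)))  # 0xbeef
```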
slide-23
SLIDE 23

Ex: Array Access

  • Suppose R1 contains the starting address of an array.
  • Suppose the array's elements are 4 bytes each
  • To load the wth array element, we want address R1 + 4*w
  • Suppose value w is in R2
  • LDR R5, [R1, R2, LSL #2] loads desired value.
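The address arithmetic is just a shift and an add. A small Python model (not ARM code; little-endian memory, a made-up base address of 0x100, and the helper name ldr_scaled are all assumptions for illustration):

```python
import struct


def ldr_scaled(memory, base, index):
    """Model of LDR Rd, [Rn, Rm, LSL #2]: word at base + 4*index."""
    address = base + (index << 2)   # LSL #2 multiplies the index by 4
    return struct.unpack_from("<I", memory, address)[0]


memory = bytearray(0x200)
struct.pack_into("<4I", memory, 0x100, 34, 23, 56, 78)  # array at 0x100

print(ldr_scaled(memory, 0x100, 2))  # 56
```

The base register is not modified here: this is the pre-indexed flavour without the ! symbol.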
slide-24
SLIDE 24

No ADR Pseudo-op

  • The Crossware assembler does not seem to support ADR, which is used to put an address into a register (that you will then use as a base register). For instance, summing values in array…

        MOV  R0, #0          ; accumulate answer
        ADR  R1, MyArr       ; Keil pseudo-op
        ADR  R2, AfterMyArr  ; past last valid address
LP      LDR  R3, [R1], #4
        ADD  R0, R0, R3
        TEQ  R1, R2
        BNE  LP
        …..
MyArr   DCD  34, 23, 56, 78, 12345566, ……...
AfterMyArr DCB 0

slide-25
SLIDE 25

Instead of ADR

  • Instead of ADR, you should be able to do the following:

        MOV  R0, #0          ; accumulate answer
        LDR  R1, =MyArr
        LDR  R2, =AfterMyArr ; past last valid address
LP      LDR  R3, [R1], #4
        ADD  R0, R0, R3
        TEQ  R1, R2
        BNE  LP
        …..
MyArr   DCD  34, 23, 56, 78, 12345566, ……...
AfterMyArr DCD 0             ; wasted word, could avoid...

slide-26
SLIDE 26

LDR As Pseudoinstruction

  • LDR Rx, =value works for any 32-bit value (address or constant).
  • It sets aside space in a “constant pool”, preinitialized to value. This constant pool is (by default) at the end of the current AREA.
  • Then it generates machine code for a PC-relative LDR into Rx from this preinitialized location.
  • Like a convenient DCD and LDR Rx, [PC, #something]
  • See textbook Chapter 6.
slide-27
SLIDE 27

Machine-Code Formats LDR/STR/LDRB/STRB

  • From reference manual:
slide-28
SLIDE 28

Meaning of Some Bits (Ref Man)

slide-29
SLIDE 29

Exercise/Example

  • Determine machine code for LDR R3, [R1], #4 and also STRB R3, [R1, R2, LSR #5]!
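One way to check a hand-worked answer is to pack the fields with a short script. The sketch below (Python, not ARM code) follows the standard ARM single-data-transfer layout: cond in bits 31-28, 01 in bits 27-26, then the I, P, U, B, W, L flags, Rn, Rd and a 12-bit offset field. The function and parameter names are made up, and the condition defaults to AL (always).

```python
LSR = 0b01  # shift-type code for LSR in the offset field


def shifted_register(rm, shift_type, shift_imm):
    """12-bit offset field for 'Rm, <shift> #imm' (bit 4 stays 0)."""
    return (shift_imm << 7) | (shift_type << 5) | rm


def encode_single_transfer(load, byte, rn, rd, offset12, *,
                           register_offset, pre_indexed,
                           add=True, writeback=False, cond=0xE):
    """Pack an LDR/STR/LDRB/STRB instruction into a 32-bit word."""
    word = cond << 28
    word |= 0b01 << 26
    word |= int(register_offset) << 25   # I: 1 = offset is a (shifted) register
    word |= int(pre_indexed) << 24       # P: 1 = pre-indexed, 0 = post-indexed
    word |= int(add) << 23               # U: 1 = add offset, 0 = subtract
    word |= int(byte) << 22              # B: 1 = byte, 0 = word
    word |= int(writeback) << 21         # W: 1 = the ! form (pre-indexed only)
    word |= int(load) << 20              # L: 1 = load, 0 = store
    word |= rn << 16
    word |= rd << 12
    word |= offset12 & 0xFFF
    return word


# LDR R3, [R1], #4
print(hex(encode_single_transfer(True, False, 1, 3, 4,
                                 register_offset=False,
                                 pre_indexed=False)))      # 0xe4913004

# STRB R3, [R1, R2, LSR #5]!
print(hex(encode_single_transfer(False, True, 1, 3,
                                 shifted_register(2, LSR, 5),
                                 register_offset=True,
                                 pre_indexed=True,
                                 writeback=True)))         # 0xe7e132a2
```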

slide-30
SLIDE 30

Load and Store Multiple

  • There are instructions LDM and STM that load or store a number of registers.
  • With LDM, a bit vector in the machine code indicates which registers to load. They are loaded from consecutive addresses.
  • STM works similarly
  • They are especially useful in storing things on the runtime stack, and will be looked at when we cover that topic.
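As a preview, the bit-vector idea can be sketched in Python (not ARM code). Bit r of the register list is set when Rr takes part, and the lowest-numbered register goes with the lowest address. The helper names are made up, memory is assumed little-endian, and only the “increment after” flavour is modelled; the other variants are left for the stack topic.

```python
import struct


def register_list_mask(registers):
    """16-bit vector with bit r set for each register number r."""
    mask = 0
    for r in registers:
        mask |= 1 << r
    return mask


def ldm_increment_after(memory, base, mask):
    """Load the selected registers from consecutive words starting at base."""
    loaded = {}
    address = base
    for r in range(16):                      # lowest register, lowest address
        if mask & (1 << r):
            loaded[r] = struct.unpack_from("<I", memory, address)[0]
            address += 4
    return loaded


memory = bytearray(16)
struct.pack_into("<3I", memory, 0, 10, 20, 30)

mask = register_list_mask([0, 1, 4])         # like {R0, R1, R4}
print(bin(mask))                             # 0b10011
print(ldm_increment_after(memory, 0, mask))  # {0: 10, 1: 20, 4: 30}
```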