ENGN2219/COMP6719 Computer Architecture & Simulation

SLIDE 1

ENGN2219/COMP6719

Computer Architecture & Simulation Ramesh Sankaranarayana Semester 1, 2020 (based on original material by Ben Swift and Uwe Zimmer)

SLIDE 2

Week 5: Functions

SLIDE 3

Royal Choral Society: 'Hallelujah Chorus' from Handel's Messiah

SLIDE 4

Outline

  • why functions?
  • calling conventions
  • the stack

SLIDE 5

Why functions?

  • copy-pasting sucks
  • enables code reuse
  • modifiability
  • information hiding
  • simplifies the design process
  • single point of failure

SLIDE 6

Function gallery

# Python
def plus_1(x):
    return x + 1

// Java
public int plusOne(int x) {
    return x + 1;
}

; Scheme
(define plus-1
  (lambda (x) (+ x 1)))

SLIDE 7

first, some analogies

SLIDE 8

Good: pipe (input & output)

or “black box”

SLIDE 9

Better: there, and back again

SLIDE 10

$$f(a, b) = \int_a^b g(x)\,dx$$

SLIDE 11

Function as a black box

SLIDE 12

A function call

SLIDE 13

talk

Can we do this with branch (b)?

SLIDE 14

Open questions

  • how does the program know where to come back to?
  • how do we pass information (i.e. parameters) in?
  • how do we get information (i.e. return values) back?
  • can we have some “scribble paper”?

note: parameters/arguments are different words for the same thing

SLIDE 15

Remember Hansel and Gretel?

They try to leave a trail of breadcrumbs behind them so they can find their way back.

SLIDE 16

bl: branch with link

When the branch with link instruction (bl) is executed, the address of the next instruction (i.e. the one after the bl instruction) is placed in a special register.

You’ve seen this already in assignment 1.

SLIDE 17

lr: the “link register”

Just like r15 (pc), r14 also has a special meaning—it’s the link register

SLIDE 18

bx: branch and exchange

The lr might contain the address of the instruction we want to go back to, but how do we actually return there?

The branch and exchange (bx) instruction branches not to a static label, but to an address in a register.

SLIDE 19

Don’t worry too much about the “exchange” part

SLIDE 20

Putting it all together

SLIDE 21

What about conditional branches?

Neither of these new branch instructions (bl and bx) can be used conditionally (e.g. with an eq suffix) in the ARMv7-M ISA your discoboard uses.

There are ways around it, but they’re beyond the scope of this course. You don’t need them anyway: you can use a regular conditional branch (e.g. bgt).

SLIDE 22

Function template

@ use the type directive to tell the assembler
@ that fn_name is a function (optional)
.type fn_name, %function
fn_name: @ just a normal label
  @
  @ the body of the function
  @
  bx lr @ to go back

SLIDE 23

Functions are simple

  • use a bl <label> to branch with link
  • use a bx lr instruction to come back
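
The call-and-return mechanism can be sketched as a toy model in Python (this is not ARM code). It assumes a made-up machine where addresses are list indices and the only registers are pc and lr; the run function, the op names and PROGRAM are hypothetical, chosen only for illustration.

```python
# Toy model of bl / bx lr: "addresses" are just indices into a list.
def run(program):
    """Run a list of (op, arg) tuples; return the order in which pcs executed."""
    regs = {"pc": 0, "lr": None}
    trace = []
    while regs["pc"] < len(program):
        pc = regs["pc"]
        op, arg = program[pc]
        trace.append(pc)
        if op == "bl":          # branch with link
            regs["lr"] = pc + 1     # remember the next instruction
            regs["pc"] = arg        # jump to the label
        elif op == "bx_lr":     # branch to the address held in lr
            regs["pc"] = regs["lr"]
        elif op == "halt":
            break
        else:                   # "nop" stands in for any ordinary instruction
            regs["pc"] = pc + 1
    return trace


PROGRAM = [
    ("nop", None),    # 0: main
    ("bl", 4),        # 1: call the function at index 4
    ("nop", None),    # 2: execution resumes here after the call
    ("halt", None),   # 3
    ("nop", None),    # 4: function body
    ("bx_lr", None),  # 5: return
]
```

Calling run(PROGRAM) visits indices 0, 1, 4, 5, 2, 3: the bl jumps into the function, and bx lr resumes at the instruction after the bl.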

SLIDE 24

Live example: TV binging

SLIDE 25

Nested functions

SLIDE 26

did the breadcrumbs thing work for Hansel & Gretel?

SLIDE 27

Nested Plus_1 (broken!)

SLIDE 28

talk

How can we stop the “first” return address getting clobbered? Sure, store it to memory, but where?

SLIDE 29

Nested Plus_1 (fixed!)

this will work in this case, but there’s still a slight problem with the use of sp here. Can you spot it?
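
Why a nested call breaks without saving lr, and why saving it fixes things, can be sketched in Python (a toy model, not ARM code). The addresses 0x100 and 0x200 and the nested_call function are made up for illustration; a Python list stands in for memory.

```python
def nested_call(save_lr):
    """Simulate main -> outer -> inner; return where outer's 'bx lr' lands."""
    MAIN_RETURN, OUTER_RETURN = 0x100, 0x200   # hypothetical return addresses
    saved = []                  # stands in for memory
    lr = MAIN_RETURN            # main:  bl outer
    if save_lr:
        saved.append(lr)        # outer: store lr before calling onwards
    lr = OUTER_RETURN           # outer: bl inner (overwrites lr)
    pc = lr                     # inner: bx lr, back into outer
    if save_lr:
        lr = saved.pop()        # outer: restore the first return address
    pc = lr                     # outer: bx lr
    return pc
```

With save_lr=False the outer function "returns" to 0x200, i.e. back into itself, because the first return address was clobbered. With save_lr=True it returns to 0x100 in main.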

SLIDE 30

The stack (sneak peek)

One final new register: the stack pointer (sp, but it’s actually r13).

By convention, the value of the sp is an address in the SRAM region of the address space (like with the .data section). Basically, it’s memory you can use to get things done.

We’ll return to the stack … later

SLIDE 31

Lots of new terms here

You’ll write a lot of functions, so you’ll get lots of time to practice.

SLIDE 32

Questions

SLIDE 33

info

  • assignment 1: due Monday of week 6
  • mid-semester exam: to be decided

SLIDE 34

Calling conventions

SLIDE 35

Open questions

  • how does the program know where to come back to?
  • how do we pass information (i.e. parameters) in?
  • how do we get information (i.e. return values) back?
  • can we have some “scribble paper”?

SLIDE 36

We need a convention

an agreed-upon plan for where to find the input(s) and where to leave the result

SLIDE 37

Calling convention definition

This is called a calling convention (CC).

It’s a contract between the caller (the code which makes the function call with bl <label>) and the callee (the code between <label> and the bx lr instruction).

SLIDE 38

What does the CC specify?

  • where to look for the parameter values (the inputs)
  • where to leave the outputs
  • which registers to touch, which to leave alone

SLIDE 39

talk

Which calling convention does this function use? (trick question!)

int do_all_the_things(int how_many_things){
  // lies! does *none* of the things
  return 0;
}

SLIDE 40

There are many possible CCs

It doesn’t matter which calling convention you use (as we’ll see), as long as the caller and the callee use the same convention

SLIDE 41

assume x is in r0…

SLIDE 42

CC example

Do these two Plus_1 functions both give the right answer (i.e. x+1)? What’s the difference?

Plus_1:
  add r0, r0, 1
  bx lr

Plus_1:
  add r5, r2, 1
  bx lr

SLIDE 43

AAPCS

The ARMv7 Architecture Procedure Call Standard (AAPCS) is the convention we’ll (try to) adhere to in programming our discoboards. The full standard is quite detailed, but the general summary is:

  • r0-r3 are the parameter and scratch registers
  • r0-r1 are also the result registers
  • r4-r11 are callee-save registers
  • r12-r15 are special registers (ip, sp, lr, pc)

SLIDE 44

What are scratch registers?

r0-r3 are “scratch” registers, which means that the callee can freely use them (and not worry about messing anything up).

These are also called “caller-save” registers, because if the caller wants to preserve the values in them they need to save them somewhere.
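
The scratch versus callee-save split can be expressed as a small check in Python. This is a sketch only: registers are modelled as a dict, and the names clobbered_callee_saved and example are hypothetical helpers, not part of any toolchain.

```python
# Register roles from the AAPCS summary (r12-r15 are the special registers).
CALLER_SAVED = ["r0", "r1", "r2", "r3"]           # scratch: callee may clobber
CALLEE_SAVED = [f"r{i}" for i in range(4, 12)]    # r4-r11: callee must preserve


def clobbered_callee_saved(before, after):
    """Return the callee-save registers whose values changed across a call."""
    return [r for r in CALLEE_SAVED if before[r] != after[r]]


def example():
    """A badly behaved callee that overwrites r0 (allowed) and r5 (not allowed)."""
    before = {f"r{i}": i for i in range(12)}
    after = dict(before, r0=99, r5=99)
    return clobbered_callee_saved(before, after)
```

Here example() reports only r5: overwriting r0 is permitted because it is a scratch register, while r5 should have been saved and restored by the callee.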

SLIDE 45

Different ways to get data in/out

Do these two Plus_1 functions both give the right answer (i.e. x+1)? What’s the difference?

@ pass by value
Plus_1:
  add r0, 1
  bx lr

@ pass by reference
Plus_1:
  ldr r3, [sp]
  add r3, 1
  str r3, [sp]
  bx lr

SLIDE 46

Pass-by-value vs pass-by-reference

Two different approaches to passing parameters and return values in and out of a function.

  • pass by value makes a “copy” (can mess with it without affecting the caller)
  • pass by reference gives the callee access to the same bits as the caller
  • pros and cons to both, depends on the nature of the things being passed in and out
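
The difference can be sketched in Python, using a dict as a stand-in for memory. The function names and the address 0x20000000 are assumptions made for this illustration only.

```python
def plus_1_by_value(r0):
    """The callee gets a copy of the value and hands the result back."""
    return r0 + 1


def plus_1_by_reference(memory, address):
    """The callee gets an address and updates the caller's data in place."""
    value = memory[address]      # like ldr
    value += 1                   # like add
    memory[address] = value      # like str


def demo():
    x = 41
    result = plus_1_by_value(x)            # x itself is untouched
    memory = {0x20000000: 41}              # hypothetical address
    plus_1_by_reference(memory, 0x20000000)
    return x, result, memory[0x20000000]
```

demo() returns (41, 42, 42): the by-value caller still holds its original 41, while the by-reference caller finds its memory cell changed.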

SLIDE 47

data needs to live in memory (registers are not for long-term storage)

SLIDE 48

Simple functions fit on slides…

make sure you’re practicing!

SLIDE 49

The stack

SLIDE 50

Open questions

  • how does the program know where to come back to?
  • how do we pass information (i.e. parameters) in?
  • how do we get information (i.e. return values) back?
  • can we have some “scribble paper”?

SLIDE 51

What about local variables?

maybe put c, d and e in more registers?

function doStuff(a, b){
  let c = a+b;
  let d = a-b;
  let e = a*b;
  // function body here
}

SLIDE 52

What about local variables?

there aren’t enough registers this time

function doArrayStuff(a, b){
  let person = {
    name: "Esmerelda",
    age: 54,
    pets: ["rex", "daisy"]
  };
  let junk = new Array(1000);
  // function body here
}

SLIDE 53

The stack pointer (revisited)

The stack pointer (sp) contains a memory address, and this can be used by functions for various purposes:

  • “saving” values in registers which would otherwise be overwritten (e.g. lr)
  • passing parameters/returning values
  • temporary variables, e.g. “scribble paper”

It’s called the stack because (in general) it’s used like a first-in-last-out (FILO) stack “data structure” with two main operations: push a value on to the stack, and pop a value off the stack.

SLIDE 54

but only if you follow the rules

SLIDE 55

Setting up the stack

Look at the first instruction executed in the startup file:

ldr sp, =_estack

It loads a value (_estack) into sp using the ldr pseudo-instruction. The exact value of _estack comes from the linker file (line 34):

/* Highest address of the user mode stack */
_estack = 0x20018000; /* end of RAM */

SLIDE 56

Stack pointer in memory

SLIDE 57

More about the stack pointer

  • the value (remember, it’s a memory address) in sp changes as your program runs
  • sp can either point to the last “used” address (full stack) or the first “unused” one (empty stack)
  • you (usually) don’t care about the absolute sp address, because you use it primarily for offset (or relative) addressing
  • stack can “grow” up (ascending stack) or down (descending stack)
  • in ARM Cortex-M (e.g. your discoboard) the convention is to use a full descending stack starting at the highest address in RAM

SLIDE 58

Using the stack

Just use sp like any other register containing a memory address:

mov r2, 0xfe
@ push the value in r2 onto the stack
str r2, [sp, -4]
sub sp, sp, 4
@ do some stuff here
@ pop the value from the "top" of the stack into r3
ldr r3, [sp]
add sp, sp, 4
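
A full descending stack can be modelled in a few lines of Python to make the sp arithmetic concrete. This is a sketch: the Stack class is hypothetical, memory is a dict keyed by address, and the starting address is the _estack value from the linker file.

```python
class Stack:
    """Toy full descending stack: sp points at the last value pushed."""

    def __init__(self, top=0x20018000):
        self.sp = top        # starts at the highest address
        self.memory = {}     # address -> 32-bit word

    def push(self, value):
        self.sp -= 4                     # grow downwards by one word
        self.memory[self.sp] = value     # sp now points at the new value

    def pop(self):
        value = self.memory[self.sp]     # read the "top" of the stack
        self.sp += 4                     # shrink back up by one word
        return value


def demo():
    s = Stack()
    s.push(0xFE)
    after_push = s.sp
    value = s.pop()
    return after_push, value, s.sp
```

demo() returns (0x20017FFC, 0xFE, 0x20018000): one push moves sp down by 4, and the matching pop brings it back to where it started.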

SLIDE 59

Push, illustrated

SLIDE 60

Pop, illustrated

SLIDE 61

the “missing” values in the diagrams aren’t empty, just unknown

SLIDE 62

Offset load and store with write-back

ldr/str with offset can write the new address (base + offset) back to the address register (in this case r1) in two different ways:

  • pre-offset: update the index register before doing the store (or load)
  • post-offset: update the index register after doing the load (or store)

@ r1 := r1 + 4
str r0, [r1, 4]! @ note the "!"
@ r1 := r1 - 8
ldr r0, [r1], -8 @ no "!" for post-offset
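
The order of "update the register" versus "access memory" is the whole difference between the two forms, which a Python sketch makes explicit. The helper names str_pre and ldr_post, and the starting values, are hypothetical; registers and memory are plain dicts.

```python
def str_pre(regs, memory, src, base, offset):
    """Model of `str src, [base, offset]!`: update base first, then store."""
    regs[base] += offset
    memory[regs[base]] = regs[src]


def ldr_post(regs, memory, dst, base, offset):
    """Model of `ldr dst, [base], offset`: load from base, then update base."""
    regs[dst] = memory[regs[base]]
    regs[base] += offset


def demo():
    regs = {"r0": 7, "r1": 0x100}
    memory = {}
    str_pre(regs, memory, "r0", "r1", 4)     # stores 7 at 0x104, r1 = 0x104
    regs["r0"] = 0
    ldr_post(regs, memory, "r0", "r1", -8)   # loads from 0x104, r1 = 0xFC
    return memory, regs
```

After demo(), memory holds 7 at address 0x104, r0 is 7 again, and r1 has ended up at 0xFC.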

SLIDE 63

Pre-offset addressing

SLIDE 64

Post-offset addressing

SLIDE 65

Stack pointer example (again)

Pre/post offset addressing means fewer instructions

mov r2, 0xbc
@ push
str r2, [sp, -4]!
@ do stuff...
@ pop
ldr r3, [sp], 4

SLIDE 66

push and pop instructions

Doing this with the stack pointer (sp) as the base address is so common that the ISA even has specific push and pop instructions. Note that the sp base address is implicit.

mov r2, 0xfe
@ gives same result as `str r2, [sp, -4]!`
push {r2}
@ do stuff...
@ gives same result as `ldr r3, [sp], 4`
pop {r3}

SLIDE 67

Register list syntax

There was one other difference in the push and pop syntax: the brace ({ }) syntax around the register name.

Certain instructions take register lists: they can apply to multiple registers at once, e.g.

@ push r0, r1, r2, r9 to stack, decrement sp by 4*4=16
push {r0-r2,r9}
@ pop 4 words from the stack into r0, r1, r2, r9
pop {r0-r2,r9}
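
Where each register lands in memory can be sketched in Python. The model relies on the ARM rule that a register list is stored with the lowest-numbered register at the lowest address, whatever order the list is written in. The push and pop helpers are hypothetical and only handle names of the form rN; the starting sp of 0x1000 is made up.

```python
def _ordered(names):
    """Sort register names like 'r9', 'r0' by register number."""
    return sorted(names, key=lambda name: int(name[1:]))


def push(regs, memory, names):
    """Model of `push {...}`: lowest-numbered register at the lowest address."""
    ordered = _ordered(names)
    regs["sp"] -= 4 * len(ordered)
    for i, name in enumerate(ordered):
        memory[regs["sp"] + 4 * i] = regs[name]


def pop(regs, memory, names):
    """Model of `pop {...}`: the mirror image of push."""
    ordered = _ordered(names)
    for i, name in enumerate(ordered):
        regs[name] = memory[regs["sp"] + 4 * i]
    regs["sp"] += 4 * len(ordered)


def demo():
    regs = {"sp": 0x1000, "r0": 10, "r1": 11, "r2": 12, "r9": 19}
    memory = {}
    push(regs, memory, ["r9", "r0", "r1", "r2"])
    sp_after_push = regs["sp"]
    for name in ("r0", "r1", "r2", "r9"):
        regs[name] = 0                      # clobber everything
    pop(regs, memory, ["r0", "r1", "r2", "r9"])
    return sp_after_push, memory, regs
```

In demo(), sp drops by 16 to 0xFF0, r0 sits at 0xFF0 and r9 at 0xFFC, and the pop restores all four registers and the original sp.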

SLIDE 68

push instruction encoding

from A7.7.99 of the reference manual

SLIDE 69

Load/store multiple

There are also instructions for loading/storing multiple words using any register as the base register:

  • ldmdb: load multiple, decrement before
  • ldmia: load multiple, increment after
  • stmdb: store multiple, decrement before
  • stmia: store multiple, increment after

But if sp is the base address, then push and pop are probably easier to read. Be careful about the order!

SLIDE 70

Further reading

http://www.davespace.co.uk/arm/introduction-to-arm/stack.html

SLIDE 71

Function prologue & epilogue

The beginning (or prologue) of a function should:

  • store (to the stack) lr and any other values (e.g. parameters) in registers which will be clobbered during the execution of the function (remember the AAPCS)
  • make room for any temporary variables by decreasing the stack pointer

The end (or epilogue) of a function should:

SLIDE 72

Share house kitchen

SLIDE 73

Function prologue & epilogue example

.type my_func, %function
@ assume three parameters in r0-r2
my_func:
  @ prologue
  push {r0-r2} @ sp decreases by 12
  push {lr}    @ sp decreases by 4

  @ body: do stuff, leave "return value" in r3

  @ epilogue
  mov r0, r3     @ leave return value in the right place
  pop {lr}       @ sp increases by 4
  add sp, sp, 12 @ balance out the initial "push"
  bx lr
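
The key property of a prologue/epilogue pair is that the stack is balanced: sp and lr leave the function exactly as they entered. A Python sketch of the same sequence checks this. The body (summing the three parameters and pretending a nested call overwrote lr) is an assumption made purely so there is something to run; registers and memory are dicts and the starting values are made up.

```python
def my_func(regs, memory):
    """Toy model of a function with a prologue, a body and an epilogue."""

    def push(value):
        regs["sp"] -= 4
        memory[regs["sp"]] = value

    # prologue: highest-numbered register first, so r0 ends up lowest
    for name in ("r2", "r1", "r0"):
        push(regs[name])                 # sp decreases by 12 in total
    push(regs["lr"])                     # sp decreases by 4

    # body (hypothetical): leave the "return value" in r3, clobber lr
    regs["r3"] = regs["r0"] + regs["r1"] + regs["r2"]
    regs["lr"] = 0xDEADBEEF              # as if a nested bl had happened

    # epilogue
    regs["r0"] = regs["r3"]              # return value in the right place
    regs["lr"] = memory[regs["sp"]]      # pop {lr}
    regs["sp"] += 4
    regs["sp"] += 12                     # balance out the initial push


def demo():
    regs = {"sp": 0x1000, "lr": 0x800, "r0": 1, "r1": 2, "r2": 3, "r3": 0}
    my_func(regs, {})
    return regs["r0"], regs["lr"], regs["sp"]
```

demo() returns (6, 0x800, 0x1000): the result is in r0, and both lr and sp are back to their values at entry.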

SLIDE 74

Function stack frame

SLIDE 75

Nested function calls

outer_fn:
  push {r0,lr}
  bl middle_fn
  pop {r0,lr}
  bx lr

middle_fn:
  push {r0,lr}
  bl inner_fn
  pop {r0,lr}
  bx lr

inner_fn:
  @ do inner function stuff
  bx lr

SLIDE 76

Nested stack frames

SLIDE 77

the sp “zippers” up and down as the program executes

SLIDE 78

There’s lots more to say…

  • there’s more you can put in your stack frame (e.g. frame pointer fp)
  • ARMv7/AAPCS is pretty register-heavy (other ISA/CCs use the stack more, e.g. for parameter passing and return addresses)
  • an optimizing compiler will almost certainly not generate the code you expect
  • recursion is an interesting case, which we will look at shortly

SLIDE 79

These are all conventions

It’s the programmer’s job to adhere to them: the operating systems programmer, the compiler programmer, the library programmer, the application programmer, …

For bare-metal assembly programming, you’re all of those.

SLIDE 80

Recursion

A function is recursive if it calls itself from the body of the function. A simple example is:

$$f(n) = \begin{cases} 1 & \text{if } n = 0 \\ n\,f(n-1) & \text{otherwise} \end{cases}$$

What does this do?

SLIDE 81

Three characteristics of a recursive function:

  • 1. There must be a terminating condition
  • 2. The function must call a clone of itself
  • 3. The parameters to that call must move the function towards the terminating condition

SLIDE 82

@ number passed in r0, result in r1
factorial:
  push {r0,lr}
  cmp r0, #0
  bne recurse @ what's the potential problem here?
  mov r1, #1
  b end
recurse:
  sub r0, #1
  bl factorial
  add r0, #1
  mul r1, r0
end:
  pop {r0,lr}
  bx lr

SLIDE 83

Some observations about recursion:

  • Recursion incurs a memory overhead (stack frames in the call stack) with each recursive call
  • If values need to be passed back once the recursion terminates, then the resulting unwinding of the call stack and intermediate calculation of values can be expensive
  • Compilers in high level languages can optimize recursive calls so that they run faster
  • In many cases, it can be re-written as an iterative solution, which is generally faster
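
The memory overhead and the iterative alternative can both be seen in a short Python sketch of factorial. The depth parameter is an addition for illustration: it counts how many calls are live at once, which is a stand-in for the number of stack frames.

```python
def factorial_recursive(n, depth=1):
    """Return (n!, deepest call depth reached while computing it)."""
    if n == 0:                       # terminating condition
        return 1, depth
    result, deepest = factorial_recursive(n - 1, depth + 1)
    return n * result, deepest       # multiply while unwinding


def factorial_iterative(n):
    """Same result with a loop: no extra call frames at all."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```

factorial_recursive(5) returns (120, 6): six calls are live at the deepest point (n = 5 down to n = 0), whereas factorial_iterative(5) computes 120 inside a single call.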

SLIDE 84

Questions?

SLIDE 85

Fibonacci Sequence

Write a function that generates a Fibonacci sequence of a certain length. The length and the starting address in memory where the sequence is to be stored are input as arguments to the function. Call the function for different values of length and check that it works correctly.

SLIDE 86

Area

Write a function that computes the area of a triangle, given its base and height. Now, write a function that uses the above function to compute the area of a square, given the length of one of its sides.

SLIDE 87

Velocity

Write a function that computes the final velocity of an object, given its initial velocity, acceleration and elapsed time.

SLIDE 88

Assembler Macros

SLIDE 89

Outline

  • Godbolt compiler explorer
  • assembly macros

SLIDE 90

Godbolt compiler explorer

https://godbolt.org/: a super-cool interactive resource for exploring stack frames (and code generation in general). A few tips:

  • in the compiler select dropdown, select one of the ARM gcc options
  • in the Compiler options… box, try -O0 (unoptimised) vs -O3 (optimised)
  • try modifying the C code on the left; see how the asm output on the right changes
  • remember the stack frames!

SLIDE 91

Macros are for automatically copy-pasting code

SLIDE 92

Like this…

SLIDE 93

as macro language

The macro language is defined by the assembler (as). Two steps:

  • define a macro (with .macro/.endm)
  • call/use a macro (using the name of the macro)

The assembler copy-pastes the macro code (replacing parameters where present) into your program before generating the machine code.

SLIDE 94

General macro syntax

.macro macro_name arg_a arg_b ...
  @ to use the argument, prefix with "\"
  @ e.g.
  adds r0, \arg_a, \arg_b
  @ ...
.endm

SLIDE 95

Example: swap

@ swap the values in two registers
@ assumes r12 is free to use as a "scratch" register
.macro swap reg_a reg_b
  mov r12, \reg_a
  mov \reg_a, \reg_b
  mov \reg_b, r12
.endm

SLIDE 96

Calling the swap macro

If you use swap in your assembly code:

swap r0, r3

the assembler sees it and “expands” it to:

mov r12, r0
mov r0, r3
mov r3, r12

It’s exactly like you had used this code in your main.S file in the first place.
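
Macro expansion is plain text substitution, which a few lines of Python can imitate. This is a deliberately naive sketch, not how the assembler is implemented: the expand function is hypothetical, and its simple string replacement would misbehave if one parameter name were a prefix of another.

```python
def expand(body, params, args):
    """Replace each \\param in the macro body with the matching argument."""
    lines = []
    for line in body:
        for param, arg in zip(params, args):
            line = line.replace("\\" + param, arg)
        lines.append(line)
    return lines


# The body of the swap macro, one instruction per list entry.
SWAP_BODY = [
    "mov r12, \\reg_a",
    "mov \\reg_a, \\reg_b",
    "mov \\reg_b, r12",
]
```

expand(SWAP_BODY, ["reg_a", "reg_b"], ["r0", "r3"]) yields the three mov instructions with r0 and r3 filled in, which is the text the assembler goes on to encode.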

SLIDE 97

the CPU doesn’t know anything about your macros

SLIDE 98

Recap: if statement

Remember the best if statement

if:
  @ set flags here
  b<c> then
  @ else
  b rest_of_program
then:
  @ instruction(s) here
rest_of_program:
  @ continue on...

SLIDE 99

An if macro

.macro if condition_code condition then_code else_code
  \condition_code
  b\condition then
  \else_code
  b end_if
then:
  \then_code
end_if:
.endm

@ usage
if "cmp r1, r2", eq, "mov r3, 1", "mov r3, 0"

SLIDE 100

Things to note

Macros can “splice” parameters into the middle of instructions, e.g. b\condition becomes e.g. beq or blt.

Whole instructions can be treated as a single macro parameter (e.g. "cmp r1, r2" as the condition_code parameter) as long as they’re surrounded by double quotes (").

This is a blessing and a curse!

SLIDE 101

The \@ macro “counter” variable

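The \@ variable contains a counter of how many macros have been executed so far, which you can use in your macro output. Appending it to a label makes the label unique in every expansion, so the macro can be used more than once without clashing labels.

.macro if condition_code condition then_code else_code
  \condition_code
  b\condition then\@
  \else_code
  b end_if\@
then\@:
  \then_code
end_if\@:
.endm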
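
How a counter keeps labels from clashing can be imitated in Python. This is a sketch of the idea only: make_expander is a hypothetical helper, and starting the count at 0 is an assumption of this model.

```python
import itertools


def make_expander():
    """Return an expand(body) function with its own expansion counter."""
    counter = itertools.count()      # 0, 1, 2, ...

    def expand(body):
        n = str(next(counter))       # one number per macro expansion
        return [line.replace("\\@", n) for line in body]

    return expand


def demo():
    expand = make_expander()
    body = ["b end_if\\@", "end_if\\@:"]
    return expand(body), expand(body)
```

demo() returns two expansions of the same body whose labels differ (end_if0 and end_if1), so using the macro twice no longer defines the same label twice.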

SLIDE 102

A basic for macro

.macro for register from to body
  mov \register, \from
for\@:
  cmp \register, \to
  bgt end_for\@
  \body
  add \register, 1
  b for\@
end_for\@:
.endm

@ usage
for r1, 1, 100 "add r3, r1"

SLIDE 103

Advanced macro syntax

  • optional parameters (arg1=500)
  • check if parameters are present (.ifb)
  • conditionals (.if) and loops (.rept, .irp)
  • macros can be recursive

Read the docs

SLIDE 104

Macro gotchas

  • hard to debug (can’t step through)
  • need to be careful with names (e.g. clashing labels)
  • for labels as parameters, use \() as a separator, e.g. \labelname\(): (it gets removed, but stops the assembler thinking the : is part of labelname)
  • they might generate a lot of instructions
  • the documentation kind of sucks

SLIDE 105

Debugging with the disassembler

If you really need to see what instructions your macro is generating, use the disassembler.

Don’t forget the .type <func_name>, %function and .size <func_name>, .-<func_name> directives.

SLIDE 106

they look like functions in a higher-level language—don’t be fooled

SLIDE 107

talk

How would you explain the difference between functions and macros to your Grandma?

SLIDE 108

Further reading

  • Useful assembler directives and macros for the GNU assembler on community.arm.com
  • .macro as directive docs

SLIDE 109

Test and branch

Write a test and branch macro whose signature is:

testandbranch destination_label register_to_be_tested condition

For example, it could be called as follows:

testandbranch NonZero, r0, NE

This would translate to the following code in assembly:

CMP r0, #0
BNE NonZero

SLIDE 110

While macro

Similar to the FOR macro, write a WHILE macro. What would be the macro signature? Would you use the macro counter variable? Now, change this to a DO_WHILE macro.

SLIDE 111

Macros by request

SLIDE 112

Questions?
