IC220 SlideSet #3: Procedures & Instruction Representation - PowerPoint PPT Presentation

IC220 SlideSet #3: Procedures & Instruction Representation (Sections 2.8, 2.5, 2.10)

Procedure Example & Terminology
    void function1() {
        int a, b, c, d;
        …
        a = function2(b, c, d);
        …
    }

    int function2(int s, int t, int u) {
        int x, y, z;
        …
        return x;
    }

Big Picture – Steps for Executing a Procedure
1. Place parameters where the callee procedure can access them
2. Transfer control to the callee procedure
3. (Maybe) Acquire the storage resources needed for the callee procedure
4. Callee performs the desired task
5. Place the result somewhere that the “caller” procedure can access it
6. Return control to the point of origin (in caller)

Step #1: Placement of Parameters
• Assigned Registers:
• If more than eight are needed?
• Parameters are not “saved” across a procedure call

Step #2: Transfer Control to the Procedure
• bl
  – Branches to the procedure address AND links to return address
• Link saved in register _____
  – What exactly is saved?
  – Why do we need this?
• Allows procedure to be called at __________ points in code, _________ times, each having a _________ return address

Step #3: Acquire storage resources needed by callee
• Suppose callee wants to use registers x20, x21 and x22
  – But caller still expects them to have the same values after the call
  – Solution:
• At start of function call:
    subi _____, _____, ____   //
    stur x20, [sp, ____ ]     //
    stur x21, [sp, ____ ]     //
    stur x22, [sp, ____ ]     //
• WARNING: unlike book examples, must move sp in increments of 16

Step #3 Storage Continued
(Figure: stack holding three saved slots, each labeled “Contents of register”)

Step #4: Callee Execution
• Use parameters from _________________ and _________________ (set up by caller)
• Temporary storage locations to use for computation:
  1. Temporary registers (x9-x15)
  2. Argument registers (x0-x7) if…
  3. Other registers but…
  4. What if still need more?
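A minimal C sketch of the Step #3 save and the matching restore, assuming a hypothetical byte-addressed array `stack_mem` that grows downward, global variables standing in for x20-x22 and sp, and a hypothetical helper `frame_bytes` that rounds the space for saved registers up to a multiple of 16 (per the WARNING above). The offsets #0, #8 and #16 follow the restore sequence in Step #6; this is an illustrative model, not how the assembler or hardware works.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small byte-addressed stack that grows downward. */
#define STACK_BYTES 256
static uint8_t stack_mem[STACK_BYTES];
static uint64_t sp = STACK_BYTES;          /* empty stack: sp at the top */

/* Model of the "preserved on call" registers x20-x22. */
static uint64_t x20, x21, x22;

/* Bytes needed to save nregs 64-bit registers, rounded up to a multiple of 16. */
static uint64_t frame_bytes(unsigned nregs) {
    return ((uint64_t)nregs * 8 + 15) & ~(uint64_t)15;
}

/* One 64-bit store/load at a byte address in the model stack. */
static void stur(uint64_t value, uint64_t addr) {
    memcpy(&stack_mem[addr], &value, sizeof value);
}
static uint64_t ldur(uint64_t addr) {
    uint64_t v;
    memcpy(&v, &stack_mem[addr], sizeof v);
    return v;
}

/* Callee that reuses x20-x22 but hands them back unchanged to its caller. */
static void callee(void) {
    uint64_t frame = frame_bytes(3);       /* 3 * 8 = 24 bytes, rounded to 32 */
    sp -= frame;                           /* subi sp, sp, #32                */
    stur(x20, sp + 16);                    /* stur x20, [sp, #16]             */
    stur(x21, sp + 8);                     /* stur x21, [sp, #8]              */
    stur(x22, sp + 0);                     /* stur x22, [sp, #0]              */

    x20 = 1; x21 = 2; x22 = 3;             /* body overwrites them freely     */

    x22 = ldur(sp + 0);                    /* ldur x22, [sp, #0]              */
    x21 = ldur(sp + 8);                    /* ldur x21, [sp, #8]              */
    x20 = ldur(sp + 16);                   /* ldur x20, [sp, #16]             */
    sp += frame;                           /* addi sp, sp, #32                */
}
```

Rounding 24 bytes up to 32 leaves one unused 8-byte slot, which is the price of keeping sp 16-byte aligned.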

Step #5: Place result where caller can get it
• Placement of Result (return value)
  – Must place result in appropriate register(s)
• 64-bit result:
• More than 64-bit result:
• Example: desired result currently in x5, then:

Step #6: Return control to caller – Part A
• Part I – Restore appropriate registers before returning from the procedure
    ldur x22, [sp, #0]    // restore x22 for caller
    ldur x21, [sp, #8]    // restore x21 for caller
    ldur x20, [sp, #16]   // restore x20 for caller
    addi sp, sp, ______   // adjust stack to remove our 3 items
• Part II – Return to proper location in caller (return address)
  – Jump to stored address of next instruction after procedure call
  – Explicit instruction:
  – Often written with this shorthand (not 100% the same, but almost):

EX: 2-31 to 2-32

Recap – Steps for Executing a Procedure
1. Place parameters where the callee procedure can access them
2. Transfer control to the callee procedure
3. (Maybe) Acquire the storage resources needed for the callee procedure
4. Callee performs the desired task
5. Place the result somewhere that the “caller” procedure can access it
6. Return control to the point of origin (in caller)

Example – putting it all together
• Write assembly for the following procedure
    long dog (long m, long n) {
        long result = m + n + 7;
        return result;
    }
• Call this function to compute dog(5, 10):
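One way to see the six steps at work in the dog(5, 10) exercise is a C sketch that models the register file as a hypothetical array `x[32]`, with hypothetical functions `dog` and `call_dog_5_10` standing in for the callee and caller. It assumes arguments in x0 and x1, the result in x0, and x9 as a temporary; the comments name the assembly each line is meant to mirror, as an illustration rather than a model answer.

```c
#include <stdint.h>

/* Hypothetical model of the 32 LEGv8 general-purpose registers x0..x31. */
static int64_t x[32];

/* Callee for long dog(long m, long n): m arrives in x0, n in x1.
   Only a temporary register (x9) is used, so nothing needs saving. */
static void dog(void) {
    x[9] = x[0] + x[1];   /* add  x9, x0, x1  : m + n              */
    x[0] = x[9] + 7;      /* addi x0, x9, #7  : result placed in x0 */
}                         /* returning here stands in for br lr    */

/* Caller computing dog(5, 10). */
static int64_t call_dog_5_10(void) {
    x[0] = 5;             /* Step 1: first argument in x0           */
    x[1] = 10;            /*         second argument in x1          */
    dog();                /* Step 2: bl dog                         */
    return x[0];          /* Step 5: caller reads the result in x0  */
}
```

Because dog touches only temporary and argument registers, Step 3 (saving registers) is skipped entirely.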

Nested Procedures
• What if the callee wants to call another procedure – any problems?
• Solution?
• This also applies to recursive procedures

Register Conventions
• Register convention for “Preserved on Call” registers (like x20):
  1. If used, the callee must save these registers and restore their values before returning
  2. If not used, they are not saved

Nested Procedures
• “Activation record” – part of the stack holding a procedure’s saved values and local variables
• FP – points to the first word of the activation record for the procedure (not required; we don’t use it)

Example – putting it all together (again)
• Write assembly for the following procedure
    long cloak (long n) {
        if (n == 0) return 100;
        else return (n*n + dagger(n-1));
    }
• Call this function to compute cloak(6):
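The nested-call problem can be sketched in C by modeling the single link register as one global variable. The names `lr`, `bl` and `outer_return_address`, and the addresses 0x1000 and 0x2000, are all hypothetical. The sketch shows that a second `bl` overwrites lr, so a procedure that calls another one must save lr first (as the cloak code does with `stur lr, [sp, #0]`).

```c
#include <stdint.h>

/* Hypothetical model: one link register shared by every procedure (lr / x30). */
static uint64_t lr;

/* bl executed at address pc: branch, and link the address of the next instruction. */
static void bl(uint64_t pc) { lr = pc + 4; }

/* outer is called from caller_pc and itself calls inner from address 0x2000.
   Returns the address that outer's final "br lr" would jump to. */
static uint64_t outer_return_address(uint64_t caller_pc, int save_lr) {
    uint64_t saved = 0;             /* stands in for a stack slot            */
    bl(caller_pc);                  /* caller: bl outer                      */
    if (save_lr) saved = lr;        /* outer:  stur lr, [sp, #0]             */
    bl(0x2000);                     /* outer:  bl inner (overwrites lr)      */
                                    /* inner:  br lr returns to 0x2004       */
    if (save_lr) lr = saved;        /* outer:  ldur lr, [sp, #0]             */
    return lr;                      /* outer:  br lr                         */
}
```

Without the save, outer "returns" to 0x2004 inside itself instead of to 0x1004 in its caller.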

Example – putting it all together
    long cloak (long n) {
        if (n == 0) return 100;
        else return (n*n + dagger(n-1));
    }

cloak:
    sub  sp, sp, #16       // room for 2 saved values
    stur lr, [sp, #0]      // save return address (we call dagger)
    stur x20, [sp, #8]     // save caller's x20
    mov  x20, x0           // keep n in a preserved register across the call
    cbnz x0, Else          // n != 0: recursive case
    mov  x0, 100           // n == 0: return 100
    b    cloakExit
Else:
    sub  x0, x0, #1        // argument n-1
    bl   dagger
    mul  x1, x20, x20      // n*n
    add  x0, x0, x1        // n*n + dagger(n-1)
cloakExit:
    ldur x20, [sp, #8]     // restore caller's x20
    ldur lr, [sp, #0]      // restore return address
    add  sp, sp, #16
    br   lr
    .size cloak, . - cloak

What does that function do?

EX: 2-36 to 2-38

Best practices for nested functions
myFunction:
    subi sp, sp, #16      // 16 or more, always a multiple of 16
    stur lr, [sp, #0]     // do this first!
    // save ‘preserved’ registers here (if using any)
    // possibly save arguments here
    // (recommended: in a ‘preserved’ register; alternative: onto the stack)
    // body of function (possibly with branches to ‘myFunctionExit’)
    ...
myFunctionExit:
    // reload ‘preserved’ registers (if any were used)
    ldur lr, [sp, #0]
    addi sp, sp, #16      // must match subi at start!
    br   lr               // recommended: ONE location where you return
    // Tell the debugger where this function ends! (IC220 requirement)
    .size myFunction, . - myFunction
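The `squares` routine on the gdb slide is the cloak code with `bl dagger` replaced by a recursive `bl squares`. A C reading of that recursive variant, as a sketch, makes "What does that function do?" concrete for it: for n >= 0 it returns 100 plus 1*1 + 2*2 + ... + n*n. The helper `squares_closed_form` is a hypothetical name added only to check that reading; cloak itself depends on `dagger`, which the slides do not define.

```c
/* C reading of the recursive squares routine (valid for n >= 0). */
static long squares(long n) {
    if (n == 0)                       /* cbnz x0, Else falls through when n == 0 */
        return 100;
    return n * n + squares(n - 1);    /* n survives the call in x20              */
}

/* Closed form for n >= 0: 100 + sum of k*k for k = 1..n. */
static long squares_closed_form(long n) {
    return 100 + n * (n + 1) * (2 * n + 1) / 6;
}
```

For example, squares(6) is 100 + 91 = 191.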

Why/how gdb?
squares:
    sub  sp, sp, #16
    stur lr, [sp, #0]
    stur x20, [sp, #8]
    mov  x20, x0
    cbnz x0, Else
    mov  x0, 100
    b    squaresExit
Else:
    sub  x0, x0, #1
    bl   squares
    mul  x1, x20, x20
    add  x0, x0, x1
squaresExit:
    ldur x20, [sp, #8]
    ldur lr, [sp, #0]
    add  sp, sp, #16
    br   lr
    .size squares, . - squares

Representing Instructions
• Assembly language provides convenient symbolic representation
  – Much easier than writing down numbers
  – Simplifications, e.g., destination first
• Machine language is the underlying reality
  – Each instruction is simply a number, the “machine code”
  – Realities, e.g., destination is no longer first
• LEGv8/ARMv8 instructions
  – Always encoded as 32-bit instruction word
  – Small number of formats provide all the details
  – Registers assigned a 5-bit number
  – Regularity!

Representing Instructions
• How does the CPU actually know what to do?
• What tradeoffs do we have to make?
• Why does this work:    mov x0, 57
  but not this:          mov x0, 57000
  ????

For reference: hexadecimal review
• Base 16
  – Compact representation of bit strings
  – 4 bits per hex digit
  – Often indicated by 0x prefix

    0 0000    4 0100    8 1000    c 1100
    1 0001    5 0101    9 1001    d 1101
    2 0010    6 0110    a 1010    e 1110
    3 0011    7 0111    b 1011    f 1111

Example: 0xeca8 6420 = 1110 1100 1010 1000 0110 0100 0010 0000

LEGv8 R-format Instructions
    Opcode  | Rm     | Shamt  | Rn     | Rd
    11 bits | 5 bits | 6 bits | 5 bits | 5 bits
• R-format:
  – For instructions involving 3 registers
  – add, adds, subs, and, orr, …
  – Example: add x9, x21, x3
    0x458 | 3 | 0 | 21 | 9

LEGv8 D-format Instructions
    Opcode  | DT_Address | Op2    | Rn     | Rt
    11 bits | 9 bits     | 2 bits | 5 bits | 5 bits
• Load/store instructions (data transfer)
  – Rt: destination (load) or source (store) register number
  – Example: ldur x9, [x10, #240]
    0x7c2 | 240 | 0 | 10 | 9
• Design Principle 3: Good design demands good compromises
  – Different formats complicate decoding, but allow 32-bit instructions uniformly
  – Keep formats as similar as possible

LEGv8 I-format Instructions
    Opcode  | Immediate | Rn     | Rd
    10 bits | 12 bits   | 5 bits | 5 bits
• Immediate instructions (addi, subi, andi, orri, ..)
  – Immediate = “constant”
  – Rn: source register
  – Rd: destination register
• Limitations on immediate field
  – Immediate field is zero-extended
  – Immediate field has 12 bits
  (somewhat obscure detail: actual ARMv8 limits are different because of the option to shift the constant by exactly 12 bits)
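The field layouts above can be checked by packing the fields with shifts and masks. This C sketch uses hypothetical helper names `encode_r`, `encode_d`, `encode_i` and `fits_imm12`; the field widths come from the slides, and the two worked examples reuse the slides' field values (0x458 | 3 | 0 | 21 | 9 and 0x7c2 | 240 | 0 | 10 | 9).

```c
#include <stdint.h>

/* R-format: opcode(11) | Rm(5) | shamt(6) | Rn(5) | Rd(5) */
static uint32_t encode_r(uint32_t opcode, uint32_t rm, uint32_t shamt,
                         uint32_t rn, uint32_t rd) {
    return ((opcode & 0x7FFu) << 21) | ((rm & 0x1Fu) << 16) |
           ((shamt & 0x3Fu) << 10) | ((rn & 0x1Fu) << 5) | (rd & 0x1Fu);
}

/* D-format: opcode(11) | DT_address(9) | op2(2) | Rn(5) | Rt(5) */
static uint32_t encode_d(uint32_t opcode, uint32_t dt_address, uint32_t op2,
                         uint32_t rn, uint32_t rt) {
    return ((opcode & 0x7FFu) << 21) | ((dt_address & 0x1FFu) << 12) |
           ((op2 & 0x3u) << 10) | ((rn & 0x1Fu) << 5) | (rt & 0x1Fu);
}

/* I-format: opcode(10) | immediate(12) | Rn(5) | Rd(5) */
static uint32_t encode_i(uint32_t opcode, uint32_t imm, uint32_t rn, uint32_t rd) {
    return ((opcode & 0x3FFu) << 22) | ((imm & 0xFFFu) << 10) |
           ((rn & 0x1Fu) << 5) | (rd & 0x1Fu);
}

/* A zero-extended 12-bit immediate field holds 0..4095. */
static int fits_imm12(uint64_t value) { return value <= 0xFFF; }
```

With these, add x9, x21, x3 packs to 0x8B0302A9 and ldur x9, [x10, #240] packs to 0xF84F0149. Each mask silently drops bits that do not fit, which is exactly why an assembler must reject an out-of-range constant rather than encode it.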

EX: 2-61 to 2-68

Bigger Constants
• Most constants are small
  – 12-bit immediate is sufficient
• For the occasional bigger constant
  – MOVZ: move wide with zeros
  – MOVK: move wide with keep
  – Both accept 16-bit immediates
  – And, use with “flexible 2nd operand” (shift)

    MOVZ X9,255,LSL 16
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 0000 0000

    MOVK X9,255,LSL 0
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 1111 1111

LEGv8 Instruction Formats Summary
• NOTE: this figure (from book) shows all values as DECIMAL, but “green sheet” gives opcodes in HEX (easier to work with)

Branch Addressing
• B-type
  – B 10000 // go to location PC + 10000 instructions (10000 × 4 bytes)
    Opcode | Address
    5      | 10000ten
    6 bits | 26 bits
• CB-type
  – CBNZ X19, Exit // go to Exit if X19 != 0
    Opcode | Address | Rt
    181    | Exit    | 19
    8 bits | 19 bits | 5 bits
• Both addresses are PC-relative
  – Address = PC + offset (from instruction) × 4, since the offset counts 4-byte instructions

LEGv8 Addressing Summary
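A C sketch of the two ideas above: building a large constant with MOVZ then MOVK, and computing a PC-relative branch target from a B-type or CB-type offset field. The helper names `movz`, `movk`, `sign_extend`, `branch_target` and `encode_b` are hypothetical. It assumes, as in ARMv8, that the offset field is a signed (two's-complement) count of 4-byte instructions; `shift` should be 0, 16, 32 or 48.

```c
#include <stdint.h>

/* MOVZ Xd, imm16, LSL shift: imm16 placed at 'shift', every other bit zeroed. */
static uint64_t movz(uint16_t imm16, unsigned shift) {
    return (uint64_t)imm16 << shift;
}

/* MOVK Xd, imm16, LSL shift: replace only that 16-bit field, keep the rest. */
static uint64_t movk(uint64_t xd, uint16_t imm16, unsigned shift) {
    uint64_t field = (uint64_t)0xFFFF << shift;
    return (xd & ~field) | ((uint64_t)imm16 << shift);
}

/* Interpret the low 'bits' bits of 'field' as a two's-complement number. */
static int64_t sign_extend(uint32_t field, unsigned bits) {
    uint64_t value = field & (((uint64_t)1 << bits) - 1);
    uint64_t sign  = (uint64_t)1 << (bits - 1);
    return (value & sign) ? (int64_t)value - ((int64_t)1 << bits) : (int64_t)value;
}

/* PC-relative target: offset field (26 bits for B, 19 for CB) times 4 bytes. */
static uint64_t branch_target(uint64_t pc, uint32_t offset_field, unsigned bits) {
    return pc + (uint64_t)(sign_extend(offset_field, bits) * 4);
}

/* B-type instruction word: opcode(6) | address(26). */
static uint32_t encode_b(uint32_t opcode, uint32_t address) {
    return ((opcode & 0x3Fu) << 26) | (address & 0x3FFFFFFu);
}
```

Under these assumptions, MOVZ X9,255,LSL 16 followed by MOVK X9,255,LSL 0 leaves 0x00FF00FF in X9, and B 10000 encodes as 0x14002710; a CB offset field of all ones (-1) branches back one instruction.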

LEGv8 Encoding Summary
