ARM Assembly Programming Cuauhtemoc Carbajal 06/08/2013

Introduction
• The ARM processor is very easy to program at the assembly level. (It is a RISC.)
• We will learn ARM assembly programming at the user level.

Memory system
• Memory is a linear array of bytes addressed from 0 to 2^32 - 1
• Word, half-word, byte
• Little-endian

Address      Byte
0x00000000   00
0x00000001   10
0x00000002   20
0x00000003   30
0x00000004   FF
0x00000005   FF
0x00000006   FF
...
0xFFFFFFFD   00
0xFFFFFFFE   00
0xFFFFFFFF   00

ARM programmer model
• The state of an ARM system is determined by the content of visible registers and memory.
• A user-mode program can see 15 32-bit general-purpose registers (R0-R14), the program counter (PC) and the CPSR.
• The instruction set defines the operations that can change the state.

Byte ordering
(Memory holds the bytes 00, 10, 20, 30 at addresses 0x00000000 to 0x00000003.)
• Big Endian
  – Least significant byte has highest address
  – Word address 0x00000000, value: 00102030
• Little Endian
  – Least significant byte has lowest address
  – Word address 0x00000000, value: 30201000
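The two readings of the same four bytes can be checked with a short sketch. It is written in Python rather than ARM assembly so that it can be run anywhere; `memory` and `read_word` are illustrative names, not part of the slides.

```python
# The four bytes stored at addresses 0x00000000..0x00000003 on the slide.
memory = bytes([0x00, 0x10, 0x20, 0x30])

def read_word(mem: bytes, address: int, byteorder: str) -> int:
    """Read a 32-bit word starting at `address` using the given byte order."""
    return int.from_bytes(mem[address:address + 4], byteorder)

big = read_word(memory, 0, "big")        # first byte is the most significant
little = read_word(memory, 0, "little")  # first byte is the least significant

print(f"{big:08X}")     # 00102030
print(f"{little:08X}")  # 30201000
```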

Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture
• When used in relation to the ARM:
  – Byte means 8 bits
  – Halfword means 16 bits (two bytes)
  – Word means 32 bits (four bytes)
• Most ARM cores implement two instruction sets
  – 32-bit ARM Instruction Set
  – 16-bit Thumb Instruction Set
• Jazelle cores: execute Java bytecode in hardware

ARM Memory Organization

bit 31                                bit 0    byte address
|                                         |    23 22 21 20
|                 word16                  |    19 18 17 16
|    half-word14     |    half-word12     |    15 14 13 12
|                 word8                   |    11 10  9  8
|         |  byte6   |     half-word4     |     7  6  5  4
|  byte3  |  byte2   |  byte1   |  byte0  |     3  2  1  0

Big Endian and Little Endian
[Figure panels: Big endian, Little endian]

Processor Modes
• The ARM has seven basic operating modes:
  – User: unprivileged mode under which most tasks run
  – FIQ: entered when a high priority (fast) interrupt is raised
  – IRQ: entered when a low priority (normal) interrupt is raised
  – Supervisor: entered on reset and when a Software Interrupt instruction is executed
  – Abort: used to handle memory access violations
  – Undef: used to handle undefined instructions
  – System: privileged mode using the same registers as user mode

ARM Registers (1)

Usable in user mode:
  user mode        r0-r14, r15 (PC), CPSR

Privileged modes only (banked registers):
  fiq mode         r8_fiq-r14_fiq, SPSR_fiq
  svc mode         r13_svc, r14_svc, SPSR_svc
  abort mode       r13_abt, r14_abt, SPSR_abt
  irq mode         r13_irq, r14_irq, SPSR_irq
  undefined mode   r13_und, r14_und, SPSR_und

ARM Registers (2)
• ARM has 37 registers, all of which are 32 bits long
  – 1 dedicated program counter
  – 1 dedicated current program status register
  – 5 dedicated saved program status registers
  – 30 general purpose registers
• The current processor mode governs which of several banks is accessible
• Each mode can access
  – a particular set of r0-r12 registers
  – a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  – the program counter, r15 (pc)
  – the current program status register, cpsr
• Privileged modes (except System) can also access
  – a particular spsr (saved program status register)

Current Program Status Register (CPSR)
• Holds information about the most recently performed ALU operation
• Controls the enabling and disabling of interrupts
• Sets the processor operating mode

Current Program Status Register (CPSR)
[Figure]

Instruction set
ARM instructions are all 32 bits long (except in Thumb mode). There are 2^32 possible machine instructions. Fortunately, they are structured.

Features of ARM instruction set
• Load-store architecture
• 3-address instructions
• Conditional execution of every instruction
• Possible to load/store multiple registers at once
• Possible to combine shift and ALU operations in a single instruction

Instruction set
  MOV<cc><S> Rd, <operands>

  MOVCS R0, R1    @ if carry is set
                  @ then R0:=R1
  MOVS  R0, #0    @ R0:=0
                  @ Z=1, N=0
                  @ C, V unaffected

Instruction set
• Data processing (Arithmetic and Logical)
• Data movement
• Flow control

Data processing
• Arithmetic and logic operations
• General rules:
  – All operands are 32-bit, coming from registers or literals.
  – The result, if any, is 32-bit and placed in a register (except for long multiply, which produces a 64-bit result)
  – 3-address format

Arithmetic
• ADD R0, R1, R2    @ R0 = R1+R2
• ADC R0, R1, R2    @ R0 = R1+R2+C
• SUB R0, R1, R2    @ R0 = R1-R2
• SBC R0, R1, R2    @ R0 = R1-R2+C-1
• RSB R0, R1, R2    @ R0 = R2-R1
• RSC R0, R1, R2    @ R0 = R2-R1+C-1
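The six formulas can be modelled directly. This is a minimal sketch in Python, not ARM code: the function names mirror the mnemonics, `c` stands for the carry flag (0 or 1), and every result is reduced modulo 2^32 to imitate a 32-bit register.

```python
MASK = 0xFFFFFFFF  # results wrap around at 32 bits

def add(r1, r2):
    return (r1 + r2) & MASK

def adc(r1, r2, c):
    return (r1 + r2 + c) & MASK

def sub(r1, r2):
    return (r1 - r2) & MASK

def sbc(r1, r2, c):
    return (r1 - r2 + c - 1) & MASK

def rsb(r1, r2):
    return (r2 - r1) & MASK      # reverse subtract: operands swapped

def rsc(r1, r2, c):
    return (r2 - r1 + c - 1) & MASK

print(hex(add(0xFFFFFFFF, 1)))  # 0x0 (wraps around)
print(hex(sub(5, 7)))           # 0xfffffffe (two's complement of 2)
print(sbc(5, 3, 0))             # 1 (C=0 subtracts one extra)
```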

Bitwise logic
• AND R0, R1, R2    @ R0 = R1 and R2
• ORR R0, R1, R2    @ R0 = R1 or R2
• EOR R0, R1, R2    @ R0 = R1 xor R2
• BIC R0, R1, R2    @ R0 = R1 and (~R2)
  bit clear: R2 is a mask identifying which bits of R1 will be cleared to zero
    R1=0x11111111  R2=0x01100101
    BIC R0, R1, R2
    R0=0x10011010
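The BIC example can be verified with a one-line model. A sketch in Python; `bic` is an illustrative helper named after the mnemonic.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def bic(r1: int, r2: int) -> int:
    """Bit clear: every bit that is 1 in r2 is cleared in r1."""
    return r1 & ~r2 & MASK

r0 = bic(0x11111111, 0x01100101)
print(f"{r0:08X}")  # 10011010
```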

Register movement
• MOV R0, R2    @ R0 = R2
• MVN R0, R2    @ R0 = ~R2 (move negated)

Comparison
• These instructions do not generate a result, but set the condition code bits (N, Z, C, V) in the CPSR. Often, a branch operation follows to change the program flow.
• CMP R1, R2    @ compare: set cc on R1-R2
• CMN R1, R2    @ compare negated: set cc on R1+R2
• TST R1, R2    @ bit test: set cc on R1 and R2
• TEQ R1, R2    @ test equal: set cc on R1 xor R2
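How CMP derives the four flags from R1-R2 can be sketched as follows. This is a Python model under the usual ARM convention that C after a subtraction means "no borrow"; `cmp_flags` is an illustrative name and both inputs are assumed to be unsigned 32-bit values.

```python
MASK = 0xFFFFFFFF

def cmp_flags(r1: int, r2: int) -> dict:
    """Model of CMP R1, R2: compute R1 - R2, keep only the flags."""
    result = (r1 - r2) & MASK
    n = result >> 31                        # N: sign bit of the result
    z = int(result == 0)                    # Z: result is zero
    c = int(r1 >= r2)                       # C: 1 when no borrow (unsigned r1 >= r2)
    # V: operands differ in sign and the result's sign differs from r1's
    v = ((r1 ^ r2) & (r1 ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}

print(cmp_flags(5, 5))           # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
print(cmp_flags(3, 5))           # {'N': 1, 'Z': 0, 'C': 0, 'V': 0}
print(cmp_flags(0x80000000, 1))  # {'N': 0, 'Z': 0, 'C': 1, 'V': 1}
```

A branch that follows would then test these flags, for example Z for equality or C for an unsigned comparison.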

Addressing modes
• Register operands
    ADD R0, R1, R2
• Immediate operands: a literal; most can be represented by (0..255) x 2^(2n), 0 <= n <= 12
    ADD R3, R3, #1       @ R3:=R3+1
    AND R8, R7, #0xff    @ R8=R7[7:0]
  #0xff is a hexadecimal literal. This is assembler-dependent syntax.
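Whether a given constant fits the immediate field can be tested mechanically. The sketch below is in Python and assumes the standard 32-bit ARM data-processing encoding: an 8-bit value rotated right by an even number of bits, which includes every (0..255) x 2^(2n) value plus the cases that wrap around bit 31. `is_arm_immediate` and `rotate_left` are illustrative names.

```python
MASK = 0xFFFFFFFF

def rotate_left(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK

def is_arm_immediate(value: int) -> bool:
    """True if `value` is an 8-bit number rotated right by an even amount."""
    value &= MASK
    # Undo each possible even rotation and see whether 8 bits remain.
    return any(rotate_left(value, rot) < 256 for rot in range(0, 32, 2))

print(is_arm_immediate(0xFF))        # True
print(is_arm_immediate(0x3FC))       # True  (0xFF x 2^2)
print(is_arm_immediate(0x1FE))       # False (odd shift of 0xFF)
print(is_arm_immediate(0x101))       # False (needs 9 bits)
```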

Shifted register operands
• One operand to the ALU is routed through the barrel shifter, so the operand can be modified before it is used. This is useful for dealing with lists, tables and other complex data structures (similar to the displacement addressing mode in CISC).

Logical shift left
[Figure: C <- register <- 0]
  MOV R0, R2, LSL #2    @ R0:=R2<<2
                        @ R2 unchanged
Example: 0…0 0011 0000
  Before  R2=0x00000030
  After   R0=0x000000C0
          R2=0x00000030

Logical shift right
[Figure: 0 -> register -> C]
  MOV R0, R2, LSR #2    @ R0:=R2>>2
                        @ R2 unchanged
Example: 0…0 0011 0000
  Before  R2=0x00000030
  After   R0=0x0000000C
          R2=0x00000030

Arithmetic shift right
[Figure: MSB -> register -> C]
  MOV R0, R2, ASR #2    @ R0:=R2>>2 (sign bit replicated)
                        @ R2 unchanged
Example: 1010 0…0 0011 0000
  Before  R2=0xA0000030
  After   R0=0xE800000C
          R2=0xA0000030

Rotate right
[Figure: register rotated right]
  MOV R0, R2, ROR #2    @ R0:=R2 rotate
                        @ R2 unchanged
Example: 0…0 0011 0001
  Before  R2=0x00000031
  After   R0=0x4000000C
          R2=0x00000031

Rotate right extended
[Figure: C -> register -> C]
  MOV R0, R2, RRX       @ R0:=R2 rotate (one bit, through C)
                        @ R2 unchanged
Example: 0…0 0011 0001
  Before  R2=0x00000031, C=1
  After   R0=0x80000018, C=1
          R2=0x00000031
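The five barrel-shifter operations can be modelled in a few lines and checked against the slide examples. This is a sketch in Python, not ARM code; it assumes shift amounts between 1 and 31, and the helper names simply mirror the mnemonics.

```python
MASK = 0xFFFFFFFF

def lsl(x: int, n: int) -> int:
    return (x << n) & MASK                   # zeros enter from the right

def lsr(x: int, n: int) -> int:
    return (x & MASK) >> n                   # zeros enter from the left

def asr(x: int, n: int) -> int:
    if x & 0x80000000:                       # reinterpret as a negative number
        x -= 1 << 32
    return (x >> n) & MASK                   # Python's >> replicates the sign

def ror(x: int, n: int) -> int:
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x: int, carry: int):
    """Rotate right by one through carry; returns (result, carry_out)."""
    return ((carry << 31) | (x >> 1)) & MASK, x & 1

print(f"{lsl(0x00000030, 2):08X}")  # 000000C0
print(f"{lsr(0x00000030, 2):08X}")  # 0000000C
print(f"{asr(0xA0000030, 2):08X}")  # E800000C
print(f"{ror(0x00000031, 2):08X}")  # 4000000C
result, carry_out = rrx(0x00000031, 1)
print(f"{result:08X}", carry_out)   # 80000018 1
```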


Shifted register operands
• It is possible to use a register to specify the number of bits to be shifted; only the bottom 8 bits of the register are significant.
    @ R0:=R1+R2*2^R3
    ADD R0, R1, R2, LSL R3
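The register-specified shift can be sketched as below. A Python model with the illustrative name `add_lsl_reg`; it takes the shift amount from the bottom 8 bits of R3 and wraps the result to 32 bits, so a shift of 32 or more contributes zero.

```python
MASK = 0xFFFFFFFF

def add_lsl_reg(r1: int, r2: int, r3: int) -> int:
    """Model of ADD R0, R1, R2, LSL R3."""
    amount = r3 & 0xFF                 # only the bottom 8 bits are significant
    shifted = (r2 << amount) & MASK    # bits shifted past bit 31 are lost
    return (r1 + shifted) & MASK

print(add_lsl_reg(10, 3, 2))      # 22  (10 + 3*2^2)
print(add_lsl_reg(10, 3, 0x102))  # 22  (upper bits of R3 are ignored)
print(add_lsl_reg(10, 3, 32))     # 10  (everything shifted out)
```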

Setting the condition codes
• Any data processing instruction can set the condition codes if the programmer wishes it to.
  64-bit addition: (R1:R0) + (R3:R2) -> (R3:R2)
    ADDS R2, R2, R0    @ add the low words and set C
    ADC  R3, R3, R1    @ add the high words plus C
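The ADDS/ADC pair can be traced with a small model. This is a Python sketch with illustrative helper names; `adds` returns the 32-bit sum together with the carry-out that ADC then consumes.

```python
MASK = 0xFFFFFFFF

def adds(a: int, b: int):
    """32-bit add that also returns the carry flag, like ADDS."""
    total = a + b
    return total & MASK, total >> 32

def adc(a: int, b: int, carry: int) -> int:
    """32-bit add with carry-in, like ADC."""
    return (a + b + carry) & MASK

def add64(r1: int, r0: int, r3: int, r2: int):
    """(R1:R0) + (R3:R2), returning the new (R3, R2)."""
    r2, carry = adds(r2, r0)   # ADDS R2, R2, R0
    r3 = adc(r3, r1, carry)    # ADC  R3, R3, R1
    return r3, r2

# 0x00000000_FFFFFFFF + 0x00000000_00000001 = 0x00000001_00000000
print(add64(0x0, 0xFFFFFFFF, 0x0, 0x1))  # (1, 0)
```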

Multiplication
• MUL R0, R1, R2    @ R0 = (R1xR2)[31:0]
• Features:
  – Second operand can't be immediate
  – The result register must be different from the first operand
  – If S bit is set, C flag is meaningless
• See the reference manual (4.1.33)

Multiplication
• Multiply-accumulate
    MLA R4, R3, R2, R1    @ R4 = R3xR2+R1
• Multiplication by a constant can often be implemented more efficiently using shifted register operands
    MOV R1, #35
    MUL R2, R0, R1
  or
    ADD R0, R0, R0, LSL #2    @ R0' = 5xR0
    RSB R2, R0, R0, LSL #3    @ R2 = 7xR0'
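That the ADD/RSB pair really multiplies by 35 can be checked step by step. A Python sketch; `times35` is an illustrative name and each line mirrors one instruction, with 32-bit wrap-around.

```python
MASK = 0xFFFFFFFF

def times35(r0: int) -> int:
    r0 = (r0 + (r0 << 2)) & MASK   # ADD R0, R0, R0, LSL #2 -> R0' = 5 x R0
    r2 = ((r0 << 3) - r0) & MASK   # RSB R2, R0, R0, LSL #3 -> R2 = 7 x R0'
    return r2

print(times35(3))   # 105
print(times35(10))  # 350
```

The result agrees with a plain multiply modulo 2^32, because 5 x 7 = 35 and both steps wrap the same way a MUL would.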

Data transfer instructions
• Move data between registers and memory
• Three basic forms
  – Single register load/store
  – Multiple register load/store
  – Single register swap: SWP(B), atomic instruction for semaphores

Single register load/store
• The data items can be an 8-bit byte, a 16-bit halfword or a 32-bit word.
    LDR R0, [R1]    @ R0 := mem32[R1]
    STR R0, [R1]    @ mem32[R1] := R0
  LDR, LDRH, LDRB for 32, 16, 8 bits
  STR, STRH, STRB for 32, 16, 8 bits

Load an address into a register
• The pseudo-instruction ADR loads a register with an address
    table: .word 10
    …
    ADR R0, table
• The assembler translates the pseudo-instruction into an appropriate sequence of instructions
    sub r0, pc, #12
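Where an offset such as #12 comes from can be illustrated with a small calculation. The sketch is in Python and relies on the ARM-state rule that PC reads as the address of the current instruction plus 8; the addresses are hypothetical, chosen only so that the arithmetic is easy to follow, and `adr_offset` is an illustrative name.

```python
def adr_offset(adr_address: int, target_address: int) -> int:
    """Offset the assembler must add to PC so that ADR yields target_address."""
    pc = adr_address + 8          # in ARM state, PC reads as instruction address + 8
    return target_address - pc

# Hypothetical layout: the label sits 4 bytes before the ADR instruction.
offset = adr_offset(adr_address=0x8004, target_address=0x8000)
print(offset)  # -12, i.e. the assembler emits: sub r0, pc, #12
```

A positive offset would be emitted as an ADD from PC instead.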

Addressing modes
• Memory is addressed by a register and an offset.
    LDR R0, [R1]                 @ mem[R1]
• Three ways to specify offsets:
  – Constant
      LDR R0, [R1, #4]           @ mem[R1+4]
  – Register
      LDR R0, [R1, R2]           @ mem[R1+R2]
  – Scaled
      LDR R0, [R1, R2, LSL #2]   @ mem[R1+4*R2]
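The three offset forms can be compared on a toy memory. This is a Python sketch, assuming a little-endian memory that holds a hypothetical table of four words starting at address 0; `ldr` and the `ea_*` helpers are illustrative names.

```python
MASK = 0xFFFFFFFF

# Hypothetical word table 10, 20, 30, 40 stored little-endian from address 0.
memory = b"".join(w.to_bytes(4, "little") for w in (10, 20, 30, 40))

def ldr(address: int) -> int:
    """Load the 32-bit word at `address`."""
    return int.from_bytes(memory[address:address + 4], "little")

def ea_constant(r1: int, imm: int) -> int:
    return (r1 + imm) & MASK               # [R1, #imm]

def ea_register(r1: int, r2: int) -> int:
    return (r1 + r2) & MASK                # [R1, R2]

def ea_scaled(r1: int, r2: int, shift: int) -> int:
    return (r1 + (r2 << shift)) & MASK     # [R1, R2, LSL #shift]

print(ldr(ea_constant(0, 4)))   # 20  (byte offset 4)
print(ldr(ea_register(0, 8)))   # 30  (byte offset held in R2)
print(ldr(ea_scaled(0, 3, 2)))  # 40  (R2 is a word index, scaled by 4)
```

The scaled form is what makes indexing an array of words convenient: the index register counts elements and the shift converts it to a byte offset.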
