SLIDE 1
Assembly basics
CS 2XA3 Term I, 2020/21
Outline
◮ What is Assembly Language?
◮ Assemblers
◮ Why Assembly?
◮ NASM
◮ Character and String Literals
◮ Integer literals
◮ Labels and Names
◮ Statements
◮ Program structure
◮ Input/Output
◮ Compiling+linking
SLIDE 2
SLIDE 3
What is Assembly Language?
◮ In a high-level language (HLL), one line of code usually translates to 2, 3 or more machine instructions. Some statements may translate to hundreds or thousands of machine instructions.
◮ In Assembly Language (AL), one line of code translates to one machine instruction; AL is a "human readable" form of machine language.
◮ HLLs are designed to be "machine-independent", but machine dependencies are almost impossible to eliminate.
◮ ALs are NOT machine-independent. Each different machine (processor) has a different machine language. Any particular machine can have more than one assembly language.
SLIDE 4
Assemblers
An assembler is a program that translates an assembly language program into the binary code of machine instructions.
◮ NASM Netwide Assembler
◮ MASM Microsoft Assembler
◮ GAS GNU assembler
◮ ARM Assembly Language
SLIDE 5
Why Assembly?
There are two reasons to write programs in assembly:
(a) it is the only “language” the CPU understands
(b) to obtain an understanding of how the CPU works
Item (b) is also the biggest disadvantage of assembly programming: you are programming from the perspective of a CPU, not that of a human brain. This proves difficult for many students who are not familiar with CPU architecture and are used to having “convenient” data structures at hand (such as lists in Python).
SLIDE 6
NASM
◮ We are using 64-bit NASM in this course
◮ NASM is operating system independent
  - One of the two widely used Linux assemblers (the other is GAS)
  - NASM is an open source assembler for the 80x86 and x86-64 architectures. Compared to MASM, TASM, or GAS, it is rather easy to use and provides convenient syntactic constructs.
SLIDE 7
NASM
◮ We will not cover NASM syntax in full depth
  - We are interested in the basic machine interface, NOT in proficient production assembly programming
  - NASM has many syntactic constructs similar to C
  - NASM has an extensive preprocessor similar to the C preprocessor.
SLIDE 8
Character and String Literals
Escape characters
format  description           ASCII decimal value
\'      single quote (')      39
\"      double quote (")      34
\`      backquote (`)         96
\\      backslash (\)         92
\?      question mark (?)     63
\t      tab (TAB)             9
\n      newline (LF)          10
\r      carriage return (CR)  13

'This is a string literal'
"This is a string literal, too"
`Backquoted strings can use escape chars\n`
SLIDE 9
Integer literals
200          integer in decimal notation
0200         decimal - the leading 0 does not make it octal
0200d        explicitly decimal - d suffix
0d200        also explicitly decimal - 0d prefix
0c8h         hexadecimal - h suffix; the leading 0 is required, because c8h looks like a name
0xc8         hexadecimal - the classic 0x prefix
0hc8         hexadecimal - for some reason NASM likes prefix 0h
310q         octal - q suffix
0q310        octal - 0q prefix
11001000b    binary - b suffix
0b1100_1000  binary - 0b prefix, underscores are allowed
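To make the table concrete, here is a small data fragment in which every operand is one of the spellings listed above. The label same_value is only an illustrative name. All eleven operands denote the same number, 200 decimal, so the fragment should assemble to eleven identical bytes.

```nasm
; Every operand below is a different spelling of the value 200 (0xC8).
; The label name same_value is hypothetical, chosen for this illustration.
section .data
same_value:
        db 200, 0200, 0200d, 0d200     ; decimal forms
        db 0c8h, 0xc8, 0hc8            ; hexadecimal forms
        db 310q, 0q310                 ; octal forms
        db 11001000b, 0b1100_1000      ; binary forms
```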
SLIDE 10
Labels and Names
Names identify labels, variables, symbols, and keywords
◮ May contain:
  letters: a..z A..Z
  digits: 0..9
  special chars: ? _ @ $ . ~
◮ NASM (unlike most assemblers) is case-sensitive with respect to labels and variables; it is not case-sensitive with respect to keywords, mnemonics, register names, directives, etc.
◮ First character must be a letter, _, ? or . (a leading . has a special meaning in NASM: it marks a “local label”, which is attached to the preceding non-local label, so the same local name can be reused under different non-local labels)
◮ Names cannot match a reserved word (and there are many reserved words!)
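The local-label rule is easiest to see in a short fragment. The sketch below uses two hypothetical routines, count_down and count_up, each with its own .loop label. NASM attaches a local label to the most recent non-local label, so the two .loop labels do not clash.

```nasm
; Sketch only: count_down and count_up are hypothetical routine names.
section .text

count_down:
        mov rcx, 3
.loop:                  ; full name is count_down.loop
        dec rcx
        jnz .loop       ; jumps to count_down.loop
        ret

count_up:
        mov rcx, 0
.loop:                  ; full name is count_up.loop, so no redefinition error
        inc rcx
        cmp rcx, 3
        jne .loop       ; jumps to count_up.loop
        ret
```

Writing Count_Down instead of count_down in a jump or call would refer to a different (undefined) label, since labels are case-sensitive.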
SLIDE 11
Statements
Syntax: [label[:]] [mnemonic] [operands] [;comment]
◮ [ ] indicates optionality
◮ Note that all parts are optional → blank lines are legal
◮ Labels are used to identify locations in code (instruction labels) or memory locations (data definition labels)
◮ Statements are free form; they need not be formed into columns
◮ A statement must be on a single line, max 128 chars
SLIDE 12
Examples of Statements
a100: add rax, rdx ; add subtotal

Labels often appear on a separate line for code clarity:

a100:
        ADD RAX, RDX ; add subtotal

Note the case-insensitivity of mnemonics (add or ADD) and registers (rax or RAX); however, A100 instead of a100 would be wrong, because labels are case-sensitive.
SLIDE 13
Types of statements
◮ Directives+Pseudo-instructions
        limit EQU 100          ;defines a symbol limit
        %define limit 100      ;like C #define
◮ Data Definitions
        msg: db 'Welcome to Assembler!'
             db 0Dh, 0Ah
        count dd 0
        mydat: dd 1,2,3,4,5
               resd 100        ;reserves 100 dwords
◮ Instructions
        mov rax, rbx
        ADD RCX, 10
SLIDE 14
Directives
Directives for the linker
        extern printf          declares printf to be an external symbol
        global asm_main        declares asm_main to be a global symbol visible to the linker (here, our entry point)
Directives for the preprocessor
        %define ctrl 0x1F      every occurrence of symbol ctrl is replaced by literal 0x1F
        %define b(x) 2*x       b(y) is replaced by the text 2*y
        %define a(x) 1+b(x)    a(y) is replaced by the text 1+2*y
        %include "file10"      replaced by the contents of the file
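Because %define works by textual substitution, the argument is pasted into the macro body before any arithmetic is done. The fragment below reuses the b(x) and a(x) macros from this slide; the labels v1, v2 and v3 are illustrative names only. The last line shows the usual pitfall when the argument is itself an expression.

```nasm
%define b(x) 2*x
%define a(x) 1+b(x)

section .data
v1:     db b(3)         ; expands to 2*3, i.e. 6
v2:     db a(3)         ; expands to 1+2*3, i.e. 7
v3:     db b(1+2)       ; expands to 2*1+2, i.e. 4 (not 6)
```

Writing the macro as %define b(x) 2*(x) avoids the surprise in the last line, the same way parentheses are used in C macros.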
SLIDE 15
Pseudo-instructions
Pseudo-instructions are not x86 instructions; rather, they are part of the NASM assembler. They are used to declare initialized and uninitialized data and a few other things. Let's look over them in brief:
        ctrl EQU 0x1F          every occurrence of symbol ctrl is replaced by literal 0x1F and cannot be changed, i.e. it defines a constant (similar to %define)
SLIDE 16
Data definitions
Declaring Initialized Data
General format is
        [label[:]] <pseudo-instruction> <value> [;comment]
Initialized data declaration pseudo-instructions: DB, DW, DD, DQ are used to declare initialized data. The first letter D stands for data, and the second stands for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes).
SLIDE 17
◮ label1 db 0ABh
  declares a byte with value AB in hex with label label1 (the leading 0 is required, otherwise ABh would be read as a name)
◮ label2 db 1010010b
  declares a byte with value 1010010 in binary with label label2
◮ label3: dw 12ABh
  declares a word with value 12AB in hex with label label3
◮ label4 dd 1A2Bh
  declares a double word with value 1A2B in hex with label label4
◮ label5: db "A"
  declares a byte with the value of the ASCII code of A, i.e. 65 in decimal.
SLIDE 18
Array: the only innate data structure available to NASM (there is a mechanism to define user data structures, which is an advanced topic not covered in this course). An array is several items of the same type stored in consecutive memory, one after another. String is another word for byte array. A C-string is a byte array terminated with byte 0 (null character).
◮ label6 db 0, 1, 2, 3
  declares 4 consecutive bytes with values 0, 1, 2 and 3 respectively
◮ label7 db "h", "e", "l", "l", "o", 0
  declares a C-string of length 5 (the terminator 0 is not counted), occupying 6 bytes.
◮ label8 db "hello",0
  The same as label7
SLIDE 19
Declaring Uninitialized Data
General format is
        [label[:]] <pseudo-instruction> <count> [;comment]
Uninitialized data declaration pseudo-instructions: RESB, RESW, RESD, RESQ are used to declare uninitialized data. The first part RES stands for reserve, and the last letter stands again for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes).
        mybuffer: resb 64          reserves 64 bytes with label mybuffer
        mywordbuffer resw 64       reserves 128 bytes (64 words) with label mywordbuffer
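As a small sketch of how the RESx family scales with the element size, the fragment below reserves space in the .bss segment. The labels counters and total are hypothetical names added for illustration; mybuffer and mywordbuffer are the ones from the slide.

```nasm
section .bss
mybuffer:      resb 64      ; 64 elements * 1 byte  =  64 bytes
mywordbuffer:  resw 64      ; 64 elements * 2 bytes = 128 bytes
counters:      resd 10      ; 10 elements * 4 bytes =  40 bytes (hypothetical label)
total:         resq 1       ;  1 element  * 8 bytes =   8 bytes (hypothetical label)
```

The operand is always a count of elements, not a number of bytes.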
SLIDE 20
Times
The times pseudo-instruction
It is a very versatile pseudo-instruction. It is a kind of loop, but we will use it only in data definitions to initialize arrays. So, for us the format is
        [label[:]] TIMES <count> <pseudo-instruction> <value> [;comment]
times 10 db 0 is the same as db 0,0,0,0,0,0,0,0,0,0
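Two typical uses of times in data definitions can be sketched as follows. The labels zeros and padded are illustrative names. The second definition follows a common NASM idiom in which the repeat count is computed from the location counter $, so that the string is padded with spaces to a fixed size.

```nasm
section .data
zeros:  times 10 db 0                   ; array of 10 zero bytes

padded: db 'hello, world'               ; 12 bytes of text
        times 64-($-padded) db ' '      ; 52 spaces, so padded occupies 64 bytes in total
```

If the text is edited later, the count adjusts automatically and the block stays 64 bytes long (as long as the text itself is no longer than 64 bytes).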
SLIDE 21
The Location Counter
str1 DB 'This is a string'
slen EQU $-str1 ; const slen = 16
◮ The symbol $ refers to the location counter
◮ As the assembler processes source code, it emits either code or data into the object code.
◮ The location counter is incremented for each byte emitted
◮ With slen EQU $-str1 the assembler performs the arithmetic to compute the length of str1
◮ Note the use of str1 in this expression as a numeric value (the address of the first byte)
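The same trick works for any data definition, not only strings. In the sketch below the labels msg and nums and the constants derived from them are illustrative names. Since EQU emits no bytes, $ has the same value on both lines that follow nums.

```nasm
section .data
msg:        db "hello", 0           ; 5 characters plus the terminating 0
msg_bytes   equ $ - msg             ; 6

nums:       dd 1, 2, 3, 4, 5        ; 5 dwords
nums_bytes  equ $ - nums            ; 20 (size in bytes)
nums_count  equ ($ - nums) / 4      ; 5  (number of dword elements)
```

Each EQU line must come immediately after the data it measures. Otherwise $ has already moved past later definitions and the result is too large.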
SLIDE 22
Program layout
BSS came from “Block Started by Symbol”, from an assembler for the IBM 704 in the 1950s.
SLIDE 23
NASM program structure
%include "simple_io.inc"

segment .data
        ;initialized data

segment .bss
        ;uninitialized data

segment .text
        global asm_main
asm_main:
        enter 0,0       ;setup
        saveregs        ;save all registers (our macro)

        ;put your code here

        restoregs       ;restore all registers (our macro)
        mov rax,0       ;return value
        leave
        ret
SLIDE 24
Input/Output
Input/Output (standardly abbreviated as I/O) routines.
◮ We will only deal with standard input and standard output.
◮ We will deal with I/O through the preprogrammed routines in the file simple_io.asm. It requires that the header file simple_io.inc be included:
        %include "simple_io.inc"
◮ The great advantage is that the I/O routines in simple_io.asm do not use the system stack; the information is passed in/out in the RAX register. Thus, calling any of these routines does not involve manipulation of the system stack, a huge simplification.
SLIDE 25
Simple I/O routines
◮ print_int prints the integer stored in RAX
◮ print_char prints the character whose ASCII code is stored in AL
◮ print_string prints the C-string stored at the address stored in RAX
◮ print_nl prints a newline
◮ read_char reads a character into AL
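Putting the program skeleton and these routines together, a complete program might look like the sketch below. It assumes that simple_io.inc makes the routines available (for example by declaring them extern) and that they are invoked with call. The label msg and the value 42 are made up for the illustration.

```nasm
%include "simple_io.inc"

segment .data
msg:    db "The answer is ", 0      ; C-string, terminated by byte 0

segment .text
        global asm_main
asm_main:
        enter 0,0                   ; setup
        saveregs                    ; save all registers (course macro)

        mov rax, msg                ; RAX = address of the C-string
        call print_string           ; assumed calling style: plain call
        mov rax, 42                 ; RAX = integer to print
        call print_int
        call print_nl

        restoregs                   ; restore all registers (course macro)
        mov rax, 0                  ; return value
        leave
        ret
```

Under these assumptions the program should print the text followed by the number and a newline. Note that each routine takes its argument in RAX (or AL), so the register must be loaded immediately before the call.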
SLIDE 26
C driver
◮ For the sake of simplicity, we are using a C program, driver.c, to create the executable for Linux, utilizing the GNU C compiler to provide linking.
◮ The driver also passes on the command line arguments (if any), or may provide some additional functionality for the assembler code.
◮ It requires that the global entry point in the assembly code is named asm_main.
SLIDE 27
Compiling+linking
To create an executable for your NASM program named my_prog.asm, you need to assemble simple_io.asm and my_prog.asm into the object files simple_io.o and my_prog.o, then let gcc compile driver.c and link everything together:
◮ nasm -felf64 -o simple_io.o simple_io.asm
◮ nasm -felf64 -o my_prog.o my_prog.asm
◮ gcc -o my_prog driver.c my_prog.o simple_io.o
SLIDE 28