Assembly basics, CS 2XA3, Term I, 2020/21



slide-1
SLIDE 1

Assembly basics CS 2XA3

Term I, 2020/21

slide-2
SLIDE 2

Outline

◮ What is Assembly Language?
◮ Assemblers
◮ Why Assembly?
◮ NASM
◮ Character and String Literals
◮ Integer literals
◮ Labels and Names
◮ Statements
◮ Program structure
◮ Input/Output
◮ Compiling+linking

slide-3
SLIDE 3

What is Assembly Language?

◮ In a high-level language (HLL), one line of code usually translates to 2, 3 or more machine instructions. Some statements may translate to hundreds or thousands of machine instructions.
◮ In Assembly Language (AL), one line of code translates to one machine instruction; AL is a "human readable" form of machine language.
◮ HLLs are designed to be "machine-independent", but machine dependencies are almost impossible to eliminate.
◮ ALs are NOT machine-independent. Each different machine (processor) has a different machine language. Any particular machine can have more than one assembly language.

slide-4
SLIDE 4

Assemblers

An assembler is a program that translates an assembly language program into the binary code of machine instructions.
◮ NASM: Netwide Assembler
◮ MASM: Microsoft Assembler
◮ GAS: GNU assembler
◮ ARM Assembly Language

slide-5
SLIDE 5

Why Assembly?

There are two reasons to write programs in assembly:
(a) it is the only “language” the CPU understands
(b) to obtain an understanding of how the CPU works
Item (b) is also the biggest disadvantage of assembly programming: you are programming from the perspective of a CPU, not that of a human brain. This proves difficult for a wide range of students who are not familiar with CPU architecture and are used to having “convenient” data structures at hand (such as lists in Python).

slide-6
SLIDE 6

NASM

◮ We are using 64-bit NASM in this course
◮ NASM is operating system independent
  • One of the two widely used Linux assemblers (the other is GAS)
  • NASM is an open source assembler (80x86 and x86-64 architecture). Compared to MASM, TASM, or GAS, it is rather easy to use and provides convenient syntactic constructs.

slide-7
SLIDE 7

NASM

◮ We will not cover NASM syntax in full depth
  • We are interested in a basic machine interface and NOT in proficient production assembler programming
  • NASM has many syntactic constructs similar to C
  • NASM has an extensive preprocessor similar to the C preprocessor.

slide-8
SLIDE 8

Character and String Literals

Escape characters

format   description            ASCII decimal value
\'       single quote (')       39
\"       double quote (")       34
\`       backquote (`)          96
\\       backslash (\)          92
\?       question mark (?)      63
\t       tab (TAB)              9
\n       newline (LF)           10
\r       carriage return (CR)   13

'This is a string literal'
"This is a string literal, too"
`Backquoted strings can use escape chars\n`
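
The difference between quoting styles is easiest to see by comparing sizes. The following is a minimal sketch (the labels plain, cooked, plain_len and cooked_len are made up for illustration): in single or double quotes a backslash is an ordinary character, while in backquotes the escape sequence is translated.

```nasm
; Data-only sketch; assemble with: nasm -felf64 strings.asm
section .data

plain:      db 'tab\there', 0     ; single quotes: '\' and 't' are stored as two bytes
plain_len   equ $ - plain         ; 10 bytes: 9 characters plus the terminating 0

cooked:     db `tab\there`, 0     ; backquotes: \t is stored as one TAB byte (9)
cooked_len  equ $ - cooked        ; 9 bytes: 8 characters plus the terminating 0
```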

slide-9
SLIDE 9

Integer literals

200           integer in decimal notation
0200          decimal - the leading 0 does not make it octal
0200d         explicit decimal - d suffix
0d200         also explicit decimal - 0d prefix
0c8h          hexadecimal - h suffix; the leading 0 is required, because c8h looks like a name
0xc8          hexadecimal - the classic 0x prefix
0hc8          hexadecimal - for some reason NASM likes prefix 0h
310q          octal - q suffix
0q310         octal - 0q prefix
11001000b     binary - b suffix
0b1100_1000   binary - 0b prefix, underscores are allowed
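
All the notations listed above denote the same number, 200. As a small sketch (the labels same and count are invented for this example), each of them can be stored as a byte, and every byte comes out as 0xC8:

```nasm
; Data-only sketch; assemble with: nasm -felf64 literals.asm
section .data

same:   db 200              ; decimal
        db 0200             ; still decimal, not octal
        db 0200d            ; decimal, d suffix
        db 0d200            ; decimal, 0d prefix
        db 0c8h             ; hexadecimal, h suffix (leading 0 required)
        db 0xc8             ; hexadecimal, 0x prefix
        db 0hc8             ; hexadecimal, 0h prefix
        db 310q             ; octal, q suffix
        db 0q310            ; octal, 0q prefix
        db 11001000b        ; binary, b suffix
        db 0b1100_1000      ; binary, 0b prefix with an underscore
count   equ $ - same        ; 11 bytes, each holding 200
```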

slide-10
SLIDE 10

Labels and Names

Names identify labels, variables, symbols, and keywords
◮ May contain:
    letters: a..z A..Z
    digits: 0..9
    special chars: ? _ @ $ . ~
◮ NASM (unlike most assemblers) is case-sensitive with respect to labels and variables; it is not case-sensitive with respect to keywords, mnemonics, register names, directives, etc.
◮ First character must be a letter, _ or . (a leading . has a special meaning in NASM: it marks a “local label”, which belongs to the preceding non-local label, so the same local name can be defined again under another label)
◮ Names cannot match a reserved word (and there are many reserved words!)
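
To illustrate the local-label rule, here is a minimal sketch with two made-up routines, count_down and count_up. Each has its own .loop label; NASM treats them as count_down.loop and count_up.loop, so they do not clash.

```nasm
; Sketch only; assemble with: nasm -felf64 locals.asm
section .text

count_down:
        mov     rcx, 3
.loop:                          ; full name: count_down.loop
        dec     rcx
        jnz     .loop           ; refers to the .loop under count_down
        ret

count_up:
        xor     rcx, rcx
.loop:                          ; full name: count_up.loop, a different label
        inc     rcx
        cmp     rcx, 3
        jne     .loop           ; refers to the .loop under count_up
        ret
```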

slide-11
SLIDE 11

Statements

Syntax: [label[:]] [mnemonic] [operands] [;comment]

◮ [ ] indicates optionality
◮ Note that all parts are optional → blank lines are legal
◮ Labels are used to identify locations in code (instruction labels) or memory locations (data definition labels)
◮ Statements are free form; they need not be formed into columns
◮ A statement must be on a single line, max 128 chars

slide-12
SLIDE 12

Examples of Statements

a100: add rax, rdx ; add subtotal

Labels often appear on a separate line for code clarity:

a100:
    ADD RAX, RDX ; add subtotal

Note the case-insensitivity of mnemonics (add or ADD) and registers (rax or RAX); however, A100 instead of a100 would be wrong.

slide-13
SLIDE 13

Type of statements

◮ Directives+Pseudo-instructions
    limit EQU 100        ;defines a symbol limit
    %define limit 100    ;like C #define
◮ Data Definitions
    msg: db 'Welcome to Assembler!'
         db 0Dh, 0Ah
    count dd 0
    mydat: dd 1,2,3,4,5
    resd 100             ;reserves 100 dwords
◮ Instructions
    mov rax, rbx
    ADD RCX, 10

slide-14
SLIDE 14

Directives

Directives for the linker
    extern printf         declares printf to be an external symbol
    global asm_main       declares asm_main to be an entry point (a symbol visible to the linker)
Directives for the preprocessor
    %define ctrl 0x1F     every occurrence of symbol ctrl is replaced by literal 0x1F
    %define b(x) 2*x      b(y) is replaced by the text 2*y
    %define a(x) 1+b(x)   a(y) is replaced by the text 1+2*y
    %include "file10"     replaced by the contents of the file
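
Because %define performs textual replacement, the result can differ from what a function call would give. The sketch below reuses the three macros from this slide; the labels v1 to v4 are invented for the example. The last line shows the usual pitfall: the argument is pasted in without parentheses.

```nasm
; Data-only sketch; assemble with: nasm -felf64 macros.asm
%define ctrl 0x1F
%define b(x) 2*x
%define a(x) 1+b(x)

section .data
v1:     db ctrl         ; expands to 0x1F
v2:     db b(3)         ; expands to 2*3, i.e. 6
v3:     db a(3)         ; expands to 1+2*3, i.e. 7
v4:     db b(1+2)       ; expands to 2*1+2, i.e. 4 and not 6
```

Writing the macro body as 2*(x) would avoid the surprise in v4, just as it would with the C preprocessor.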

slide-15
SLIDE 15

Pseudo-instructions

Pseudo-instructions are not x86 instructions; rather, they are part of the NASM assembler. They are used to declare initialized and uninitialized data and a few other things. Let's look over them in brief:

ctrl EQU 0x1F
    every occurrence of symbol ctrl is replaced by literal 0x1F and cannot be changed, i.e. it defines a constant (similar to %define)

slide-16
SLIDE 16

Data definitions

Declaring Initialized Data

General format is
[label[:]] <pseudo-instruction> <value> [;comment]

The initialized data declaration pseudo-instructions DB, DW, DD, DQ are used to declare initialized data. The first letter D stands for data, and the second stands for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes).

slide-17
SLIDE 17

◮ label1 db 0ABh
    declares a byte with value AB in hex with label label1 (the leading 0 is required, because ABh looks like a name)
◮ label2 db 1010010b
    declares a byte with value 1010010 in binary with label label2
◮ label3: dw 12ABh
    declares a word with value 12AB in hex with label label3
◮ label4 dd 1A2Bh
    declares a double word with value 1A2B in hex with label label4
◮ label5: db "A"
    declares a byte with the value of the ASCII code of A, i.e. 65 in decimal.
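
The sizes implied by DB, DW, DD and DQ can be checked by letting the assembler add them up. This is a small sketch; label9 and total are names invented here, and the hexadecimal byte is written 0ABh because a literal with the h suffix must start with a digit.

```nasm
; Data-only sketch; assemble with: nasm -felf64 sizes.asm
section .data

label1: db 0ABh             ; 1 byte
label3: dw 12ABh            ; 2 bytes
label4: dd 1A2Bh            ; 4 bytes
label9: dq 1A2Bh            ; 8 bytes (hypothetical label, same value as a qword)
total   equ $ - label1      ; 1 + 2 + 4 + 8 = 15 bytes
```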

slide-18
SLIDE 18

The array is the only innate data structure available to NASM (there is a mechanism to define user data structures, which is an advanced topic not covered in this course). An array is several items of the same type, stored in consecutive memory one after another. String is another word for byte array. A C-string is a byte array terminated with byte 0 (null character).
◮ label6 db 0, 1, 2, 3
    declares 4 consecutive bytes with values 0, 1, 2 and 3 respectively
◮ label7 db "h", "e", "l", "l", "o", 0
    declares a C-string of length 5, stored in 6 bytes (the terminator 0 is not counted in the length).
◮ label8 db "hello",0
    the same as label7
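
As a sketch of how array sizes work out in practice (the labels primes, primes_n, greeting and greeting_sz are made up for this example), the number of elements is the number of bytes occupied divided by the size of one element:

```nasm
; Data-only sketch; assemble with: nasm -felf64 arrays.asm
section .data

primes:      dd 2, 3, 5, 7, 11          ; array of 5 dwords, 20 bytes
primes_n     equ ($ - primes) / 4       ; 20 / 4 = 5 elements

greeting:    db "hello", 0              ; C-string
greeting_sz  equ $ - greeting           ; 6 bytes: 5 characters plus the terminating 0
```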

slide-19
SLIDE 19

Declaring Uninitialized Data

General format is
[label[:]] <pseudo-instruction> <count> [;comment]

The uninitialized data declaration pseudo-instructions RESB, RESW, RESD, RESQ are used to declare uninitialized data. The first part RES stands for reserve, and the last letter stands again for: Byte (1 byte), Word (2 bytes), Dword (4 bytes), and Qword (8 bytes). The operand is the number of items to reserve.

mybuffer: resb 64         reserve 64 bytes with label mybuffer
mywordbuffer resw 64      reserve 128 bytes (64 words) with label mywordbuffer
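
The following sketch places the two buffers from this slide in the .bss segment and adds two more reservations; mydwords, myqwords and bss_size are names invented for the example. The operand always counts items, not bytes.

```nasm
; Data-only sketch; assemble with: nasm -felf64 reserve.asm
section .bss

mybuffer:       resb 64             ; 64 bytes
mywordbuffer:   resw 64             ; 64 words  = 128 bytes
mydwords:       resd 16             ; 16 dwords = 64 bytes
myqwords:       resq 8              ; 8 qwords  = 64 bytes
bss_size        equ $ - mybuffer    ; 64 + 128 + 64 + 64 = 320 bytes
```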

slide-20
SLIDE 20

Times

The times pseudo-instruction is very versatile. It is a kind of loop, but we will use it only in data definitions to initialize arrays. So, for us the format is

[label[:]] TIMES <value> <pseudo-instruction> [;comment]

times 10 db 0 is the same as db 0,0,0,0,0,0,0,0,0,0
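
A few more uses of times in data definitions, as a hedged sketch (the labels zeros, ones and field are invented). The repeat count may be an expression, which makes it possible to pad a field to a fixed width:

```nasm
; Data-only sketch; assemble with: nasm -felf64 times.asm
section .data

zeros:  times 10 db 0                   ; ten bytes, all 0
ones:   times 4 dd 1                    ; four dwords, each holding 1 (16 bytes)

field:  db 'NASM'                       ; 4 bytes of text
        times 16 - ($ - field) db ' '   ; 12 spaces, so the field is 16 bytes wide
```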

slide-21
SLIDE 21

The Location Counter

str1 DB 'This is a string'
slen EQU $-str1 ; const slen = 16

◮ The symbol $ refers to the location counter
◮ As the assembler processes source code, it emits either code or data into the object code.
◮ The location counter is incremented for each byte emitted
◮ With slen EQU $-str1 the assembler performs the arithmetic to compute the length of str1
◮ Note the use of str1 in this expression as a numeric value (the address of the first byte)
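
Since $ is simply the current position, the length definition has to come immediately after the data it measures. The sketch below extends the slide's example with a second string (str2, s2len and wrong are names invented here) to show what happens otherwise:

```nasm
; Data-only sketch; assemble with: nasm -felf64 counter.asm
section .data

str1:   db 'This is a string'
slen    equ $ - str1        ; 16: measured right after str1

str2:   db 'abc'
wrong   equ $ - str1        ; 19: too late, this also counts the 3 bytes of str2
s2len   equ $ - str2        ; 3: measured right after str2
```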

slide-22
SLIDE 22

Program layout

BSS came from “Block Started by Symbol”, a pseudo-operation in an assembler for the IBM 704 in the 1950s.

slide-23
SLIDE 23

NASM program structure

%include "simple_io.inc"

segment .data
        ;initialized data

segment .bss
        ;uninitialized data

segment .text
        global asm_main
asm_main:
        enter 0,0           ;setup
        saveregs            ;save all registers (our macro)

        ;put your code here

        restoregs           ;restore all registers (our macro)
        mov rax,0           ;return value
        leave
        ret

slide-24
SLIDE 24

Input/Output

Input/Output (standardly abbreviated as I/O) routines.
◮ We will only deal with standard input and standard output.
◮ We will deal with I/O through the preprogrammed routines in the file simple_io.asm. It requires that the header file simple_io.inc be included:
    %include "simple_io.inc"
◮ The great advantage is that the I/O routines in simple_io.asm do not use the system stack; the information is passed in and out in the RAX register. Thus, calling any of these routines does not involve manipulation of the system stack, a huge simplification.

slide-25
SLIDE 25

Simple I/O routines
◮ print_int prints the integer stored in RAX
◮ print_char prints the character whose ASCII code is stored in AL
◮ print_string prints the C-string stored at the address stored in RAX
◮ print_nl prints a newline
◮ read_char reads a character into AL
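
A sketch of how these routines might be used inside the program skeleton from the earlier slide. It is not self-contained: it needs the course files simple_io.inc, simple_io.asm and driver.c. It also assumes the routines are invoked with the call instruction, which the slides do not state, so check simple_io.inc before relying on it. The label msg is made up for the example.

```nasm
; Sketch only; depends on the course's simple_io.inc / simple_io.asm / driver.c.
%include "simple_io.inc"

segment .data
msg:    db "hello", 0           ; C-string to print (hypothetical label)

segment .text
        global asm_main
asm_main:
        enter   0,0             ; setup
        saveregs                ; course macro: save all registers

        mov     rax, msg        ; RAX = address of the C-string
        call    print_string    ; ASSUMPTION: routines are invoked via call
        call    print_nl

        mov     rax, 42         ; RAX = integer to print
        call    print_int
        call    print_nl

        mov     al, 'A'         ; AL = ASCII code of the character to print
        call    print_char
        call    print_nl

        restoregs               ; course macro: restore all registers
        mov     rax, 0          ; return value
        leave
        ret
```

The argument register is loaded again before every call, so the sketch does not depend on whether a routine preserves RAX.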

slide-26
SLIDE 26

C driver

◮ For the sake of simplicity, we are using a C program, driver.c, to create the executable for Linux, utilizing the GNU C compiler to provide linking.
◮ The driver also passes on the command line arguments (if any), or may provide some additional functionality for the assembler code.
◮ It requires that the global entry point in the assembly code is named asm_main.

slide-27
SLIDE 27

Compiling+linking

To create an executable for your NASM program named my_prog.asm, you need to create the object files simple_io.o, my_prog.o, driver.o and link them all together:

◮ nasm -felf64 -o simple_io.o simple_io.asm
◮ nasm -felf64 -o my_prog.o my_prog.asm
◮ gcc -o my_prog driver.c my_prog.o simple_io.o

slide-28
SLIDE 28

Compiling+linking

It is thus worthwhile to create a makefile:

simple_io.o: simple_io.asm
	nasm -felf64 -o simple_io.o simple_io.asm

my_prog.o: my_prog.asm simple_io.inc
	nasm -felf64 -o my_prog.o my_prog.asm

my_prog: driver.c my_prog.o simple_io.o
	gcc -o my_prog driver.c my_prog.o simple_io.o

(Each command line in a makefile must begin with a TAB character.)

Now, to create an executable from your source code my_prog.asm, you just need to type: make my_prog