SLIDE 1

Assembly Language Programming Assembler and assembly language

Zbigniew Jurkiewicz, Instytut Informatyki UW October 24, 2017

SLIDE 2

Assembler

Assembler = the program that transforms a source file with program text in symbolic machine language into an output file (called a module) containing binary object code. A module contains the same program in binary machine-language form. The assembler can also produce other files containing a program listing and a cross-reference list. There is usually more than one assembler for a given architecture (for example the GNU Assembler); in labs we will use the nasm assembler.

SLIDE 3

Assembling

To assemble a program in the file program.asm we run (for 64-bit programs)

bash-2.04$ nasm -f elf64 program.asm

SLIDE 4

Linking

A program may consist of more than one module, so before loading they must all be processed by the linker. For a program contained entirely in a single object module we call the linker with

bash-2.04$ ld program.o

SLIDE 5

Linking

By default the linker creates an output file called a.out. To get a file with a different name, for example program, we use the option -o

bash-2.04$ ld -o program program.o

SLIDE 6

Assembler

The assembly language has many features that make writing machine-language programs easier, but it never breaks the basic rule: a single instruction of machine language corresponds to exactly one instruction of assembly language. Symbolic identifiers (labels) are used instead of numeric addresses. In addition to machine-language instructions, a program may contain lines with directives (also called pseudoinstructions or commands). They are performed at assembly time, when the assembler transforms the source program into machine language. In other words, they control the assembly process. Many assemblers offer additional mechanisms, such as macro definitions and conditional assembly.

SLIDE 7

Syntax

An assembly language instruction should be placed on a single line and has the following form:

label: operation arguments ;comment

for example

start: add eax,ebx ;This is an instruction

SLIDE 8

Syntax

Only the operation field is mandatory; the other fields are used when needed. A label is a symbolic representation of the instruction's address, to be used in control instructions, e.g.

jmp start

Labels in assembly code start at the first column and should be terminated with a colon.

SLIDE 9

Syntax

The directives have a slightly different form:

name directive arguments ;comment

SLIDE 10

Constants

Programs may contain the following kinds of constants:

binary: a sequence of binary digits terminated with the letter B, e.g. 10110011B.
decimal
hexadecimal: should be preceded by 0x, e.g. 0xA5.
character: a sequence of characters delimited by quotes, e.g. 'Ala'.

SLIDE 11

Constants

Symbolic constants are defined using the directive equ

size    equ 10
        ...
        add ebx,size

SLIDE 12

Symbols

Symbolic names are used to name variables, constants, labels, segments/sections, etc. With each symbol the assembler associates additional information called attributes, such as section, address and type. The predefined symbol $ stands for the assembly counter; its value is always equal to the offset (i.e. relative address) of the current line of the program. It is cleared (set to zero) when the assembler starts a new segment.

SLIDE 13

Expressions

Arguments of instructions and pseudoinstructions may contain expressions. The assembler replaces them in-line by their computed value, e.g. the following code uses an expression to define the size of some data area

info    db 1,2,3,'Some string',10
dl_info equ $-info
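For readers coming from C, a rough analogue of $-info is sizeof applied to an array: in both cases the length is derived at translation time from the data actually laid out. The sketch below is only an illustration; the function name info_length is made up for this example.

```c
#include <stddef.h>

/* The same 15 bytes as the db line: 1,2,3, the 11 characters of
   "Some string" (no terminating NUL), and 10. */
static const unsigned char info[] = {
    1, 2, 3,
    'S', 'o', 'm', 'e', ' ', 's', 't', 'r', 'i', 'n', 'g',
    10
};

/* Counterpart of "dl_info equ $-info": the size is fixed at compile time. */
size_t info_length(void)
{
    return sizeof info;
}
```

As with equ, adding or removing bytes in the data changes the computed length without any other edit.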

SLIDE 14

Memory reservation (“variables”)

Memory declaration directives serve to declare and reserve data cells, and possibly to initialize their contents. The declaration that reserves one or more initialized bytes is

name db initial value[,...]

for example

abyte   db 5            ;byte itself is a reserved word in NASM
list    db 1,2,3,4
mess    db 'Message',10

SLIDE 15

Memory reservation (“variables”)

If initialization is not needed we should use the resb directive instead

key resb 1

giving the number of reserved cells (in this case 1), so we can reserve a sequence of cells, e.g.

vector resb 25

SLIDE 16

Memory reservation (“variables”)

If we want to reserve some data area and fill it with repeating values, there is the prefix times, with the number of repetitions as argument

numbers  times 17 db 31
periodic times 30 db 1,2,3,4

(in the second case the consecutive cells will be initialized with values 1, 2, 3, 4, 1, 2, 3, 4, 1,...).

SLIDE 17

Memory reservation (“variables”)

The directive dw reserves a word of memory (two bytes), and the directive dd reserves a double word (four bytes). Remember that on Intel processors the line

oper dw 0x4315

is equivalent to

oper db 0x15,0x43
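The byte order can be observed from C as well. The following is a minimal sketch, assuming it runs on a little-endian machine such as any x86 processor; the helper name byte_at is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at offset `index` (0 or 1) of a 16-bit value
   as it lies in memory. */
uint8_t byte_at(uint16_t value, int index)
{
    uint8_t bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* copy the raw memory image */
    return bytes[index];
}
```

On x86, byte_at(0x4315, 0) yields 0x15 and byte_at(0x4315, 1) yields 0x43, i.e. the low byte sits at the lower address.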

SLIDE 18

Memory reservation (“variables”)

Unfortunately the NASM assembler does not store the sizes associated with symbols (“types”) in its symbol table, so sometimes we have to prefix the address argument with a size specification, e.g.

mov word [oper],1

SLIDE 19

Sections

A program written in assembly language is divided into sections. Each section starts with the section directive, e.g. to declare the section with the machine instructions (“code”) of a program we write

        section .text
        ...

SLIDE 20

Sections

Some section names are standard: .text — contains executable instructions, .data — contains initialized data, and .bss — contains non-initialized data.

SLIDE 21

Sections

Of course you may define your own sections with different names. Additional attributes are then used to specify the section’s purpose.

SLIDE 22

Exports

Procedures called only locally in their module are equivalent to labels in code and need not be declared. Procedures intended to be called from other modules should be exported, i.e. declared as globals. The module containing the main program (possibly the only module) should export the symbol _start.

SLIDE 23

Exports

The exported symbols (e.g. procedure names) are declared using

global symbol,...

Symbols are imported from other modules with the directive

extern symbol,...

SLIDE 24

Program structure

A program in assembly language should have the following basic structure:

;; Definitions of constants
        ...
;; Data section
        section .data
        ...
;; Code section
        section .text
        global _start
;Main program;-)
_start:
        ...
        ...
        mov eax,60
        syscall

SLIDE 25

Program structure

;; Procedures (in the same module)
        ...
;; Global non-initialized working data
        section .bss
        ...
;; End of program

SLIDE 26

Simple programs

The first program will compute the factorial of 4. The result will be returned as the argument of the exit system call. System calls serve to communicate with the operating system. In Linux they are invoked through software interrupt 0x80 (mostly in 32-bit code) or by the syscall instruction. The number of the system call should be placed in the register eax, and the arguments go into other registers.

SLIDE 27

Simple programs

        section .text
        global _start           ;Declaration for linker (ld)
;; System calls
SYS_EXIT equ 60
;; Argument
ARG     equ 4

SLIDE 28

Simple programs

_start:                         ;Program start (entry point)
        mov rdi,1               ;Computed value
        mov rcx,ARG
l1:     cmp rcx,1               ;Is it all?
        jle finish
        imul rdi,rcx
        dec rcx                 ;Decrement the argument
        jmp l1
finish: mov eax,SYS_EXIT        ;System call number
        syscall

SLIDE 29

Simple programs

The program has to be assembled and linked before running.

$ nasm -f elf64 first.asm
$ ld -o first first.o
$ ./first
$ echo $?
24
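The loop and the value printed by echo $? can be mirrored in C as a rough model. This is a sketch, not the lecture's code: the names factorial and shell_status are chosen for this illustration, and shell_status reflects the fact that the shell reports only the low 8 bits of the value passed to exit.

```c
#include <stdint.h>

/* Same loop as the assembly: the product accumulates while the
   argument counts down to 1 (signed comparison, like jle). */
int64_t factorial(int64_t n)
{
    int64_t result = 1;           /* mov rdi,1 */
    while (n > 1) {               /* cmp rcx,1 / jle finish */
        result *= n;              /* imul rdi,rcx */
        n--;                      /* dec rcx */
    }
    return result;
}

/* What `echo $?` would show for a given exit argument. */
unsigned shell_status(int64_t exit_argument)
{
    return (unsigned)(exit_argument & 0xFF);
}
```

For ARG equal to 4 the status is 24, as in the transcript. For 6 the factorial is 720, but the shell would show 208, so this way of returning results only works for small values.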

SLIDE 30

Simple programs — output

The simplest program displaying to the console screen “Hello, world!” could look like this:

        section .text
        global _start           ;For linker (ld)
STDOUT  equ 1                   ;Standard output (descriptor)
;; System calls
SYS_EXIT  equ 60
SYS_WRITE equ 1

SLIDE 31

Simple programs — output

_start:                         ;Entry point
        mov rdx,len             ;Buffer length (arg 3)
        mov rsi,message         ;Buffer address (arg 2)
        mov rdi,STDOUT          ;Output descriptor (arg 1)
        mov eax,SYS_WRITE       ;System call number
        syscall
        mov rdi,0               ;Correct return (arg 1)
        mov eax,SYS_EXIT        ;System call number
        syscall

        section .data
;; Message to display
message db 'Hello, world!',0xa  ;string terminated with newline
len     equ $-message           ;message length

SLIDE 32

Simple programs — output

The section .data was basically redundant: the message could be placed in section .text, because it is not modified.

SLIDE 33

Procedures

Procedures are called with the following instruction

call name

The termination of a procedure call and the return to the calling place occur only after executing the instruction

ret

There is no default termination on reaching the textual end of the procedure code, as in some higher-level programming languages. The return address is passed (automatically) on the top of the stack.

SLIDE 34

Parameters

If our procedures are called only from our own program, the conventions for passing parameters and returning results are entirely up to us. But if we are going to communicate with a main program or procedures written in C (or another language), the situation changes.

SLIDE 35

Parameters

There are two typical methods of passing parameters. We can place all parameters on the stack.

This method is used even for system calls on BSD.

The alternative is to pass parameters in registers.

Linux uses this for passing parameters to system calls (always) and also to ordinary procedures in 64-bit mode.

SLIDE 36

Parameters

There are also mixed techniques, like passing first few parameters in registers and the rest on stack. This is necessary if the number of parameters is larger than the number of available registers (for system calls this problem does not exist). The returned value(s) can also be placed in register(s) or in a fixed place on the stack.

SLIDE 37

Parameters

As we are using Linux, parameters for system calls should be passed in registers rdi, rsi, rdx, ... (for 64-bit mode). The result will be in rax. In 64-bit mode parameters for procedures are passed in the same way, while in 32-bit mode they are passed on the stack, starting from the last one and placing the first one on the top.

SLIDE 38

Simple program with procedure

We will modify our factorial program to put the numeric computation in a separate procedure.

        section .text
        global _start
;; System calls
SYS_EXIT equ 60                 ;exit is call number 60 for the syscall instruction
;; Argument
ARG     equ 4

SLIDE 39

Simple program with procedure

_start:                         ;Entry point
        mov rdi,ARG             ;Argument passed in rdi
        call factorial          ;Result in rax
        mov rdi,rax
        mov eax,SYS_EXIT        ;System call number
        syscall

SLIDE 40

Simple program with procedure

factorial:
        mov rax,1               ;Computed value
l1:     cmp rdi,1               ;Is it all?
        jle finish
        imul rax,rdi
        dec rdi                 ;Decrement argument
        jmp l1
finish: ret

SLIDE 41

Saving registers

We must remember to save and restore the registers whose contents are modified by our procedure. This can be done at the place where the procedure is called (the caller saves strategy) or by the called procedure (callee saves).

SLIDE 42

Saving registers

The second strategy is more popular. There are two reasons for that. First, changes in the procedure code sometimes lead to the use of an additional register. It is then easier to find all the places where the new register should be saved (probably only one, at the beginning of the procedure).

Second, if saving and restoring are done in the procedure, the resulting sequences of push and pop instructions occur only once, so the code will be shorter.

SLIDE 43

Simple program with procedure

The factorial procedure can be written using recursion.

        section .text
        global _start
;; System calls
SYS_EXIT equ 60                 ;exit is call number 60 for the syscall instruction
;; Argument
ARG     equ 4
_start:                         ;Entry point
        mov rdi,ARG
        call factorial          ;Result in rax
        mov rdi,rax
        mov eax,SYS_EXIT
        syscall

SLIDE 44

Simple program with procedure

factorial:
        mov rax,1               ;Result for the base case
        cmp rdi,1               ;Is it all?
        jle finish              ;Nothing pushed yet, so ret is safe
        push rdi                ;Save argument (the only one)
        dec rdi                 ;Decrement the argument
        call factorial          ;Result in rax
        pop rdi                 ;Restore
        imul rax,rdi
finish: ret

SLIDE 45

Library procedures

Now in our “greeting” program instead of system call we will use library procedure printf.

        section .text
        global _start
        extern printf,exit
_start:                         ;Entry point
        mov rdi,message
        xor eax,eax             ;No vector-register arguments (printf is variadic)
        call printf
        mov rdi,0
        call exit

        section .data
;; Message to display
message db 'Hello, world!',0xa,0 ;string terminated with newline and NUL

SLIDE 46

Library procedures

We will have to link differently

$ nasm -f elf64 sixth.asm
$ ld -o sixth sixth.o -lc -dynamic-linker /lib64/ld-linux-x86-64.so.2
$ ./sixth
Hello, world!

The option -l specifies the name of the library (without the prefix lib), and -dynamic-linker is necessary for using dynamic libraries.

SLIDE 47

Procedures with arguments directly in code

Procedure arguments can be placed directly in code, but this possibility is rarely used. Example call

        ...
        call print
        db 'Hello, world!',0    ;string terminated with NUL!
        ...

Procedure print will receive on the top of the stack the address of the argument (pretending to be the return address). It will use it for displaying the message and then will replace it with the real return address, which is the address immediately after the end of the message.

SLIDE 48

Procedures with arguments directly in code

print:  pop rsi
l1:     cmp byte [rsi], 0
        je l2
        push rsi
        mov rdi,1
        mov rdx,1
        mov eax,SYS_WRITE
        syscall
        pop rsi
        inc rsi
        jmp l1
l2:     inc rsi
        push rsi                ;Real return address
        ret

SLIDE 49

Communication with programs in C

Procedures written in assembly language are often part of a program written in a high-level programming language, like C, Lisp or Dylan. Such a procedure should adapt to the calling conventions used by the compiler. The compiler gcc in 32-bit mode passes arguments on the stack, which is divided into procedure frames (activation records). The current frame is pointed to by the ebp register. This register provides access to the parameters; it should be saved in the procedure prolog and then set to the current top of the stack. Just before exiting the procedure we should restore the previous (saved) value of the frame register.

SLIDE 50

Communication with programs in C

The template for a simple procedure in 32-bit mode has the following form

proc:   push ebp
        mov ebp,esp
        mov eax,[ebp+8]         ;first argument
        ...
        mov eax,result
        pop ebp
        ret

SLIDE 51

Communication with programs in C

There are two special instructions enter and leave to be used in prolog and epilog. For simple cases the procedure can also be written as

proc:   enter 0,0
        mov eax,[ebp+8]         ;first argument
        ...
        mov eax,result
        leave
        ret

The procedures should also save the registers ebx, esi, edi (used for variables) and the segment registers cs, ds, es and ss.

SLIDE 53

Communication with programs in C

Now let us look at a (slightly “entangled”) greeting program in combination of assembler and C. First the assembler part

        section .data
hello   db 'Hello, world!',0
        section .text
        global address
address:
        lea rax,[hello]
        ret

SLIDE 54

Communication with programs in C

And now the C part

#include <stdio.h>

extern char* address (void);

int main () {
  printf("%s\n", address());
  return 0;
}

SLIDE 55

Useful instructions

Now a few less well-known instructions. First, the looping instruction. In previous examples we controlled the iteration using comparisons together with conditional jumps. Instead we can use the loop instruction and its variants:

We set the number of iterations in the counter register ecx/rcx.
The iteration starts with a label, e.g. iter1.
At the end we put the instruction loop iter1.

The iteration will be performed as many times as the counter's value (but at least once!).
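The "at least once" remark follows from the order of operations: the body runs first, then loop decrements the counter and jumps back while it is non-zero. A small C model can make this visible. It is only a sketch: an 8-bit counter stands in for rcx (an assumption made so that the wrap-around case stays small), and the name loop_iterations is hypothetical.

```c
#include <stdint.h>

/* Count how many times the body runs for a given start value of the counter. */
unsigned loop_iterations(uint8_t counter)
{
    unsigned executed = 0;
    do {
        executed++;               /* loop body */
        counter--;                /* loop: decrement the counter ... */
    } while (counter != 0);       /* ... and jump back while it is non-zero */
    return executed;
}
```

A start value of 5 gives 5 iterations, but a start value of 0 wraps around and gives 256 in this model (2^64 with the real rcx). This is why code that may receive a zero count usually guards the loop, for example with jrcxz.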

SLIDE 56

Example: buffer initialization (from the end)

        mov rcx,size            ;in dwords
        mov eax,value
        jrcxz next
iter:   mov [buffer-4+rcx*4],eax
        loop iter
next:

SLIDE 57

Example: looking for the end of C string

        mov ecx,100             ;buffer length
        mov rbx,0               ;initial index
iter:   mov al,[buffer+rbx]     ;next character
        inc rbx
        cmp al,0
        loopne iter             ;repeat while not NUL and counter not exhausted
        jne notfound            ;ZF clear: no NUL within the buffer
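The search can be modelled in C roughly as follows. This is a sketch under stated assumptions: the function name find_string_end is made up, the limit is assumed to be at least 1 (as with the value 100 above), and a return value of 0 is used here to signal that no terminator was found.

```c
#include <stddef.h>

/* Scan at most `limit` bytes of `buffer` for a NUL byte.
   Returns the index just past the NUL (the value rbx would hold),
   or 0 if no NUL was found within the limit. */
size_t find_string_end(const char *buffer, size_t limit)
{
    size_t index = 0;                 /* mov rbx,0 */
    while (limit != 0) {
        char c = buffer[index];       /* mov al,[buffer+rbx] */
        index++;                      /* inc rbx */
        limit--;                      /* loopne decrements the counter */
        if (c == 0)
            return index;             /* ZF set: terminator found */
    }
    return 0;                         /* counter exhausted: not found */
}
```

The loop has two exits, just like loopne: either the comparison succeeded or the counter ran out, and the caller has to distinguish the two afterwards.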

SLIDE 58

Getting effective address

Arguments of system calls are often addresses. An address need not be constant; sometimes it is obtained as the sum of a starting address and an index value contained in a register. This requires computing the effective address as follows

        mov rcx,buffer
        add rcx,rsi             ;Index in RSI

The same value can be computed using the lea instruction. It determines the effective address of its second argument and then just places it in the destination register given as the first argument.

        lea rcx,[buffer+rsi]

SLIDE 59

Getting effective address

Using the advanced addressing modes with scaling available on Pentium, we can often compute a complex expression in a single cycle, e.g.

lea rax,[rdx*4 + rdx + 7]

results in putting into register rax the value of the expression 5 * rdx + 7. In NASM this instruction could be written even simpler

lea rax,[rdx*5 + 7]

but this is a specific feature of the NASM instruction parser. Attention: the lea instruction does not set flags.
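What lea computes can be written down as a plain arithmetic function. The sketch below is a model, not a description of the hardware; the name effective_address and its parameter names are chosen for this illustration. In a real instruction the scale can only be 1, 2, 4 or 8, which is why a factor of 5 has to be expressed as rdx + rdx*4.

```c
#include <stdint.h>

/* Effective address: base + index*scale + displacement, modulo 2^64.
   Only a value is produced; no condition flags are involved. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t displacement)
{
    return base + index * scale + (uint64_t)displacement;
}
```

With rdx equal to 10, effective_address(10, 10, 4, 7) gives 57, the same as 5 * 10 + 7.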

SLIDE 60

Extension

Sometimes in our programs we need to extend the argument size, e.g. to load a byte to a four-byte register. If the argument is a signed integer, we should use special instruction

movsx register,register-or-memory

This instruction correctly extends the argument sign, for example if the register ax contains the number -2345, then after executing the instruction

movsx ebx,ax

the register ebx will also contain -2345 (but represented on 32 bits).

SLIDE 61

Extension

If the argument does not contain a signed integer (e.g. it is an ASCII character code), we should use the instruction movzx. For example, if the register al contains 177 (the letter ą in ISO 8859-2 encoding), then to put it into the register ebx we will use

movzx ebx,al

This instruction always extends using zeros.
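The difference between sign extension and zero extension can be reproduced with C casts. This is a minimal sketch; the function names are hypothetical, and the narrowing casts assume the usual two's complement behaviour of mainstream compilers.

```c
#include <stdint.h>

/* movsx ebx,ax : widen a 16-bit value, copying the sign bit upward. */
uint32_t movsx_16_to_32(uint16_t ax)
{
    return (uint32_t)(int32_t)(int16_t)ax;
}

/* movzx ebx,al : widen an 8-bit value, filling the upper bits with zeros. */
uint32_t movzx_8_to_32(uint8_t al)
{
    return (uint32_t)al;
}

/* movsx ebx,al : what would happen if a character code were sign-extended. */
int32_t movsx_8_to_32(uint8_t al)
{
    return (int32_t)(int8_t)al;
}
```

The 16-bit pattern of -2345 is 0xF6D7 and becomes 0xFFFFF6D7 after sign extension. The character code 177 stays 177 with movzx, whereas sign-extending it would turn it into -79.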

SLIDE 62

Checking and setting single bits

The basic instructions for checking and setting condition flags are test and and. Pentium processors have the instruction

bt source,bit-number

which copies the indicated bit to the CF flag. The bit number may be given as a byte constant or in a register. If the source is a register, the bit number should not be greater than the register size; for a memory address there is no such restriction. This helps to implement bit tables.
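A bit table of this kind can be sketched in C as follows. The function name bit_test is hypothetical; the bit numbering follows x86, where bit 0 is the least significant bit of the first byte, and only non-negative bit numbers are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Return bit number n of a bit table stored in memory (0 or 1).
   Bit 0 is the least significant bit of table[0]. */
int bit_test(const uint8_t *table, size_t n)
{
    return (table[n / 8] >> (n % 8)) & 1;
}
```

The division selects the byte and the remainder selects the bit inside it, which is the address arithmetic that bt performs on its own when the source is a memory operand.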

SLIDE 63

Checking and setting single bits

The instruction bt has companion instructions btc, bts and btr, which additionally modify the tested bit (complement, set and reset it, respectively). They are useful for implementing synchronization mechanisms such as semaphores. Remember that the instruction bt is slower than test.

SLIDE 64

Block instructions

Block instructions are used for fast processing of sequences of memory cells (bytes or larger units), for example for fast copying of a part of image memory to another place. Rules:

put the start address of the source block into registers ds:rsi,
put the start address of the destination block into registers es:rdi,
put the length of the block into register ecx (in appropriate units: bytes, words etc.).

The consecutive data pass through the register al, ax, or eax (depending on the unit size).

SLIDE 65

Block instructions

Example: fill the screen buffer whose address is given in rdi with a character and an attribute given in al and ah (this technique can be used to operate directly on physical screen memory, but in Linux we usually do not have sufficient permissions). Without block instructions this can be done as follows:

fill:   push rcx
        push rdi
        mov ecx,25*80
fill1:  mov [rdi],ax
        add rdi,2               ;advance by one 16-bit cell
        loop fill1              ;decrements rcx and repeats while non-zero
        pop rdi
        pop rcx
        ret

SLIDE 66

Block instructions

Now we will use the stosw instruction, which stores the contents of the ax register in the memory cell addressed by rdi, and then automatically increases rdi by 2.

fill:   push rcx
        push rdi
        mov ecx,25*80
        cld
fill1:  stosw
        loop fill1
        pop rdi
        pop rcx
        ret

The instruction cld clears the direction flag DF in processor status word. The value of this flag determines the direction of copying (for zero the addresses are increased).

SLIDE 67

Block instructions

To simplify our program even more we can use the prefix rep. This prefix causes repeated execution of the instruction that follows it, decreasing the ecx register each time through the loop. Execution terminates when the ecx register becomes zero.

fill:   push rcx
        push rdi
        mov ecx,25*80
        cld
        rep stosw
        pop rdi
        pop rcx
        ret
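The effect of cld followed by rep stosw can be modelled in C as a simple fill loop. The sketch below is an illustration only; the name rep_stosw is hypothetical, and the returned pointer plays the role of the final value of rdi.

```c
#include <stddef.h>
#include <stdint.h>

/* Store ax into count consecutive 16-bit cells starting at dest.
   Returns the position one past the last cell written. */
uint16_t *rep_stosw(uint16_t *dest, uint16_t ax, size_t count)
{
    while (count != 0) {   /* rep: repeat until the counter reaches zero */
        *dest = ax;        /* stosw: store ax at [rdi] */
        dest++;            /* DF = 0: rdi advances by 2 bytes */
        count--;
    }
    return dest;
}
```

Unlike the loop instruction, the rep prefix checks the counter before each repetition, so a count of zero stores nothing.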

SLIDE 68

Block instructions

Another example: we want to copy 60 bytes from the address given in rsi to another area starting at the address in rdi (we assume that the regions do not intersect).

copy:   pushf
        push rax
        push rcx
        push rsi
        push rdi
        mov ecx,60
        cld
        rep movsb
        pop rdi
        pop rsi
        pop rcx
        pop rax
        popf
        ret

SLIDE 69

Block instructions

Attention: in the last example we save on stack the previous state of DF flag (together with other flags). It is necessary in system programs (especially used in interrupt handlers) and is recommended for library procedures.

SLIDE 70

Inserting files

During assembly it is possible to include additional files in the program text, e.g. files containing definitions of useful constants and macros. The directive %include is used for that; for example, to insert the contents of the file my-macros.asm we write

%include "my-macros.asm"

SLIDE 71

Macros

NASM has two kinds of macros: single-line and multi-line. They operate in the same way: during assembly, a macro call in the text is replaced with the expanded form of the macro body.

SLIDE 72

Macros

Single-line macros, which correspond to expressions, are defined using the directive %define. The simplest single-line macros are used to define constants, e.g.

%define TCGETS 0x5401

Starting from this position in the program code all occurrences of the symbol TCGETS will be textually replaced by the constant 0x5401. Of course the same result could be achieved using the directive equ.

SLIDE 73

Macros

The definition of constant can be more complicated, for example it may form a sequence of syntax elements

%define CTRL 0x1F &

In this case the line

mov byte [ebx+4], CTRL 'D'

will be replaced by

mov byte [ebx+4], 0x1F & 'D'

SLIDE 74

Macros

Single line macros accept parameters. Each parameter in macro definition will be replaced textually by the argument from the macro call, e.g. for the macro

%define param(n) ([ebp + 4 * (n) + 4])

the line

mov edx,param(2)

will be replaced by

mov edx,[ebp+12]

SLIDE 75

Macros

Parentheses should be used in a macro definition to prevent an unwanted order of computation (as with macros in C). The general rule: we surround the whole macro body with parentheses and we put parentheses around every occurrence of a parameter.

Sometimes some parentheses are not necessary and may be skipped, e.g. in our last definition the outer pair of parentheses is not needed.
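Since the rule is the same as for C macros, a C illustration may help. The macros below are invented for this example and loosely mirror the offset computation 4 * (n) + 4; they show what goes wrong when either kind of parentheses is missing.

```c
/* Unparenthesized: argument and body are pasted in as raw text. */
#define OFFSET_BAD(n)  4 * n + 4
/* Parenthesized: every parameter use and the whole body. */
#define OFFSET_GOOD(n) (4 * (n) + 4)

/* Missing parentheses around the parameter: 4 * 1 + 1 + 4 */
int offset_bad(void)  { return OFFSET_BAD(1 + 1); }
int offset_good(void) { return OFFSET_GOOD(1 + 1); }

/* Missing parentheses around the body: 2 * 4 * 2 + 4 */
int twice_bad(void)   { return 2 * OFFSET_BAD(2); }
int twice_good(void)  { return 2 * OFFSET_GOOD(2); }
```

The unparenthesized versions evaluate to 9 and 20, while the intended values are 12 and 24.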

SLIDE 76

Macros

NASM accepts overloading of macros, i.e. defining macros of the same name with a different number of parameters. This is useful mostly to provide default arguments, for example

%define increase(x) ((x) + 1)
%define increase(x,y) ((x) + (y))

slide-77
SLIDE 77

Macros

Multi-line macros are defined using the %macro directive. Calls to them are replaced by sequences of instructions and other directives (possibly including further macro calls). The simplest macros have no parameters, e.g. to terminate the program

%macro exit 0
    mov eax,1
    int 0x80
%endmacro

slide-78
SLIDE 78

Macros

The above macro may be used in a program as a new “instruction”

...
exit
...

slide-79
SLIDE 79

Macros

A macro for zeroing a register requires a parameter

%macro clr 1
    xor %1,%1
%endmacro

and could be called as follows

...
clr eax
...

The number in the header of a macro definition declares the number of parameters; the parameters are referred to in the body as %1, %2, ....

slide-80
SLIDE 80

Macros

Similarly, for the standard procedure prologue with a stack frame in 32-bit mode we can define a single-argument macro prolog

%macro prolog 1
    push ebp
    mov ebp,esp
    push ebx
    push esi
    push edi
    sub esp,%1 ;reserve space for a given number of
%endmacro

This macro can be used as

myfunction:
    prolog 12
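
A prologue is normally paired with code that undoes it. The sketch below adds a hypothetical epilog macro (not part of the slides) that releases the locals and restores the registers in reverse order; the body of myfunction is only a placeholder.

```nasm
%macro prolog 1
        push ebp
        mov  ebp, esp
        push ebx
        push esi
        push edi
        sub  esp, %1            ; reserve %1 bytes below the saved registers
%endmacro

; Hypothetical counterpart: must be given the same size as prolog.
%macro epilog 1
        add  esp, %1            ; release the reserved bytes
        pop  edi                ; restore registers in reverse push order
        pop  esi
        pop  ebx
        pop  ebp
        ret
%endmacro

section .text
myfunction:
        prolog 12
        mov  dword [esp], 0     ; lowest dword of the 12 reserved bytes
        mov  eax, [ebp + 8]     ; first stack argument
        epilog 12
```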

slide-81
SLIDE 81

Macros

In the body, %0 denotes the number of arguments passed and can be used for conditional assembly etc. (see later)

...
%rep %0
...
%endrep
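
A sketch of %0 in use, modelled on the multipush example from the NASM manual. It relies on two NASM features not covered on this slide: the parameter count 1-* (one or more arguments) and the %rotate directive.

```nasm
%macro multipush 1-*            ; accepts one or more arguments
  %rep %0                       ; repeat once per argument actually passed
        push %1                 ; push the current first argument
    %rotate 1                   ; rotate arguments left: the old %2 becomes %1
  %endrep
%endmacro

section .text
save_regs:                      ; hypothetical label
        multipush ebx, esi, edi ; expands to push ebx, push esi, push edi
```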

slide-82
SLIDE 82

Macros

More complex macros (e.g. ones containing a loop) may need labels. With multiple occurrences of the macro call in a module, an ordinary label would result in a repeated symbol definition error, so local labels in a macro definition should be preceded by the characters %%. During each macro expansion the assembler replaces them with newly generated, unique names

%macro abs 1
    cmp %1,0
    jge %%skip
    neg %1
%%skip:
%endmacro
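
To see why the %% prefix matters, the macro can be called twice in one module. The routine abs_both is hypothetical; the generated label names mentioned in the comments follow the form described in the NASM manual, and the exact numbers will differ.

```nasm
%macro abs 1
        cmp  %1, 0
        jge  %%skip
        neg  %1
%%skip:
%endmacro

section .text
abs_both:                       ; hypothetical routine
        abs  eax                ; %%skip becomes a unique name of the form ..@N.skip
        abs  edx                ; a different generated name, so no duplicate definition
        ret
```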

slide-83
SLIDE 83

Macros

Warning: macros are bad for your health. Complicated macros often use additional registers, which are not visible at the site of the macro call. It is then easy to forget to save them on the stack. Higher-level languages without closures sometimes use macros to provide access to “non-local” variables in the body. In assembler there is no such need (“everything is visible”), so do not use macros instead of procedures.

slide-84
SLIDE 84

Conditional assembly

Using conditional assembly it is possible to generate binary programs for different computational environments (e.g. different operating systems) from the same program text. But the most popular use is to insert additional instructions that are needed only during program development and debugging.

slide-85
SLIDE 85

Conditional assembly

Usually it is done with the symbol DEBUG and the construction

%ifdef symbol
instruction sequence
%endif

If the symbol is defined, the instruction sequence will be processed and included in the final program; otherwise it will be skipped. For example

%ifdef DEBUG
call display_state
%endif

will generate the call to the procedure display_state only if the symbol DEBUG is defined at this point during assembly.
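
The symbol does not have to be defined in the source; NASM's -d option defines it from the command line. In the sketch below the routine work and the file name prog.asm are hypothetical, and display_state is assumed to live in another module.

```nasm
; nasm -f elf32 -dDEBUG prog.asm    -> the call below is assembled
; nasm -f elf32 prog.asm            -> the call below is skipped

extern display_state            ; assumed to be defined in another module

section .text
global work
work:                           ; hypothetical routine
%ifdef DEBUG
        call display_state      ; present only in debugging builds
%endif
        ret
```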

slide-86
SLIDE 86

Conditional assembly

More general conditional assembly is performed with

%if condition
...
%elif condition
...
%else
...
%endif

Of course, the condition is evaluated at assembly time.
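
A sketch of the general form, choosing the element size of a table at assembly time. The symbol WORDSIZE and the label table are made up for this example; %error is NASM's directive for reporting a user-defined error.

```nasm
%assign WORDSIZE 64             ; hypothetical assembly-time setting

section .data
%if WORDSIZE == 64
table:  dq 1, 2, 3              ; 8-byte elements
%elif WORDSIZE == 32
table:  dd 1, 2, 3              ; 4-byte elements
%else
  %error "WORDSIZE must be 32 or 64"
%endif
```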

slide-87
SLIDE 87

Conditional assembly

Other useful directives are used for assembly-time iteration

%rep n
...
%endrep

They simply repeatedly assemble their body n times. Using the above construct we can fill a data area with consecutive integer numbers.

%assign next 0
%rep 64
dd next
%assign next next + 1
%endrep

slide-88
SLIDE 88

Conditional assembly

The directive %assign is used to define and redefine assembly-time variables, as seen above. Unlike symbols defined by %define, the value is computed immediately and must be a number.
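
The difference in evaluation time can be made visible with a small sketch; the names base, lazy and eager are invented for the illustration.

```nasm
%assign base  10
%define lazy  base + 1          ; stores the text "base + 1", evaluated at each use
%assign eager base + 1          ; evaluated right now: eager is 11

%assign base  20                ; redefine the variable

section .data
        dd lazy                 ; expands to base + 1 with the current base: 21
        dd eager                ; still 11
```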

slide-89
SLIDE 89

Conditional assembly

To leave a %rep loop before its end we can use %exitrep, e.g.

...
%if sum > 65000
%exitrep
%endif
...
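
A complete loop around such a test might look as follows. This is a sketch under assumed names (table, n, table_len): it emits running sums 1, 3, 6, ... as 16-bit words and stops before a sum would exceed 65000.

```nasm
section .data
table:
%assign n   1
%assign sum 0
%rep 1000                       ; upper bound on the number of iterations
  %assign sum sum + n
  %if sum > 65000
    %exitrep                    ; leave the loop before emitting this value
  %endif
        dw sum                  ; running sum of 1..n
  %assign n n + 1
%endrep
table_len equ ($ - table) / 2   ; number of words actually emitted
```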

slide-90
SLIDE 90

Data structures (records)

We declare structured data types (a.k.a. records) using the construct:

struc my_type
field1: resb 1
...
endstruc

This creates constants named field1,... Their values are offsets – the distances from the structure’s beginning to the first byte of each field.

slide-91
SLIDE 91

Data structures (records)

Objects of a type declared this way are defined using the directive istruc:

my_obj: istruc my_type
at fieldi, db initial-value
...
iend

The optional at clauses serve to initialize fields, but the order of fields from the structure declaration must be preserved.

slide-92
SLIDE 92

Data structures (records)

Fields of the record are referred to by the offset type.field or simply fields, e.g.

mov eax,[ebx + fields]

Of course, we can forget about type checking: nothing verifies the consistency of types and fields.
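
Declaration, definition and field access can be put together in one sketch. The type point, its fields px, py and ptag, the object origin and the routine get_y are all hypothetical names chosen for the example.

```nasm
struc point                     ; hypothetical record type
px:     resd 1                  ; offset 0
py:     resd 1                  ; offset 4
ptag:   resb 1                  ; offset 8
endstruc                        ; NASM also defines point_size (9 here)

section .data
origin: istruc point
        at px,   dd 3           ; fields in declaration order
        at py,   dd 4
        at ptag, db 'A'
        iend

section .text
get_y:                          ; hypothetical routine: returns origin's py in eax
        mov  ebx, origin        ; address of the object
        mov  eax, [ebx + py]    ; py is just the constant 4
        ret
```

Nothing stops the code from using py with the address of an unrelated object; the field name is only a number.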

slide-93
SLIDE 93

Gnu Assembler (gas)

Assembly:

as hello.s -o hello.o

Linkage (if we use libraries):

ld -dynamic-linker /lib/ld-linux.so.2 -o hello hello.o

For libraries:

ld -shared -o hellolib.so hello.o

slide-94
SLIDE 94

AT&T syntax (from Unix circles)

Register names are preceded by the ‘%’ character, e.g. %eax, %dl. This makes it possible to use arbitrary C identifiers in assembly programs without preceding them by an underscore (the usual practice in the old days).

The order of arguments for binary (and other) instructions is: first the source argument, then the destination argument, the opposite of “Intel”-compatible assemblers. For example, the NASM instruction

mov eax,edx

will be written in GAS as

movl %edx, %eax

Comments are preceded by the ‘#’ character.

slide-95
SLIDE 95

AT&T syntax (from Unix circles)

The argument size must be given explicitly by a suffix on the operation mnemonic: b for byte (8 bits), w for word (16 bits), l for long (32 bits) and q for quad (64 bits). For example, when operating on the registers edx and eax we should use movl instead of mov.

However, gas does not require strict AT&T syntax, so the suffix is optional when the length can be guessed from register operands, and otherwise defaults to 32-bit (with a warning).

Immediate arguments (constants and pointers) have to be marked by the prefix $, e.g.

movl $5,%eax

If the prefix $ is missing, the argument will be used as a memory cell address, for example

movl licznik,%eax

puts the contents of the variable licznik into the eax register.

slide-96
SLIDE 96

AT&T syntax (from Unix circles)

Parentheses are used to indicate indirection for address registers, indexing etc.

testb $0x80,17(%ebp)

The syntax for indirect and indexed addressing modes is different. The GAS instruction

movl %eax,8(%ebx,%edi,4)

corresponds to

mov [8 + ebx + 4 * edi],eax

in NASM. The GAS syntax is:

constant-address(offset-register,index-register,size-of-element)

Unnecessary registers are omitted

movl dane(,%edi,4),%eax
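
For comparison, the AT&T examples from these slides can be written in NASM, with the GAS form repeated in the comment on each line. This is a sketch for 32-bit mode; the label examples and the size reserved for dane are assumptions.

```nasm
section .bss
licznik: resd 1                         ; variable named on the slides
dane:    resd 16                        ; array named on the slides (size assumed)

section .text
examples:                               ; hypothetical label
        mov  eax, edx                   ; movl  %edx, %eax
        mov  eax, 5                     ; movl  $5, %eax
        mov  eax, [licznik]             ; movl  licznik, %eax
        test byte [ebp + 17], 0x80      ; testb $0x80, 17(%ebp)
        mov  [8 + ebx + 4*edi], eax     ; movl  %eax, 8(%ebx,%edi,4)
        mov  eax, [dane + 4*edi]        ; movl  dane(,%edi,4), %eax
        ret
```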

slide-97
SLIDE 97

Gas directives

Directive names start with a dot.

Including files:

.include "linux-calls.s"

Section declarations are similar:

.section .text

Constant definitions:

.equ ile,15
helloworld:
.ascii "hello world\n"
helloworld_end:
.equ helloworld_len,helloworld_end - helloworld

slide-98
SLIDE 98

Gas directives

Global (exported) symbols:

.globl _start

Memory reservation (in .bss):

.lcomm <name>,<size>

Memory reservation with initialization:

.ascii "This is text\n\0"
.byte 10,12,4
.int 234,1487
.long 2345,-345

Repetition:

.rept 50
.byte 1,2
.endr

Symbol type declaration:

.type <symbol>,@function
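
As a rough orientation, most of the directives above have close NASM counterparts. The sketch below is a loose mapping rather than an exact equivalence; the labels bytes, numbers and buffer and the buffer size are assumptions. Note that NASM does not interpret \n inside double quotes, so the newline is written as the byte 10.

```nasm
global _start                               ; .globl _start

ile             equ 15                      ; .equ ile,15

section .data
helloworld:     db "hello world", 10        ; .ascii "hello world\n"
helloworld_len  equ $ - helloworld          ; .equ helloworld_len,helloworld_end - helloworld
bytes:          db 10, 12, 4                ; .byte 10,12,4
numbers:        dd 2345, -345               ; .long 2345,-345
%rep 50                                     ; .rept 50
                db 1, 2                     ;   .byte 1,2
%endrep                                     ; .endr

section .bss
buffer:         resb 64                     ; .lcomm buffer,64

section .text
_start:
        mov  eax, 1                         ; exit system call, as in the exit macro
        xor  ebx, ebx                       ; exit status 0
        int  0x80
```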
