Assembly Language Programming: Optimization
Zbigniew Jurkiewicz, Instytut Informatyki UW
December 9, 2017

Conditional transfer

Sometimes we make a comparison only to execute a single assignment depending on the result. We can then use a conditional move instruction, which performs the assignment only if the indicated condition is satisfied. For example, the instruction

    cmove eax,ebx

copies ebx into eax only if the most recently compared operands were equal (CMOVcc takes a register or memory source, not an immediate). The main advantage is that there is no branch to mispredict, so the processor does not have to flush the pipeline or discard speculatively executed work.

Conditional assignment: SETcc.
Conditional transfer: CMOVcc.

Conditional transfer: an example

Find the maximum of two unsigned numbers (arguments in EAX and EBX, result in ECX):

    mov   ecx,eax
    cmp   ebx,ecx
    cmova ecx,ebx    ;ebx above ecx (unsigned): take ebx

For signed numbers, use cmovg instead of cmova.
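
The same computation can be sketched in C. This is only an illustration: the function name max_u32 is made up here, and whether the compiler actually emits a conditional move depends on the compiler, the target and the optimization level.

```c
#include <stdint.h>

/* Unsigned maximum, mirroring the cmp/cmova sequence from the slide.
   max_u32 is a hypothetical name. An optimizing x86 compiler may (but
   need not) translate the conditional into a CMOV. */
uint32_t max_u32(uint32_t a, uint32_t b)
{
    uint32_t result = a;   /* mov   ecx,eax */
    if (b > result)        /* cmp   ebx,ecx */
        result = b;        /* cmova ecx,ebx */
    return result;
}
```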

Conditional transfers: errors

Assume we are compiling the C expression

    int *xp;
    ...
    return (xp ? *xp : 0);

If xp is in rdi, we could try

    xor    eax,eax      ;Maybe we will return zero
    test   rdi,rdi      ;xp == 0 ?
    cmovne eax,[rdi]    ;Maybe we will return *xp

But then xp is always dereferenced (even when it is a NULL pointer), because CMOV reads its memory operand regardless of the condition, and this is exactly what we want to avoid.
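
One possible way around the problem, sketched in C under the assumption that a spare zero-valued cell is acceptable: select between two valid addresses first, and only then load. The names deref_or_zero and zero are hypothetical, not taken from the slides.

```c
#include <stddef.h>

/* Returns *xp, or 0 when xp is NULL, without ever loading through NULL.
   The conditional picks an address, not a value, so the load that
   follows is always valid; a compiler is free to use CMOV for the pick. */
int deref_or_zero(const int *xp)
{
    static const int zero = 0;          /* hypothetical fallback cell */
    const int *src = xp ? xp : &zero;   /* select one of two valid addresses */
    return *src;                        /* unconditional, always-safe load */
}
```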

Jump avoidance

Avoiding jumps is a larger problem. Let us look at the computation of the absolute value of a number:

    test eax,eax    ;We set flags
    jns  skip       ;Sign bit clear: nothing to do
    neg  eax
skip:

Jump avoidance

There is a different way:

    mov ecx,eax
    sar ecx,31      ;sign bit everywhere
    xor eax,ecx     ;invert the bits (if negative)
    sub eax,ecx     ;subtract -1 (if negative): two's-complement negation
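
The mask trick can be written in portable C as a rough sketch. Right-shifting a negative signed value is implementation-defined in C, so this version builds the mask from a comparison instead of using sar; the function name abs_branchless is an assumption made for illustration.

```c
#include <stdint.h>

/* Branch-free absolute value using the xor/sub trick from the slide.
   The result is unsigned so that INT32_MIN maps to 2147483648 without
   overflow. abs_branchless is a hypothetical name. */
uint32_t abs_branchless(int32_t x)
{
    uint32_t ux   = (uint32_t)x;
    uint32_t mask = 0u - (uint32_t)(x < 0);   /* all ones if negative, else 0 */
    return (ux ^ mask) - mask;                /* invert, then subtract -1 */
}
```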

Power of 2

Another trick: how do we check whether a number in EAX is a power of two?

    mov  ebx,eax    ;or lea ebx,[eax - 1] in place of mov and dec
    dec  ebx
    test eax,ebx
    jnz  isnot

Note that zero also passes this test, so it must be excluded separately if it can occur.
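
A short C sketch of the same test. The first function reproduces the slide's check exactly; the second adds an explicit zero check, which is an addition made here and not part of the original sequence. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exactly the test from the slide: clearing the lowest set bit leaves 0.
   True for powers of two and also for 0. */
bool passes_slide_test(uint32_t x)
{
    return (x & (x - 1)) == 0;
}

/* Same test with zero excluded explicitly (an addition, see lead-in). */
bool is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```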

Hints

The processor tries to guess whether a conditional jump will be taken. With static prediction it is assumed that a “backward” jump will be taken. We can help by using hints: the prefixes HT (0x3e) and HNT (0x2e), for example

    test ecx,ecx
    db   3eh        ;HT = we will jump
    jz   L9
    ...
L9:
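
At the C level, GCC and Clang offer a related source-level hint, __builtin_expect. It is a compiler extension rather than standard C, and it does not necessarily produce the prefixes described above; it mainly influences how the compiler lays out the branches. The macro UNLIKELY and the function sum_nonnegative below are illustrative names only.

```c
/* GCC/Clang extension; other compilers need a different spelling. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Sums v[0..n-1]. A negative entry is treated as a rare error case:
   the function stops and returns -1. The hint tells the compiler that
   the error branch is expected to be cold. */
long sum_nonnegative(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            return -1;
        total += v[i];
    }
    return total;
}
```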

Hints

Sometimes holding data in cache memory is not useful, because it is used only once.

Direct write instructions (non-temporal stores) such as MOVNTI, MOVNTPD, etc. bypass the cache when writing.
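
From C, MOVNTI is reachable through the SSE2 intrinsic _mm_stream_si32. The sketch below assumes an x86 target with SSE2 enabled (the default on x86-64); the function name fill_nontemporal is hypothetical, and whether streaming stores pay off depends heavily on buffer size and hardware.

```c
#include <emmintrin.h>   /* SSE2: _mm_stream_si32; also provides _mm_sfence */
#include <stddef.h>

/* Fills dst[0..n-1] with value using non-temporal stores, so the
   written data does not displace useful lines from the cache. */
void fill_nontemporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);   /* MOVNTI */
    _mm_sfence();   /* order the streamed stores before later stores */
}
```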

Compiler conservatism

The C compiler must be conservative and generate code that covers all possible cases. Example:

    void memclr (char *data, int n)
    {
        for (; n > 0; n--)
            *data++ = 0;
    }

If the compiler knew something about the alignment of data, it could generate code that zeroes 2, 4 or even 8 bytes in one step. However, it must assume the worst case.
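
One way to hand the compiler the missing alignment information is through the pointer type. The sketch below is an illustration, not a replacement for memset: memclr_words is a hypothetical name, and the caller promises (via the uint64_t type) that the buffer is 8-byte aligned and its length is a whole number of words.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise version from the slide: no alignment knowledge. */
void memclr(char *data, int n)
{
    for (; n > 0; n--)
        *data++ = 0;
}

/* Word-wise version: the uint64_t pointer type carries the alignment
   promise, so each iteration can clear 8 bytes with one store. */
void memclr_words(uint64_t *data, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        data[i] = 0;
}
```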

Compiler conservatism

There are a few elements in C/C++ that are classic examples of slowing programs down. The group is led by the conversion (cast) from a real number to an integer, for example

    int i;
    float f;
    ...
    i = (int)f;

Such a conversion takes 50-100 processor cycles. Reason: C/C++ defines a different way of rounding (truncation toward zero) than the FPU uses by default (round to nearest), so we have to switch the coprocessor rounding mode and then restore it.
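
The mode switch described above concerns x87 code. SSE offers both behaviours as separate instructions, which the following sketch exposes through intrinsics: CVTTSS2SI truncates as C requires, while CVTSS2SI rounds according to the current MXCSR mode (round to nearest even by default). The two wrapper names are hypothetical, and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>   /* SSE: _mm_set_ss, _mm_cvttss_si32, _mm_cvtss_si32 */

/* Truncation toward zero, the same result as the C cast (int)f. */
int to_int_truncate(float f)
{
    return _mm_cvttss_si32(_mm_set_ss(f));   /* CVTTSS2SI */
}

/* Rounding in the current MXCSR mode (nearest-even unless changed). */
int to_int_current_mode(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));    /* CVTSS2SI */
}
```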

Compiler conservatism

Another nominee for the Oscar is pointer aliasing. In the code below the compiler will not move the evaluation of *p + 2 in front of the loop:

    void Func1 (int a[], int *p)
    {
        int i;
        for (i = 0; i < 100; i++)
            a[i] = *p + 2;
    }

And it is right, because (hooray for C and C++ :-)

    void Func2()
    {
        int list[100];
        Func1(list, &list[8]);
    }
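
If the programmer knows that p never points into a, the hoisting can be done by hand, or the promise can be stated with the C99 restrict qualifier. Both variants below are sketches with hypothetical names; called as in Func2 (with p pointing into the array) they would give different results from Func1, and the restrict version would have undefined behaviour.

```c
/* Hoisted by hand: *p is read once, before the loop. */
void Func1_hoisted(int a[], const int *p)
{
    const int value = *p + 2;
    for (int i = 0; i < 100; i++)
        a[i] = value;
}

/* C99 restrict: the caller promises that a and p do not overlap,
   which permits the compiler to do the same hoisting itself. */
void Func1_restrict(int *restrict a, const int *restrict p)
{
    for (int i = 0; i < 100; i++)
        a[i] = *p + 2;
}
```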

Compiler conservatism

Sometimes the recipes are simple. The code below fetches arg1->p1 from memory twice:

    struct S1 { int p1; };
    struct S2 { int p2, p3; };

    void f1 (struct S1 *arg1, struct S2 *arg2)
    {
        arg2->p2 += arg1->p1;
        arg2->p3 += arg1->p1;
    }

It must work this way, because arg2->p2 and arg1->p1 may be the same memory cell. But it is enough to introduce a local variable holding arg1->p1.
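
The local-variable recipe might look as follows. The name f1_local is hypothetical; note that this version behaves differently from f1 in the (presumably unintended) case where the two structures really do overlap.

```c
struct S1 { int p1; };
struct S2 { int p2, p3; };

/* arg1->p1 is loaded once into a local, so the store to arg2->p2
   can no longer force a second load. */
void f1_local(struct S1 *arg1, struct S2 *arg2)
{
    const int p1 = arg1->p1;   /* single load */
    arg2->p2 += p1;
    arg2->p3 += p1;
}
```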

Assembler

Assembler allows us to take advantage of low-level services:
Registers and direct input/output.
Violating the compiler conventions: different parameter passing, violating the memory allocation rules, iterative calls of procedures.
Linking incompatible code fragments, e.g. built by different compilers.
Code optimization by hand to adapt it to a very particular hardware configuration.

Extreme example: an appetizer

The following C code

    float a[4], b[4], c[4];
    for (int i = 0; i < 4; i++) {
        c[i] = a[i] > b[i] ? a[i] : b[i];
    }

can be optimally coded as follows (assuming a, b and c are 16-byte aligned, as movaps requires):

    movaps xmm0,[a]    ;Load the vector a
    maxps  xmm0,[b]    ;max(a,b)
    movaps [c],xmm0    ;c = a > b ? a : b
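
The same three steps can be expressed with SSE intrinsics in C. This sketch uses unaligned loads and stores (MOVUPS) so that it works for any float arrays, which is a deliberate departure from the aligned movaps above; the function name max4 is hypothetical and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_max_ps, _mm_storeu_ps */

/* c[i] = max(a[i], b[i]) for four floats at once. */
void max4(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);            /* load vector a */
    __m128 vb = _mm_loadu_ps(b);            /* load vector b */
    _mm_storeu_ps(c, _mm_max_ps(va, vb));   /* MAXPS, then store */
}
```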

Not enough registers, or “two in one”

We have two variables, index and increment, both 16-bit (short). On ARM they can be put into one register, with index in the upper half. Then the C code

    elem = tab[index];
    index += increment;

could be written in assembler as

    LDRB Relem, [Rtab, Rindincr, LSR#16]
    ADD  Rindincr, Rindincr, Rindincr, LSL#16
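
To see why the single ADD is enough, the packed register can be modelled in C. This is only a model of the two ARM instructions, with index treated as unsigned (it wraps modulo 2^16); the names next_elem and indincr are made up for the illustration.

```c
#include <stdint.h>

/* indincr holds index in bits 31..16 and increment in bits 15..0.
   Returns tab[index] and advances index by increment. */
uint8_t next_elem(const uint8_t *tab, uint32_t *indincr)
{
    uint8_t elem = tab[*indincr >> 16];   /* LDRB Relem,[Rtab,Rindincr,LSR#16] */
    *indincr += *indincr << 16;           /* ADD Rindincr,Rindincr,Rindincr,LSL#16 */
    return elem;
}
```

The shift by 16 pushes the old index out of the word and moves increment into the upper half, so the addition changes only the index field and leaves increment untouched.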

Intel/AMD

The instruction set of CISC processors (x86) is not optimal, as confirmed by several changes of architecture philosophy. It must be preserved for backward compatibility with systems from the 1980s, when RAM and disk memory were small and costly.

But CISC also has some advantages. The compactness of the code fits well with the requirements of cache memories of limited size.

The main problem of x86 processors is the lack of registers, alleviated a little in the design of x86-64.

Graphics accelerators

Demanding graphics applications need platforms with a graphics coprocessor or accelerator card. The computational power contained in them can also be used for other tasks, but this is another story (and it depends heavily on the hardware).

64-bit code

Advantages:
More registers: usually no need to store variables and intermediate results in RAM.
Efficient procedure calls: parameters are passed in registers.
64-bit registers for integers.
Better management of large memory blocks.
Built-in restricted SIMD (SSE).
Relative addressing of data, efficient relocatable code.

64-bit code

Disadvantages:
Addresses and stack slots are twice as large: trouble with cache memory.
Access to static and global arrays requires more instructions for large memory images (mostly on Windows and Mac).
More complicated computation of the effective memory address when the size is greater than 2 GB.
Some instructions are longer.

Intrinsic functions in C++

A new approach to joining code from different levels. Intrinsic functions represent processor instructions known to the compiler.

Example: the addition of floating-point vectors, ADDPS, may be written in C++ as the function _mm_add_ps. We can also define an appropriate vector class and overload the + operator in it.

Intrinsic functions exist in the Microsoft, Intel and GNU compilers.
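
A minimal sketch of the _mm_add_ps example, written in C (the same intrinsics are available from C++). The wrapper name add4 is hypothetical, unaligned loads and stores are used so that no alignment is assumed, and an x86 target with SSE is required.

```c
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* out[i] = a[i] + b[i] for four floats, using one ADDPS. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   /* ADDPS */
}
```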

Examining compiled code

Various reasons:
Checking for evident places to rewrite by hand in assembly language (or for switching on a compiler flag, e.g. -O3 ;-)
Using the compiler as an intelligent typist, and the resulting code as a more comfortable base than starting from nothing. This code at least has correct interfaces with the environment, and these usually give us the most trouble.
And sometimes we will discover an error in the compiler.

Examining compiled code

Let us look at the loop

    for (int i = 0; i <= 15; i++) T[i] = i;

The compiler should logically replace it by

    for (int i = 15; i >= 0; i--) T[i] = i;

Reason: we save a comparison instruction (with 15), because the decrement already sets the flags that the loop condition tests.
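
Both loop forms are shown below as complete C functions so that their output can be compared, for instance by inspecting the assembly a compiler produces for each (e.g. with a -S style option). The function names are hypothetical; whether a given compiler performs the reversal on its own is not guaranteed.

```c
/* Counting up: the loop condition needs an explicit compare with 15. */
void fill_up(int T[16])
{
    for (int i = 0; i <= 15; i++)
        T[i] = i;
}

/* Counting down: the decrement itself sets the flags for i >= 0. */
void fill_down(int T[16])
{
    for (int i = 15; i >= 0; i--)
        T[i] = i;
}
```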
