SLIDE 1

Assembly Language Programming Floating-point Computations

Zbigniew Jurkiewicz, Instytut Informatyki UW November 28, 2017

Zbigniew Jurkiewicz, Instytut Informatyki UW Assembly Language Programming Floating-point Computations

SLIDE 2

Representation in IEEE 754 Standard

Fraction (mantissa) and exponent: S × 2^E
In a normalized number the fraction is a “fixed-point” number of the form 1.bbbbbbbbbb..., for example 1.0110001110111 × 2^7 (binary digits)
Two standard IEEE formats: single precision (32 bits) and double precision (64 bits)
The Intel FPU additionally has extended precision (80 bits).

SLIDE 3

Classes of numbers

signed zeroes
normalized finite numbers
denormalized finite numbers
signed infinities
NaNs (Not a Number: just that)
indefinite numbers

SLIDE 4

Zeroes

The sign of zero:
  shows the direction from which an underflow occurred
  gives the sign of the infinity by which a number was divided
  is useful for interval arithmetic
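
The first two points can be observed from C on any platform that follows IEEE 754 (an assumption here, as is every function name below). This is only a sketch: it divides by both zeroes and lets a tiny negative product underflow, then inspects the sign.

```c
#include <math.h>

/* Returns 1.0 / x; the volatile copy keeps the compiler from folding
   the division at compile time. */
double reciprocal(double x) {
    volatile double v = x;
    return 1.0 / v;
}

/* A negative product too small to represent: it underflows to -0.0,
   so the sign still records which side of zero the result came from. */
double underflow_from_below(void) {
    volatile double tiny = -1e-300;
    return tiny * 1e-300;
}

/* Returns 1 if the sign bit of x is set, 0 otherwise. */
int has_sign_bit(double x) {
    return signbit(x) != 0;
}
```

With these helpers, reciprocal(0.0) gives +∞ and reciprocal(-0.0) gives -∞, although 0.0 == -0.0 compares as true; dividing by -∞ gives back a zero with its sign bit set.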

SLIDE 5

Normalization

Lack of normalization means less precision (fewer significant binary digits).
Receiving a denormalized result signals an underflow condition (#U).
In the Intel FPU:
  floating-point underflow exception = getting a denormalized result
  floating-point denormal-operand exception = discovering that an operand of the operation is a denormalized number

SLIDE 6

Infinities

Infinities can be compared and used in arithmetic operations.
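
A small C sketch of what “compared and used in arithmetic” means in practice, assuming IEEE 754 doubles; the function names are invented for this illustration. Each function returns 1 when the stated property holds.

```c
#include <float.h>
#include <math.h>

/* Infinities are ordered beyond every finite value. */
int inf_exceeds_every_finite(void) {
    volatile double inf = INFINITY;
    return inf > DBL_MAX && -inf < -DBL_MAX;
}

/* Adding or scaling by finite positive values leaves infinity unchanged. */
int inf_absorbs_finite_terms(void) {
    volatile double inf = INFINITY;
    return inf + 1e308 == inf && inf * 2.0 == inf;
}

/* Infinity minus infinity has no meaningful value: the result is a NaN,
   which is the only kind of value that compares unequal to itself. */
int inf_minus_inf_is_nan(void) {
    volatile double inf = INFINITY;
    double d = inf - inf;
    return d != d;
}
```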

SLIDE 7

NaN

Types of NaN, by fraction m:
m = 1.0xxxxxxx: SNaN (Signaling NaN). As an operand it signals the floating-point invalid-operation exception. SNaNs have to be created programmatically (the processor does not generate them).
m = 1.1xxxxxxx: QNaN (Quiet NaN). Allowed (in principle) as an operand in arithmetic operations.
m = 1.10000000: floating-point indefinite.
Usage: a compiler can fill uninitialized elements of an array with NaNs containing the element index.
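
The “NaN containing an element index” idea can be sketched in C as follows. This is an illustration, not what any particular compiler does: it assumes IEEE double precision, uses a quiet NaN rather than a signaling one (so that it can be passed around without trapping), and the names make_indexed_nan and nan_index are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Builds a double-precision quiet NaN whose low 32 fraction bits carry
   `index`. Exponent bits are all ones and the top fraction bit is set. */
double make_indexed_nan(uint32_t index) {
    uint64_t bits = UINT64_C(0x7FF8000000000000) | (uint64_t)index;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Recovers the index stored by make_indexed_nan. */
uint32_t nan_index(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits & UINT64_C(0xFFFFFFFF));
}

/* A value is a NaN exactly when it compares unequal to itself. */
int is_nan(double d) {
    return d != d;
}
```

If a computation later produces a NaN, nan_index can point back to the array element that was never initialized, provided the payload survived the intermediate operations.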

SLIDE 8

IEEE single precision

topmost bit (31) is the sign S
next 8 bits (23..30) hold the exponent E
lowest 23 bits (0..22) hold the fraction F
a normalized fraction is always of the form 1.bbbbbbbbbb..., so we save memory by not storing the leading 1
the exponent is always shifted (“biased”) by 127

SLIDE 9

IEEE single precision

Special cases
E = 0 & F = 0: the number 0 (+0 or -0, depending on the sign)
E = 0 & F ≠ 0: denormalized number
E = 255 & F = 0: infinity (∞)
E = 255 & F ≠ 0: NaN (Not a Number), an indefinite result
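
The field layout and the special cases can be checked with a short C sketch. It assumes that float is the IEEE single-precision format; the type float_fields and the functions split_float and classify are names made up for this example.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 23..30, still biased by 127 */
    uint32_t fraction;  /* bits 0..22, hidden leading 1 not included */
} float_fields;

/* Splits an IEEE single-precision value into its three bit fields. */
float_fields split_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    float_fields r = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return r;
}

/* Names the class of a value according to the special cases for E and F. */
const char *classify(float f) {
    float_fields x = split_float(f);
    if (x.exponent == 0)
        return x.fraction == 0 ? "zero" : "denormalized";
    if (x.exponent == 255)
        return x.fraction == 0 ? "infinity" : "NaN";
    return "normalized";
}
```

For the earlier example 1.0110001110111 × 2^7 (written 0x1.63B8p7f as a C hexadecimal literal, equal to 177.859375), split_float returns sign 0, exponent 7 + 127 = 134 and fraction 0x31DC00.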

SLIDE 10

IEEE double precision

the topmost bit (63) contains the sign
next 11 bits (52..62) for the exponent
lowest 52 bits (0..51) for the fraction

SLIDE 11

Intel technology

Initially floating-point computations were performed on a separate coprocessor called the FPU (Floating Point Unit). In newer models it has been built into the main processor.
MMX technology enabled parallel computations on packed integer numbers (small vector processing). First in Pentium MMX and Pentium II.
SSE technology was introduced to permit similar computations on packed single-precision floating-point numbers. It uses separate 128-bit registers. First in Pentium III.
SSE2 technology extends it with packed double-precision floating-point numbers and packed integer numbers of different sizes. Some operations have also been added. From Pentium 4.
SSE3 only adds further operations. From Pentium 4 HT and Xeon.

SLIDE 12

Intel FPU

Once it was really a separate chip (the so-called “mathematical coprocessor”); now it is embedded, but the architectural separation has been preserved in the form of a separate computational environment.

For example, FPU instructions cannot touch normal registers (EAX etc.), because these registers are in a “different processor”.

A separate set of registers st0, st1, ..., st7, which form a stack (with st0 on top). Operations (nearly) always use the top of the stack.
A separate state register (“flags”), invisible to the normal processor, and a control register.

SLIDE 13

State flags

Bits and masks are used in two contexts:

for FPU instructions
  x87 FPU status word: flags in bits 0..5
  x87 FPU control word: masks in bits 0..5

for SIMD operations in SSE/SSE2/SSE3 instructions
  MXCSR register: flags in bits 0..5, masks in bits 7..12

State flag bits are “sticky”: once set, they stay set until cleared by hand. So we may mask all exceptions and look for exceptional situations after performing the whole computation sequence.

SLIDE 14

Exceptions

In Intel processors there are 6 classes of exceptions, each with a flag bit and a mask.

Pre-computation exceptions:
  Invalid operation (#I), flag IE, mask IM
    stack overflow or empty stack (#IS)
    illegal arithmetic operation (#IA), e.g. dividing ∞ by ∞ or zero by zero
  Divide-by-zero (#Z), flag ZE, mask ZM, and similarly for the other classes
  Denormalized operand (#D), not in the IEEE standard

Post-computation exceptions:
  Numeric overflow (#O)
  Numeric underflow (#U)
  Inexact result (precision) (#P), very common, e.g. 1/3

Setting a mask results in default handling of the exception; otherwise the exception is raised.

SLIDE 15

Instructions: load and store

FLD place: pushes the contents of place onto the top of the stack. Place may also be an FPU register.
FILD place: fetches an integer from memory, converts it to floating-point format and pushes it onto the stack.
FLD1: pushes the number 1 onto the stack.
FLDZ: pushes zero onto the stack.
FXCH stn: swaps the contents of the given register and the top of the stack.

SLIDE 16

Instructions: load and store

FST place: stores the contents of the top of the stack into the given place (which may also be another FPU register).
FSTP place: the same, but with popping the stack.
FIST place: converts the number on the top of the stack into a 2- or 4-byte integer and stores it in the given place.
FISTP place: the same, but with popping the stack; it can also store an 8-byte integer.

SLIDE 17

Instructions: arithmetics

FADD place: adds the contents of place to the top of the stack (st0).
FADD register,st0: adds the contents of the top of the stack (st0) to the register.
FADDP register,st0: the same, but with popping the stack.
FIADD place: adds the integer from place to the top of the stack (st0).

SLIDE 18

Example [Carter]

Summing the array:

SIZE    equ 10

section .bss
array   resq SIZE
sum     resq 1

section .text
        mov  ecx, SIZE
        mov  esi, array
        fldz                    ; initialize st0 to 0.0
lp:     fadd qword [esi]        ; add the next element
        add  esi, 8             ; step to the next double
        loop lp
        fstp qword [sum]        ; store the result and pop it

SLIDE 19

Instructions: arithmetics

For subtraction we have twice as many instructions, because it is not a commutative operation. Examples:
FSUB place: subtracts the contents of place from the stack top (st0 = st0 - place).
FSUBR st0,register: subtracts the stack top (st0) from the register; the result is in st0 (st0 = register - st0).
FSUBR TO register: reverse subtraction with the result in the register (register = st0 - register).

SLIDE 20

Instructions: arithmetics and comparisons

Multiplication and division are analogous to addition and subtraction.
FCOM place: compares the stack top (st0) with place.
FCOMP place: the same, with popping the stack.
FCOMPP: compares st0 with st1 and pops both from the stack.
FTST: compares st0 with zero.

SLIDE 21

Comparison instructions

The main processor’s conditional instructions do not look at the FPU state register, so we must first copy its flags to EFLAGS.
FSTSW place: saves the state register in the given place, usually the AX register.
Then we can use the SAHF instruction to move the flags to EFLAGS.
For conditional jumps we should use JA, JB and JZ (in other words, we treat floating-point numbers like unsigned integers).

SLIDE 22

Comparison instructions: example [Carter]

;;; if (x > y)
        fld   qword [x]
        fcomp qword [y]         ; compare x with y and pop
        fstsw ax                ; copy the state register to AX
        sahf                    ; move the flags to EFLAGS
        jna   else_part
then_part:
        ; code for then part
        jmp   end_if
else_part:
        ; code for else part
end_if:

SLIDE 23

Comparison instructions

Starting with the Pentium Pro there are two new instructions that modify EFLAGS directly, but they operate only on FPU registers.
FCOMI register: compares st0 with the register.
FCOMIP register: the same, with popping the stack.

SLIDE 24

Comparison instructions: example

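global dmax
%define a1 ebp+8
%define a2 ebp+16

section .text
dmax:
        push  ebp
        mov   ebp, esp
        fld   qword [a1]
        fld   qword [a2]        ; st0 = a2, st1 = a1
        fcomi st1               ; compare st0 (a2) with st1 (a1)
        ja    a2_greater        ; a2 > a1: the maximum is already in st0
        fxch  st1               ; otherwise bring a1 to the top
a2_greater:
        fstp  st1               ; drop the smaller value, keep the maximum in st0
        pop   ebp
        ret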

SLIDE 25

“Mathematical” instructions

FSQRT: replaces the top of the stack (st0) with its square root.
FSIN, FCOS: the same for the functions sin and cos. The argument is in radians!
FPTAN: computes tan of the stack top (st0), replaces st0 with it, and then pushes 1.0 onto the stack (a souvenir of the times when there was only FPTAN, and neither FSIN nor FCOS).
FLDPI, FLDL2E, FLDLN2: push π, log2 e and ln 2 onto the stack. “Strange” logarithms.

SLIDE 26

Synchronization

The main (“integer”) processor and the FPU are separate computing environments. They can work in parallel: while a floating-point operation executes, “normal” instructions are performed concurrently if possible.
Problem with handling floating-point exceptions: the memory cells that caused them could already have been overwritten by normal instructions.
This makes it really hard for handlers to analyze the situation and repair it.

SLIDE 27

Synchronization: FWAIT

The FWAIT instruction was introduced to be placed in the code directly after an instruction that could cause an exception. FWAIT temporarily blocks the execution of further instructions.
Often we can get the same effect simply by reordering instructions; for example the sequence

        fild dword [count]      ; assuming count is a 32-bit integer
        inc  dword [count]
        fsqrt

is changed to

        fild dword [count]
        fsqrt
        inc  dword [count]

SLIDE 28

MMX

The MMX extension of Pentium processors enables the concurrent execution of the same operation on many combinations of arguments. It is like a miniature vector processor.
Another interesting property of MMX is saturation arithmetic: after overflowing the range the result is the largest representable value (there is no “wrap”). This method is sometimes useful for audiovisual data.
Why do we talk about it here? Work in MMX mode excludes floating-point operations and vice versa, because both use the same registers (they only call them differently). It is necessary to switch between modes explicitly. So MMX is a parasite.
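
The difference between wrapping and saturation can be modelled on a single unsigned byte in plain C. This is a scalar sketch with made-up function names, not MMX code; the MMX instruction PADDUSB applies the saturating variant to eight packed bytes at once.

```c
#include <stdint.h>

/* Ordinary unsigned 8-bit addition: the result wraps modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturating unsigned 8-bit addition: the result is clamped at 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

For pixel brightness values 200 and 100, add_wrap_u8 yields 44 (a bright pixel turns dark), while add_sat_u8 yields 255 (the pixel stays at full brightness), which is why saturation suits audiovisual data.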
