
SLIDE 1

Machine numbers: how are floating-point numbers stored?

SLIDE 2

Floating-point number representation

What do we need to store when representing floating point numbers in a computer?

$x = \pm 1.f \times 2^{m}$

Initially, different computers used different floating-point representations, which led to inconsistent program behavior across machines. In the 1980s, computer manufacturers started adopting a standard representation for floating-point numbers: the IEEE 754 Standard (IEEE: Institute of Electrical and Electronics Engineers).

SLIDE 3

Floating-point number representation

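Numerical form:

$x = \pm 1.f \times 2^{m}$

Representation in memory:

x = [ s: sign | c: exponent | f: significand ]

The exponent is an integer in a finite range, $m \in [L, U]$ (for example, $m \in [-4, 4]$). The signed exponent $m$ is not stored directly; the exponent field holds the unsigned integer $c = m + \text{shift}$.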

SLIDE 4

Precisions:

| Format | Sign (s) | Exponent (c) | Significand fraction (f) |
|---|---|---|---|
| IEEE-754 single precision (32 bits) | 1 bit | 8 bits | 23 bits |
| IEEE-754 double precision (64 bits) | 1 bit | 11 bits | 52 bits |

$x = \pm 1.f \times 2^{c - \text{shift}}$, with $c = m + \text{shift}$

Finite representation: not all numbers can be represented exactly!
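To make "not all numbers can be represented exactly" concrete, here is a minimal Python sketch. It assumes Python floats are IEEE-754 doubles (true on mainstream platforms) and uses the standard `struct` module to round a double to single precision; the helper name `to_float32` is hypothetical, introduced only for illustration. `Decimal(float)` prints the exact binary value that is actually stored.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# 0.1 = (0.000110011001100...)_2 never terminates, so it must be rounded.
exact_double = Decimal(0.1)              # exact value stored for 0.1 as a double
exact_single = Decimal(to_float32(0.1))  # exact value stored for 0.1 as a float

print(exact_double)
print(exact_single)

# 0.5 = 1.0 x 2^-1 has a finite binary expansion and is stored exactly.
print(to_float32(0.5) == 0.5)            # True
```

The single-precision value of 0.1 is off by about $1.5 \times 10^{-9}$, the double-precision one by far less, but neither is exactly 0.1.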

SLIDE 5

IEEE-754 Single Precision (32-bit)

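$x = (-1)^{s}\, 1.f \times 2^{m}$

x = [ s: sign (1 bit) | c: exponent (8 bits) | f: significand (23 bits) ]

$m = c - \text{shift}$

The 8-bit exponent field stores an unsigned integer, from $(00000000)_2 = (0)_{10}$ to $(11111111)_2 = (255)_{10}$, so $0 \le c \le 255$.
The values $c = 0$ and $c = 255$ are reserved for special cases.
Therefore $1 \le c \le 254$, i.e., $1 \le m + \text{shift} \le 254$.
Setting $\text{shift} = 127$ gives $-126 \le m \le 127$, so $m \in [-126, 127]$.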
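The field layout above can be read back from any value with bit masks. The sketch below is a minimal illustration, assuming Python 3.9+ and using the standard `struct` module to obtain the 32-bit pattern; the function name `float32_fields` and the constant `SHIFT` are hypothetical names chosen to mirror the slide's symbols $s$, $c$, $f$ and shift.

```python
import struct

SHIFT = 127  # single-precision exponent shift

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return the (s, c, f) fields of x encoded as IEEE-754 single precision."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))  # raw 32-bit pattern
    s = bits >> 31              # 1-bit sign
    c = (bits >> 23) & 0xFF     # 8-bit stored exponent, 0 <= c <= 255
    f = bits & 0x7FFFFF         # 23-bit significand fraction
    return s, c, f

for x in (1.0, 2.0, 0.75):
    s, c, f = float32_fields(x)
    print(x, s, c, c - SHIFT, f)   # last two columns: m = c - shift, then f
```

For example, $0.75 = (1.1)_2 \times 2^{-1}$, so the sketch reports $c = 126$ ($m = -1$) and a fraction whose only set bit is the leading one.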

SLIDE 6

IEEE-754 Single Precision (32-bit)

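Example: Represent the number $x = -67.125$ using the IEEE single-precision standard.

$x = (-1)^{s}\, 1.f \times 2^{m}$

Sign: $s = 0$ means positive and $s = 1$ means negative; here $x < 0$, so $s = 1$.
Normalize: $67.125 = (1000011.001)_2 = (1.000011001)_2 \times 2^{6}$, so $m = 6$.
Exponent: $c = m + 127 = (133)_{10} = (10000101)_2$.
Significand: $f = 00001100100000000000000$ (23 bits).

Result: 1 | 10000101 | 00001100100000000000000 (1 bit | 8 bits | 23 bits)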
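The same steps can be followed programmatically. This is a sketch only: `encode_float32_by_hand` is a hypothetical helper that assumes $x$ is a nonzero, normal single-precision value that needs no rounding, and it uses `math.frexp` (which returns a mantissa in $[0.5, 1)$) to find the exponent. The result is cross-checked against the machine's own encoding obtained with `struct`.

```python
import math
import struct

def encode_float32_by_hand(x: float) -> str:
    """Follow the slide's steps; assumes x is nonzero, normal, and needs no rounding."""
    s = 1 if x < 0 else 0                  # sign bit
    mant, e = math.frexp(abs(x))           # abs(x) = mant * 2**e with 0.5 <= mant < 1
    m = e - 1                              # rewrite as 1.f * 2**m
    c = m + 127                            # shifted exponent
    f = int((2 * mant - 1) * 2**23)        # 23 fraction bits after the leading 1
    return f"{s} {c:08b} {f:023b}"

def encode_float32_struct(x: float) -> str:
    """Same fields, read from the machine's own single-precision encoding."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    b = f"{bits:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(encode_float32_by_hand(-67.125))   # 1 10000101 00001100100000000000000
print(encode_float32_struct(-67.125))    # 1 10000101 00001100100000000000000
```

Read as hexadecimal, the 32 bits are `C2864000`.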

SLIDE 7

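IEEE-754 Single Precision (32-bit)

$x = (-1)^{s}\, 1.f \times 2^{m}$, $c = m + 127$

Machine epsilon ($\epsilon_m$) is defined as the distance (gap) between 1 and the next larger floating-point number. Since $1 = 1.000\ldots000 \times 2^{0}$ and the next number is $1.000\ldots001 \times 2^{0}$ (23 fraction bits):

$\epsilon_m = 2^{-23} \approx 1.2 \times 10^{-7}$

Smallest positive normalized FP number:

$\text{UFL} = 2^{L} = 2^{-126} \approx 1.2 \times 10^{-38}$

Largest positive normalized FP number (largest significand $1.11\ldots1 = 2 - 2^{-23}$ with $m = U = 127$):

$\text{OFL} = 2^{U+1}(1 - 2^{-p}) = 2^{128}(1 - 2^{-24}) \approx 3.4 \times 10^{38}$

where $p = 24$ is the precision (23 stored fraction bits plus the implicit leading 1).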
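These three constants can be checked by building the corresponding bit patterns directly. A minimal sketch, assuming Python floats are IEEE-754 doubles so the single-precision values convert to them exactly; `float32_from_bits` is a hypothetical helper built on the standard `struct` module.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

one      = float32_from_bits(0x3F800000)  # 0 01111111 000...000 -> 1.0
next_one = float32_from_bits(0x3F800001)  # 0 01111111 000...001
eps_m    = next_one - one                 # machine epsilon
ufl      = float32_from_bits(0x00800000)  # 0 00000001 000...000 -> smallest normalized
ofl      = float32_from_bits(0x7F7FFFFF)  # 0 11111110 111...111 -> largest finite

print(eps_m, ufl, ofl)
```

Note that the largest finite value uses $c = 254$, not $c = 255$, because 255 is reserved.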

SLIDE 8

IEEE-754 Double Precision (64-bit)

$x = (-1)^{s}\, 1.f \times 2^{m}$

x = [ s: sign (1 bit) | c: exponent (11 bits) | f: significand (52 bits) ]

$c = m + 1023$

$s = 0$: positive sign; $s = 1$: negative sign.
Reserved exponent values for special cases: $c = (00000000000)_2 = 0$ and $c = (11111111111)_2 = 2047$. Therefore $1 \le c \le 2046$.
With $\text{shift} = 1023$: $1 \le m + 1023 \le 2046$, so $-1022 \le m \le 1023$, i.e., $m \in [-1022, 1023]$.
Precision: $p = 53$ (52 stored fraction bits plus the implicit leading 1).
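Python's built-in `float` is (on mainstream platforms) exactly this 64-bit format, so its fields can be inspected directly. The sketch below assumes Python 3.9+; `float64_fields` and `SHIFT` are hypothetical names mirroring the slide's $s$, $c$, $f$ and shift.

```python
import struct
import sys

SHIFT = 1023  # double-precision exponent shift

def float64_fields(x: float) -> tuple[int, int, int]:
    """Return the (s, c, f) fields of x encoded as IEEE-754 double precision."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))  # raw 64-bit pattern
    s = bits >> 63                  # 1-bit sign
    c = (bits >> 52) & 0x7FF        # 11-bit stored exponent, 0 <= c <= 2047
    f = bits & ((1 << 52) - 1)      # 52-bit significand fraction
    return s, c, f

s, c, f = float64_fields(-67.125)
print(s, c, c - SHIFT)              # 1 1029 6

# The largest finite double uses the largest non-reserved exponent, c = 2046.
print(float64_fields(sys.float_info.max))
```

The same number $-67.125$ keeps $m = 6$ in both formats; only the shift (127 vs 1023) and the field widths change.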

SLIDE 9

IEEE-754 Double Precision (64-bit)

$x = (-1)^{s}\, 1.f \times 2^{m}$, $c = m + 1023$

Machine epsilon ($\epsilon_m$) is defined as the distance (gap) between 1 and the next larger floating-point number:

1 = 0 | 01111111111 | 000000000000…000000000 ($c = 1023$, $m = 0$)
next larger = 0 | 01111111111 | 000000000000…000000001

$\epsilon_m = 2^{-52} \approx 2.2 \times 10^{-16}$

Smallest positive normalized FP number ($L = -1022$):

$\text{UFL} = 2^{L} = 2^{-1022} \approx 2.2 \times 10^{-308}$

Largest positive normalized FP number ($U = 1023$, $p = 53$):

$\text{OFL} = 2^{U+1}(1 - 2^{-p}) = 2^{1024}(1 - 2^{-53}) \approx 1.8 \times 10^{308}$
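Python exposes these double-precision constants directly, which gives a quick sanity check of the formulas. A minimal sketch, assuming Python 3.9+ for `math.nextafter`; `sys.float_info.min` is documented as the smallest positive *normalized* float, matching UFL here.

```python
import math
import sys

eps_m = math.nextafter(1.0, 2.0) - 1.0   # gap from 1 to the next double
ufl = sys.float_info.min                 # smallest positive normalized double
ofl = sys.float_info.max                 # largest finite double

# 2**1024 (1 - 2**-53) rewritten as 2**1023 (2 - 2**-52): 2.0**1024 itself overflows.
ofl_formula = 2.0**1023 * (2 - 2.0**-52)

print(eps_m, sys.float_info.epsilon)
print(ufl, ofl, ofl == ofl_formula)
```

The rewrite in `ofl_formula` is the same quantity as the slide's formula; it only avoids an intermediate value that is itself larger than OFL.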

SLIDE 10

Normalized floating point number scale (single precision)

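[Figure: number line of normalized single-precision numbers, marking $-\infty$, $-\text{OFL} \approx -10^{38}$, $-\text{UFL} \approx -10^{-38}$, zero, $\text{UFL} \approx 10^{-38}$, $\text{OFL} \approx 10^{38}$, $+\infty$]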

SLIDE 11

Special Values:

$x = (-1)^{s}\, 1.f \times 2^{m}$, stored as x = [ s | c | f ] (c: 8 or 11 bits; f: 23 or 52 bits)

1) Zero: x = [ s | 000…000 | 0000…0000 ]

2) Infinity: $+\infty$ ($s = 0$) and $-\infty$ ($s = 1$): x = [ s | 111…111 | 0000…0000 ]

3) NaN (results from operations with undefined results, e.g., $0/0$ or $\infty - \infty$): x = [ s | 111…111 | anything ≠ 00…00 ]

Note: $c = 000\ldots000$ with $f \ne 0$ is not zero; it encodes a subnormal number.
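The special patterns can be observed on real values. This sketch assumes Python floats are IEEE-754 doubles; `float64_fields` is a hypothetical helper using the standard `struct` module. The sign bit of a NaN produced by `inf - inf` is platform dependent, so only its exponent and fraction fields are meaningful here.

```python
import math
import struct

def float64_fields(x: float) -> tuple[int, int, int]:
    """(s, c, f) fields of a double; c has 11 bits, f has 52 bits."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

specials = {
    "+0": 0.0,
    "-0": -0.0,
    "+inf": math.inf,
    "-inf": -math.inf,
    "inf - inf": math.inf - math.inf,   # undefined result -> NaN
}
for name, x in specials.items():
    s, c, f = float64_fields(x)
    print(f"{name:>9}: s={s} c={c} f={'0' if f == 0 else 'nonzero'}")

print(0.0 == -0.0)   # True: the two zeros differ only in the sign bit
```
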

SLIDE 12

Normalized floating point number scale (single precision)

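[Figure: close-up of the number line near zero, showing the gap between 0 and UFL left by normalized numbers $1.f \times 2^{m}$, and how relaxing normalization to $0.f \times 2^{m}$ fills it]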

SLIDE 13

Subnormal (or denormalized) numbers

There is a noticeable gap around zero, present in any floating-point system, due to normalization:
✓ The smallest possible significand is $1.00\ldots0$
✓ The smallest possible exponent is $L$

Relax the requirement of normalization, and allow the leading digit to be zero, only when the exponent is at its minimum ($m = L$):

$x = (-1)^{s}\, 0.f \times 2^{L}$

Normally $m = c - \text{shift}$; the stored pattern $c = 000\ldots0$ instead marks a subnormal number, whose exponent is taken to be $m = L$.
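To see how large that gap is, compare the distance from 0 to UFL with the spacing between UFL and its next neighbour. This is a sketch for double precision, assuming Python 3.9+ (`math.nextafter`); in a normalized-only system the positive number closest to zero would be UFL itself.

```python
import math
import sys

ufl = sys.float_info.min                       # 2**-1022, smallest normalized double
above = math.nextafter(ufl, 1.0) - ufl         # spacing just above UFL

print(above == 2.0**-1074)                     # True
print(ufl / above)                             # 4503599627370496.0, i.e., 2**52
```

Without subnormals, the empty interval $(0, \text{UFL})$ would be $2^{52}$ times wider than the spacing just above it.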

SLIDE 14

Subnormal (or denormalized) numbers

IEEE-754 single precision (32 bits):
$c = 00000000$, $f \ne 0$; exponent set to $m = -126$, so $x = (-1)^{s}\, 0.f \times 2^{-126}$.
Smallest positive subnormal FP number: $0.000\ldots01 \times 2^{-126} = 2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$

IEEE-754 double precision (64 bits):
$c = 00000000000$, $f \ne 0$; exponent set to $m = -1022$, so $x = (-1)^{s}\, 0.f \times 2^{-1022}$.
Smallest positive subnormal FP number: $0.000\ldots01 \times 2^{-1022} = 2^{-52} \times 2^{-1022} = 2^{-1074} \approx 4.9 \times 10^{-324}$
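Both smallest subnormals can be produced and checked. A minimal sketch, assuming Python 3.9+ for `math.ulp` (documented to return the smallest positive denormalized float when given 0.0); `float32_from_bits` is a hypothetical helper built on `struct`.

```python
import math
import struct

def float32_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

tiny32 = float32_from_bits(0x00000001)   # 0 00000000 000...001 -> 0.000...01 x 2**-126
tiny64 = math.ulp(0.0)                   # smallest positive double

print(tiny32, tiny64)                    # approximately 1.4e-45 and 4.9e-324
```

Halving `tiny64` rounds to 0.0: nothing smaller than $2^{-1074}$ exists in double precision.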

SLIDE 15

Normalized floating point number scale (single precision)

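[Figure: subnormal numbers filling the gap between 0 and $\text{UFL} = 2^{-126}$ (gradual underflow); a subnormal such as $0.000\ldots001010 \times 2^{-126}$ keeps only a few significant bits (here $p = 4$), so precision is lost]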

SLIDE 16

Subnormal (or denormalized) numbers

PROS: More gradual underflow to zero.
CONS: Computations with subnormal numbers are often slow; loss of precision.

Another special case:

x = [ s | c = 000…000 | f = 0 ] $= (-1)^{s}\, 0.0 \times 2^{L}$

Note that this is a special case, and the exponent $m$ is not evaluated as $m = c - \text{shift} = -\text{shift}$. Instead, the exponent is set to the lower bound, $m = L$.
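Both the benefit and the cost can be seen in a few lines of double-precision arithmetic. This is a sketch under the assumption that Python floats are IEEE-754 doubles with default round-to-nearest; the values are chosen only for illustration.

```python
import sys

ufl = sys.float_info.min              # 2**-1022

# Gradual underflow: halving UFL gives a subnormal instead of jumping to 0.
half = ufl / 2
print(half == 2.0**-1023, half > 0)   # True True

# Loss of precision: a normal number with a 1 in its last fraction bit...
x = (1 + 2.0**-52) * ufl              # 1.000...001 x 2**-1022, exactly representable
y = x / 2.0**52                       # ...pushed deep into the subnormal range
print(y == 2.0**-1074)                # True: only one significant bit survives
print(y * 2.0**52 == x)               # False: scaling back cannot recover x
```
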

SLIDE 17

IEEE-754 Double Precision

SLIDE 18

Summary for Single Precision

$x = (-1)^{s}\, 1.f \times 2^{m}$, $m = c - 127$

| Stored binary exponent (c) | Significand fraction (f) | Value |
|---|---|---|
| 00000000 | 0000…0000 | zero |
| 00000000 | any $f \ne 0$ | $(-1)^{s}\, 0.f \times 2^{-126}$ |
| 00000001 | any $f$ | $(-1)^{s}\, 1.f \times 2^{-126}$ |
| ⋮ | ⋮ | ⋮ |
| 11111110 | any $f$ | $(-1)^{s}\, 1.f \times 2^{127}$ |
| 11111111 | any $f \ne 0$ | NaN |
| 11111111 | 0000…0000 | infinity |
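The summary table translates almost line by line into a decoder. The sketch below is an illustration, not a reference implementation: `decode_float32` is a hypothetical function that evaluates a 32-bit pattern using only the table's rules, and it can be cross-checked against the platform's own decoding via `struct` (assuming Python floats are IEEE-754 doubles, so every single-precision value converts exactly).

```python
import math
import struct

SHIFT = 127  # single-precision exponent shift

def decode_float32(bits: int) -> float:
    """Evaluate a 32-bit pattern using the single-precision summary table."""
    s = bits >> 31
    c = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if c == 0xFF:                                   # c = 11111111: NaN or infinity
        return math.nan if f else sign * math.inf
    if c == 0:                                      # c = 00000000: zero or subnormal
        return sign * (f / 2**23) * 2.0**(1 - SHIFT)            # 0.f x 2**-126
    return sign * (1 + f / 2**23) * 2.0**(c - SHIFT)            # 1.f x 2**(c - 127)

print(decode_float32(0xC2864000))   # -67.125
print(decode_float32(0x00000001))   # smallest positive subnormal, 2**-149
```

Note that the subnormal branch uses the fixed exponent $-126 = 1 - 127$ rather than $c - 127 = -127$, exactly as the table prescribes.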