Machine numbers: how floating point numbers are stored?



slide-1
SLIDE 1

Machine numbers: how floating point numbers are stored?

slide-2
SLIDE 2

Floating-point number representation

What do we need to store when representing floating point numbers in a computer?

x = ± 1.f × 2^m

Initially, different floating-point representations were used in computers, generating inconsistent program behavior across different machines. In the 1980s, computer manufacturers started adopting a standard representation for floating-point numbers: the IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.

slide-3
SLIDE 3

Floating-point number representation

Numerical form:

x = ± 1.f × 2^m,  m ∈ [L, U]

Representation in memory:

x = | sign s | exponent c | significand f |

The exponent is stored as c = m + shift, so that c is an unsigned integer even though m is signed (for example, m ∈ [−4, 4]).

slide-4
SLIDE 4

Precisions:

x = ± 1.f × 2^(c − shift), where c = m + shift

IEEE-754 Single precision (32 bits):

x = | s (1 bit) | c (8 bits) | f (23 bits) |

IEEE-754 Double precision (64 bits):

x = | s (1 bit) | c (11 bits) | f (52 bits) |

Finite representation: not all numbers can be represented exactly!
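The field layout above can be inspected from Python, whose `float` is an IEEE-754 double on common platforms. The following is a minimal sketch, not part of the original slides; the helper name `fields64` is hypothetical, and it assumes the standard `struct` module's big-endian `">d"` and `">Q"` formats.

```python
import struct
from fractions import Fraction

def fields64(x):
    """Split a Python float (assumed IEEE-754 double) into the integers (s, c, f)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                  # 1 sign bit
    c = (bits >> 52) & 0x7FF        # 11 stored-exponent bits
    f = bits & ((1 << 52) - 1)      # 52 fraction bits
    return s, c, f

for x in (1.0, -2.0, 0.1):
    s, c, f = fields64(x)
    print(f"{x:>5}: s={s} c={c:011b} f={f:052b}")

# 0.1 has no finite binary expansion, so the stored value only approximates it.
print(Fraction(0.1) == Fraction(1, 10))   # False
```

For 1.0 the stored exponent is c = 1023 and f = 0, which matches m = c − shift = 0 with shift = 1023.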

slide-5
SLIDE 5

IEEE-754 Single Precision (32-bit)

x = (−1)^s 1.f × 2^m

sign (1-bit) exponent (8-bit) significand (23-bit)

x = | s | c | f |

m = c − shift

The 8-bit stored exponent ranges from (00000000)_2 = (0)_10 to (11111111)_2 = (255)_10, so 0 ≤ c ≤ 255.

The values c = 0 and c = 255 are reserved for special cases, which leaves 1 ≤ c ≤ 254, i.e. 1 ≤ m + shift ≤ 254.

Setting shift = 127 gives −126 ≤ m ≤ 127, i.e. m ∈ [−126, 127].

slide-6
SLIDE 6

IEEE-754 Single Precision (32-bit)

x = (−1)^s 1.f × 2^m

Example: Represent the number x = −67.125 using the IEEE Single-Precision Standard

67.125 = (1000011.001)_2 = (1.000011001)_2 × 2^6

Sign: s = 0 ⇒ positive, s = 1 ⇒ negative. Here x is negative, so s = 1.

Exponent: m = 6, so c = m + 127 = (133)_10 = (10000101)_2

Fraction: f = 00001100100…0 (23 bits)

x = | 1 | 10000101 | 00001100100000000000000 |  (1 bit, 8 bits, 23 bits)
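The −67.125 example can be replayed step by step in Python. This is a sketch under stated assumptions: `encode_single` is a hypothetical helper that only handles nonzero, normalized values that fit exactly in 23 fraction bits (it does no rounding), and the cross-check relies on the standard `struct` module's `">f"` format.

```python
import math
import struct

def encode_single(x):
    """Return the single-precision fields of x as the bit strings 's c f'.

    Assumes x is nonzero, finite, in the normalized range, and exactly
    representable with a 23-bit fraction (no rounding is performed).
    """
    s = 1 if x < 0 else 0
    frac, e = math.frexp(abs(x))      # abs(x) = frac * 2**e with 0.5 <= frac < 1
    m = e - 1                         # rewrite as 1.f * 2**m
    c = m + 127                       # stored exponent, shift = 127
    f = int((frac * 2 - 1) * 2**23)   # the 23 bits after the leading 1
    return f"{s:01b} {c:08b} {f:023b}"

pattern = encode_single(-67.125)
print(pattern)                        # 1 10000101 00001100100000000000000

# Cross-check against the bytes produced by the struct module.
(bits,) = struct.unpack(">I", struct.pack(">f", -67.125))
print(f"{bits:032b}")
```

Both lines print the same 32 bits, with the spaces in the first line marking the 1-bit, 8-bit and 23-bit fields.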
slide-7
SLIDE 7
IEEE-754 Single Precision (32-bit)

x = (−1)^s 1.f × 2^m,  c = m + 127

x = | s | c | f |

  • Machine epsilon (ε_m) is defined as the distance (gap) between 1 and the next larger floating point number.

1 = (1.000…00)_2 × 2^0 and the next larger number is (1.000…01)_2 × 2^0, with 23 bits after the binary point, so

ε_m = (0.000…01)_2 × 2^0 = 2^−23 ≈ 1.2 × 10^−7

  • Smallest positive normalized FP number:

UFL = 2^L = 2^−126 ≈ 1.2 × 10^−38

  • Largest positive normalized FP number:

OFL = 2^(U+1) (1 − 2^−p) = 2^128 (1 − 2^−24) ≈ 3.4 × 10^38
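The three single-precision quantities can be checked numerically. The sketch below is an illustration, not the slides' own material: `single_from_bits` is a hypothetical helper built on the standard `struct` module, and the hexadecimal patterns are the field values s, c, f written out by hand. Python floats are doubles, which hold every single-precision value exactly, so the comparisons are exact.

```python
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

p, L, U = 24, -126, 127                  # precision and exponent range (single)

eps = 2.0 ** (1 - p)                     # gap between 1 and the next float: 2**-23
ufl = 2.0 ** L                           # smallest positive normalized number
ofl = 2.0 ** (U + 1) * (1 - 2.0 ** -p)   # largest finite number

# The same three numbers read directly from their bit patterns.
next_after_one = single_from_bits(0x3F800001)    # s=0, c=127, f=00...01
smallest_normal = single_from_bits(0x00800000)   # s=0, c=1,   f=00...00
largest_normal = single_from_bits(0x7F7FFFFF)    # s=0, c=254, f=11...11

print(eps, next_after_one - 1.0)
print(ufl, smallest_normal)
print(ofl, largest_normal)
```

Each printed pair agrees, which confirms that the formulas and the bit patterns describe the same numbers.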

slide-8
SLIDE 8

IEEE-754 Double Precision (64-bit)

x = (−1)^s 1.f × 2^m

sign (1-bit) exponent (11-bit) significand (52-bit)

x = | s | c = m + 1023 | f |

s = 0: positive sign, s = 1: negative sign

Reserved exponent numbers for special cases:
c = (00000000000)_2 = 0
c = (11111111111)_2 = 2047

Therefore 1 ≤ c ≤ 2046, i.e. 1 ≤ m + shift ≤ 2046. With shift = 1023 this gives −1022 ≤ m ≤ 1023, i.e. m ∈ [−1022, 1023].

Precision: p = 53 (52 stored fraction bits plus the implicit leading 1).
slide-9
SLIDE 9
IEEE-754 Double Precision (64-bit)

x = (−1)^s 1.f × 2^m,  c = m + 1023

x = | s | c | f |,  with m ∈ [−1022, 1023], i.e. L = −1022, U = 1023, and p = 53

  • Machine epsilon (ε_m) is defined as the distance (gap) between 1 and the next larger floating point number.

1       = | 0 | 0111…111 | 000000000000…000000000 |
1 + ε_m = | 0 | 0111…111 | 000000000000…000000001 |

ε_m = 2^−52 ≈ 2.2 × 10^−16

  • Smallest positive normalized FP number:

UFL = 2^L = 2^−1022 ≈ 2.2 × 10^−308

  • Largest positive normalized FP number:

OFL = 2^(U+1) (1 − 2^−p) = 2^1024 (1 − 2^−53) ≈ 1.8 × 10^308
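For double precision the same formulas can be compared with the constants Python exposes in `sys.float_info`. This is a small sketch assuming the interpreter's `float` is an IEEE-754 double; the variable names `p`, `L`, `U` follow the slide notation.

```python
import math
import sys

p, L, U = 53, -1022, 1023

eps = 2.0 ** (1 - p)                         # 2**-52
ufl = 2.0 ** L                               # 2**-1022
# 2.0**1024 itself overflows, so scale (1 - 2**-p) by 2**(U+1) with ldexp.
ofl = math.ldexp(1.0 - 2.0 ** -p, U + 1)

print(eps == sys.float_info.epsilon)
print(ufl == sys.float_info.min)
print(ofl == sys.float_info.max)

# Adding eps to 1 is visible, adding half of it is rounded away.
print(1.0 + eps > 1.0, 1.0 + eps / 2 == 1.0)
```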

slide-10
SLIDE 10

Normalized floating point number scale (single precision)

[Figure: number line from −∞ to +∞ marking −OFL ≈ −10^38, −UFL ≈ −10^−38, zero, UFL ≈ 10^−38 and OFL ≈ 10^38.]

slide-11
SLIDE 11

Special Values:

x = (−1)^s 1.f × 2^m,  stored as | s | c | f |

1) Zero:

x = | s | 000…000 | 0000……0000 |

2) Infinity: +∞ (s = 0) and −∞ (s = 1)

x = | s | 111…111 | 0000……0000 |

3) NaN: (results from operations with undefined results)

x = | s | 111…111 | anything ≠ 00…00 |

The stored exponents c = (000…0)_2 and c = (111…1)_2 are the reserved values; c has 8 or 11 bits and f has 23 or 52 bits (single or double precision).

4) c = (000…0)_2 with f ≠ 0: subnormal
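The special bit patterns can be assembled by hand and handed to Python to see how they behave. This is an illustrative sketch for double precision only; `double_from_fields` is a hypothetical helper, and the NaN uses one arbitrary nonzero fraction (top fraction bit set) out of the many that qualify.

```python
import math
import struct

def double_from_fields(s, c, f):
    """Assemble a double from sign s, 11-bit stored exponent c and 52-bit fraction f."""
    bits = (s << 63) | (c << 52) | f
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

pos_zero = double_from_fields(0, 0, 0)
neg_zero = double_from_fields(1, 0, 0)
pos_inf = double_from_fields(0, 0x7FF, 0)
neg_inf = double_from_fields(1, 0x7FF, 0)
nan = double_from_fields(0, 0x7FF, 1 << 51)   # any nonzero f gives a NaN

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)
print(pos_inf - pos_inf)    # an operation with an undefined result yields NaN
print(nan == nan)           # NaN compares unequal to everything, itself included
```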

slide-12
SLIDE 12

Normalized floating point number scale (single precision)

[Figure: number line from −∞ to +∞ with the gap around zero highlighted; handwritten note: relax the normalized form 1.f × 2^m to 0.f × 2^m near zero.]

slide-13
SLIDE 13

Subnormal (or denormalized) numbers

  • Noticeable gap around zero, present in any floating-point system, due to normalization
      ✓ The smallest possible significand is 1.00
      ✓ The smallest possible exponent is L
  • Relax the requirement of normalization, and allow the leading digit to be zero, only when the exponent is at its minimum (m = L)

x = (−1)^s 0.f × 2^L

A subnormal number is marked by the stored exponent c = (0000…0)_2; its exponent is m = L.
slide-14
SLIDE 14

Subnormal (or denormalized) numbers

IEEE-754 Single precision (32 bits):

c = (00000000)_2 = 0
Exponent set to m = −126
Smallest positive subnormal FP number:

0.f × 2^−126 with f = 000…01, i.e. 2^−23 × 2^−126 = 2^−149 ≈ 1.4 × 10^−45

IEEE-754 Double precision (64 bits):

c = (00000000000)_2 = 0
Exponent set to m = −1022
Smallest positive subnormal FP number:

0.f × 2^−1022 with f = 000…01, i.e. 2^−52 × 2^−1022 = 2^−1074 ≈ 4.9 × 10^−324
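The smallest subnormal numbers can be produced directly. The sketch below is illustrative: `single_from_bits` is a hypothetical helper based on the standard `struct` module, and the double-precision value is built with `math.ldexp` because Python floats are already doubles.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Single precision: c = 0 and f = 00...01 (only the last fraction bit set).
tiny_single = single_from_bits(0x00000001)
# Double precision: 2**-52 * 2**-1022, built exactly with ldexp.
tiny_double = math.ldexp(1.0, -52 - 1022)

print(tiny_single)        # about 1.4e-45
print(tiny_double)        # about 4.9e-324
print(tiny_double / 2)    # rounds to 0.0: nothing smaller is representable
```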
slide-15
SLIDE 15

Normalized floating point number scale (single precision)

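[Figure: number line from −∞ to +∞ with subnormal numbers filling the gap around zero below 2^−126 (gradual underflow); handwritten example 0.000…001010 × 2^−126 with only p = 4 significant bits, illustrating the loss of precision.]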

slide-16
SLIDE 16

Subnormal (or denormalized) numbers

  • PROS: More gradual underflow to zero
  • CONS:
      - Computations with subnormal numbers are often slow
      - Loss of precision

Another special case:

x = | s | c = 000…000 | f |

x = (−1)^s 0.f × 2^L

Note that this is a special case, and the exponent m is not evaluated as m = c − shift = −shift. Instead, the exponent is set to the lower bound, m = L.
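The loss of precision listed among the cons can be made concrete in double precision. This is a sketch that assumes default IEEE-754 behavior (round to nearest, no flush-to-zero mode); the scale factor 2^−1070 is an arbitrary choice that leaves only three significant bits above the smallest subnormal.

```python
import math

scale = math.ldexp(1.0, -1070)   # 2**-1070, an exactly representable subnormal
smallest = 5e-324                # smallest positive subnormal double, 2**-1074

x = 1.0 / 3.0                    # uses all 53 significand bits
y = x * scale                    # lands in the subnormal range and is rounded
z = y / scale                    # scaling back up is exact, but the bits are gone

print(x)                         # 0.3333333333333333
print(z)                         # 0.3125
print(y / smallest)              # 5.0: y is just 5 steps of the smallest subnormal
```

Only the leading bits 0.0101 of 1/3 survive the round trip, so z equals 5/16 instead of the original value.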

slide-17
SLIDE 17

IEEE-754 Double Precision

slide-18
SLIDE 18

Summary for Single Precision

x = (−1)^s 1.f × 2^m,  m = c − 127,  stored as | s | c | f |

Stored binary exponent (c)   Significand fraction (f)   Value
00000000                     0000…0000                  zero
00000000                     any f ≠ 0                  (−1)^s 0.f × 2^−126
00000001                     any f                      (−1)^s 1.f × 2^−126
⋮                            ⋮                          ⋮
11111110                     any f                      (−1)^s 1.f × 2^127
11111111                     any f ≠ 0                  NaN
11111111                     0000…0000                  infinity
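The summary table translates almost line by line into a decoder. The following is a minimal sketch rather than a reference implementation: `decode_single` is a hypothetical name, the result is returned as a Python float (a double, which holds every single-precision value exactly), and the `struct` call at the end is only there as an independent cross-check.

```python
import math
import struct

def decode_single(bits):
    """Decode a 32-bit pattern following the single-precision summary table."""
    s = bits >> 31                 # 1 sign bit
    c = (bits >> 23) & 0xFF        # 8 stored-exponent bits
    f = bits & 0x7FFFFF            # 23 fraction bits
    sign = -1.0 if s else 1.0
    if c == 0xFF:                  # 11111111: infinity (f = 0) or NaN (f != 0)
        return math.nan if f else sign * math.inf
    if c == 0:                     # 00000000: zero or subnormal, 0.f * 2**-126
        return sign * math.ldexp(f, -126 - 23)
    # 00000001 .. 11111110: normalized, 1.f * 2**(c - 127)
    return sign * math.ldexp((1 << 23) | f, c - 127 - 23)

for bits in (0xC2864000, 0x3F800000, 0x00000001, 0x7F7FFFFF, 0x7F800000):
    reference = struct.unpack(">f", struct.pack(">I", bits))[0]
    print(f"{bits:08X}", decode_single(bits), reference)
```

The first pattern, C2864000, is the encoding of −67.125, so the decoder and the reference both print −67.125 for it.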