Real Number Representation

  1. Real Number Representation

  2. Topics
     • Terminology
     • IEEE standard for floating-point representation
     • Floating-point arithmetic
     • Limitations

  3. Terminology
     • All digits in a number following any leading zeros are significant digits. Each of these numbers has five: 12.345, -0.12345, 0.00012345

  4. Terminology (cont)
     • The scientific notation for real numbers is: $\text{mantissa} \times \text{base}^{\text{exponent}}$
     • In C, the expression 12.456e-2 means $12.456 \times 10^{-2}$

  5. Terminology (cont)
     • The mantissa is always normalized between 1 and the base, 1 ≤ mantissa < base (i.e., exactly one significant digit before the point)

       Unnormalized                          Normalized
       $2997.9 \times 10^{5}$                $2.9979 \times 10^{8}$
       $\text{B1.39FC} \times 16^{11}$       $\text{B.139FC} \times 16^{12}$
       $0.010110110101 \times 2^{-1}$        $1.0110110101 \times 2^{-3}$
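
The base-2 row of the table can be checked mechanically. The following is a minimal sketch, assuming Python's `math.frexp` as the starting point; `normalize_base2` is a hypothetical helper name, not something from the slides.

```python
import math

def normalize_base2(x):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent and 1 <= |mantissa| < 2."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # frexp returns 0.5 <= |m| < 1 with x == m * 2**e
    return m * 2, e - 1    # shift by one bit so that 1 <= |m| < 2

# The slide's unnormalized value: 0.010110110101 (base 2) times 2**-1
unnormalized = int("010110110101", 2) / 2**12 * 2**-1
mantissa, exponent = normalize_base2(unnormalized)
print(mantissa, exponent)  # mantissa is 1.0110110101 in base 2, exponent is -3
```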

  6. Terminology (cont)
     • The precision of a number is how many digits (or bits) we use to represent it
     • For example, π at increasing precision: 3, 3.14, 3.1415926, 3.1415926535897932384626433832795028

  7. Representing Numbers
     • A real number n is represented by a floating-point approximation n*
     • The computer uses 32 bits (or more) to store each approximation
     • It needs to store
       – the mantissa
       – the sign of the mantissa
       – the exponent (with its sign)

  8. Representing Numbers (cont)
     • The standard way to allocate 32 bits (specified by IEEE Standard 754) is:
       – 1 bit for the mantissa's sign (bit 31)
       – 8 bits for the exponent (bits 30 to 23)
       – 23 bits for the mantissa (bits 22 to 0)
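
To see the three fields of a real value, one option is to reinterpret the 32 bits as an unsigned integer and mask them out. This is a sketch using the standard `struct` module; the helper name `float32_fields` is an assumption made for illustration.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into (sign, exponent field, mantissa field)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                     # bit 31
    exponent_field = (bits >> 23) & 0xFF  # bits 30 to 23
    mantissa_field = bits & 0x7FFFFF      # bits 22 to 0
    return sign, exponent_field, mantissa_field

sign, exponent_field, mantissa_field = float32_fields(-0.15625)
print(sign, format(exponent_field, "08b"), format(mantissa_field, "023b"))
```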

  12. Representing the Mantissa
      • The mantissa has to be in the range 1 ≤ mantissa < base
      • Therefore
        – If we use base 2, the digit before the point must be a 1
        – So we don't have to store it
      • We get 24 bits of precision using 23 bits

  13. Representing the Mantissa (cont)
      • 24 bits of precision are equivalent to a little over 7 decimal digits: $\dfrac{24}{\log_2 10} \approx 7.2$

  14. Representing the Mantissa (cont)
      • Suppose we want to represent π: 3.1415926535897932384626433832795...
      • With about 7 significant decimal digits, we can only represent it as:
        – 3.141592 (if we truncate)
        – 3.141593 (if we round)
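
The digit count and the loss of precision for π can be reproduced as follows. This is only a sketch: Python floats are double precision, so the hypothetical helper `to_float32` rounds through `struct` to imitate a 32-bit value.

```python
import math
import struct

digits = 24 / math.log2(10)  # decimal digits carried by 24 bits

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

pi32 = to_float32(math.pi)
print(round(digits, 2))      # a little over 7
print(f"{pi32:.6f}")         # 3.141593, the rounded form on the slide
print(abs(pi32 - math.pi))   # the error appears after roughly 7 significant digits
```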

  15. Representing the Exponent
      • The exponent is represented as excess-127, i.e., the stored value is the actual exponent plus 127. E.g.,

        Actual Exponent    Stored Value
        -127               00000000
        -126               00000001
        ...                ...
        0                  01111111
        +1                 10000000
        ...                ...
        i                  $(i + 127)_2$
        ...                ...
        +128               11111111
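
The excess-127 mapping is a single addition or subtraction. A minimal sketch, with `encode_exponent` and `decode_exponent` as hypothetical names chosen for this example:

```python
BIAS = 127  # excess-127, as used for the 8-bit exponent field

def encode_exponent(actual):
    """Return the 8-bit stored field for an actual exponent in [-127, 128]."""
    stored = actual + BIAS
    if not 0 <= stored <= 255:
        raise ValueError("exponent does not fit in 8 bits")
    return format(stored, "08b")

def decode_exponent(field):
    """Return the actual exponent for an 8-bit stored field given as a bit string."""
    return int(field, 2) - BIAS

for actual in (-127, -126, 0, 1, 128):
    print(actual, encode_exponent(actual))
```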

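  16. Representing the Exponent (cont)
      • The IEEE standard restricts exponents to the range: –126 ≤ exponent ≤ +127
      • The exponents –127 and +128 have special meanings:
        – If exponent = –127 (stored field 00000000), the represented value is 0 (or a denormalized value if the mantissa field is nonzero)
        – If exponent = +128 (stored field 11111111), the represented value is ∞ (or NaN if the mantissa field is nonzero)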

  17. Representing Numbers -- Example 1
      What is 01011011 (8-bit machine)?

        0      101    1011
        sign   exp    mantissa

      • Mantissa: 1.1011
      • Exponent (excess-3 format): 5 - 3 = 2
      • Value: $1.1011_2 \times 2^{2} = 110.11_2 = 2^{2} + 2^{1} + 2^{-1} + 2^{-2} = 4 + 2 + 0.5 + 0.25 = 6.75$
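
The decoding steps of this example can be sketched in a few lines. The function name `decode_minifloat` is hypothetical, and the sketch assumes the slide's layout (1 sign bit, 3 exponent bits in excess-3, 4 mantissa bits with a hidden leading 1) without any special-case exponents.

```python
def decode_minifloat(bits):
    """Decode an 8-bit string: 1 sign bit, 3 exponent bits (excess-3), 4 mantissa bits."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 3        # remove the excess-3 bias
    mantissa = 1 + int(bits[4:], 2) / 2**4  # restore the hidden leading 1
    return sign * mantissa * 2**exponent

print(decode_minifloat("01011011"))  # 6.75
```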

  18. Representing Numbers -- Example 2
      Represent -10.375 (32-bit machine)
      • $10.375_{10} = 10 + 0.25 + 0.125 = 2^{3} + 2^{1} + 2^{-2} + 2^{-3} = 1010.011_2 = 1.010011_2 \times 2^{3}$
      • Sign: 1
      • Mantissa: 010011
      • Exponent (excess-127 format): $3 + 127 = 130_{10} = 10000010_2$
      • Result: 1 10000010 01001100000000000000000
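
The same steps (sign, normalize, bias the exponent, drop the hidden 1) can be written out directly. This is a sketch under stated assumptions: `encode_float32` is a hypothetical helper that handles only nonzero, normalized values that fit exactly in 23 fraction bits, so it performs no rounding.

```python
import math

def encode_float32(x):
    """Encode a nonzero, exactly representable normalized value as 'S EEEEEEEE F...F'."""
    if x == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))           # abs(x) == m * 2**e with 0.5 <= m < 1
    mantissa, exponent = m * 2, e - 1   # renormalize so that 1 <= mantissa < 2
    if not -126 <= exponent <= 127:
        raise ValueError("outside the normalized exponent range")
    fraction = (mantissa - 1) * 2**23   # drop the hidden 1, scale to 23 bits
    if fraction != int(fraction):
        raise ValueError("value needs rounding, which this sketch does not do")
    return f"{sign} {exponent + 127:08b} {int(fraction):023b}"

print(encode_float32(-10.375))  # 1 10000010 01001100000000000000000
```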

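  19. Floating Point Overflow
      • Floating point representations can overflow, e.g.,
        $1.111111_2 \times 2^{127} + 1.111111_2 \times 2^{127} = 11.111110_2 \times 2^{127} = 1.1111110_2 \times 2^{128} = \infty$
      • The normalized result needs exponent +128, which is outside the allowed range, so it becomes ∞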

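  20. Floating Point Underflow
      • Floating point numbers can also get too small, e.g.,
        $\dfrac{10.010000_2 \times 2^{-126}}{11.000000_2 \times 2^{0}} = 0.110000_2 \times 2^{-126} = 1.100000_2 \times 2^{-127} = 0$
      • The normalized result needs exponent –127, which is below the allowed range, so in this simplified model it becomes 0 (IEEE 754 denormalized values can still hold $0.11_2 \times 2^{-126}$)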
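
Overflow and underflow are easy to observe, with one caveat: Python floats are IEEE 754 double precision, so the limits sit near $2^{1023}$ and $2^{-1074}$ rather than the single-precision limits on the slides. A minimal sketch of the same two effects:

```python
import math
import sys

largest = sys.float_info.max    # largest finite double
overflowed = largest + largest  # the sum needs a larger exponent than the format has

smallest = 5e-324               # smallest positive (denormalized) double
underflowed = smallest / 2      # no nonzero encoding is this small

print(overflowed, math.isinf(overflowed))
print(underflowed)
```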

  21. “Normalized” Values
      • Condition
        – exp ≠ 000…0 and exp ≠ 111…1
      • Exponent coded as biased value: E = Exp – Bias
        – Exp: unsigned value denoted by exp
        – Bias: bias value
        – Single precision: Bias = 127 (Exp: 1…254, E: -126…127)
        – Double precision: Bias = 1023 (Exp: 1…2046, E: -1022…1023)
        – In general: Bias = $2^{e-1} - 1$, where e is the number of exponent bits
      • Significand coded with implied leading 1: $M = 1.xxx\ldots x_2$
        – xxx…x: bits of frac
        – Minimum when frac = 000…0 (M = 1.0)
        – Maximum when frac = 111…1 (M = 2.0 – ε)
        – Get extra leading bit for “free”

  22. Denormalized Values
      • Condition
        – exp = 000…0
      • Value
        – Exponent value E = –Bias + 1
        – Significand value $M = 0.xxx\ldots x_2$ (xxx…x: bits of frac)
      • Cases
        – exp = 000…0, frac = 000…0: represents the value 0; note that +0 and –0 are distinct values
        – exp = 000…0, frac ≠ 000…0: numbers very close to 0.0, which lose precision as they get smaller (“gradual underflow”)

  23. Special Values
      • Condition
        – exp = 111…1
      • Cases
        – exp = 111…1, frac = 000…0: represents the value ∞ (infinity), the result of an operation that overflows; both positive and negative; e.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
        – exp = 111…1, frac ≠ 000…0: Not-a-Number (NaN), which represents the case when no numeric value can be determined; e.g., sqrt(–1), ∞ − ∞
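
The conditions on slides 21 to 23 amount to a small decision table on the exp and frac fields. A sketch for single precision, with `classify_float32` as a hypothetical name:

```python
def classify_float32(bits):
    """Classify a 32-character bit string by its 8-bit exp and 23-bit frac fields."""
    exp, frac = bits[1:9], bits[9:]
    if exp == "0" * 8:
        return "zero" if frac == "0" * 23 else "denormalized"
    if exp == "1" * 8:
        return "infinity" if frac == "0" * 23 else "NaN"
    return "normalized"

print(classify_float32("0" + "1" * 8 + "0" * 23))  # infinity
print(classify_float32("1" + "0" * 8 + "0" * 23))  # zero (this pattern is -0)
```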

  24. Floating Point Representation
      Most standard floating point representations use:
      • 1 bit for the sign (positive or negative)
      • 8 bits for the range (exponent field)
      • 23 bits for the precision (fraction field)

        | S (1 bit) | exponent (8 bits) | fraction (23 bits) |

      $N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}, \quad 1 \le \text{exponent} \le 254$
      $N = (-1)^{S} \times 0.\text{fraction} \times 2^{-126}, \quad \text{exponent} = 0$
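
The two formulas for N translate almost literally into code. A minimal sketch, assuming the input is a 32-character bit string (spaces allowed); `decode_float32` is a hypothetical name, and exponent 255 is rejected because the formulas do not cover it.

```python
def decode_float32(bits):
    """Evaluate N for a bit string laid out as S (1), exponent (8), fraction (23)."""
    bits = bits.replace(" ", "")
    sign = (-1) ** int(bits[0])
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2) / 2**23   # the digits after the binary point
    if 1 <= exponent <= 254:
        return sign * (1 + fraction) * 2.0 ** (exponent - 127)
    if exponent == 0:
        return sign * fraction * 2.0 ** -126
    raise ValueError("exponent 255 encodes infinity or NaN")

# Bit patterns taken from the worked examples in this deck
print(decode_float32("00111101100000000000000000000000"))
print(decode_float32("01000001100101000000000000000000"))
print(decode_float32("11000001000101000000000000000000"))
```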

  25. Floating Point Representation
      Example: How is the number $-6\tfrac{5}{8}$ represented in floating point?
      $-6\tfrac{5}{8} = -\left(4 + 2 + \tfrac{4}{8} + \tfrac{1}{8}\right) = -\left(4 + 2 + \tfrac{1}{2} + \tfrac{1}{8}\right)$
      $= -\left(1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}\right)$
      $= -110.101_2 = -1.10101_2 \times 2^{2}$
      Thus the exponent is given by: $\text{exponent} - 127 = 2 \Rightarrow \text{exponent} = 129 = 10000001_2$
      ⇒ 1 10000001 10101000000000000000000
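
The binary expansion in this example can be produced by repeatedly doubling the fractional part and reading off the integer bit. A sketch, with `to_binary` as a hypothetical helper; reading the exponent from the position of the point assumes the value is at least 1.

```python
def to_binary(value, max_fraction_bits=23):
    """Write a non-negative number as a binary string such as '110.101'."""
    whole = int(value)
    frac = value - whole
    digits = []
    while frac and len(digits) < max_fraction_bits:
        frac *= 2            # shift the next fractional bit in front of the point
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return f"{whole:b}." + "".join(digits) if digits else f"{whole:b}"

binary = to_binary(6 + 5 / 8)
exponent = binary.index(".") - 1   # moving the point behind the leading 1
stored = exponent + 127            # excess-127 exponent field
print(binary, exponent, format(stored, "08b"))
```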

  26. Floating Point Representation (example)
      What is the decimal value of the following floating point number?
      00111101100000000000000000000000 = 0 01111011 00000000000000000000000
      • exponent = 64 + 32 + 16 + 8 + 2 + 1 = (128 - 8) + 3 = 120 + 3 = 123
      • $N = (-1)^{0} \times 1.0_2 \times 2^{123 - 127} = 1.0 \times 2^{-4} = \tfrac{1}{16}$

  27. Floating Point Representation (example)
      What is the decimal value of the following floating point number?
      01000001100101000000000000000000 = 0 10000011 00101000000000000000000
      • exponent = 128 + 2 + 1 = 131
      • $N = (-1)^{0} \times 1.00101_2 \times 2^{131 - 127} = 1.00101_2 \times 2^{4} = 10010.1_2$
      • $N = 2^{4} + 2^{1} + 2^{-1} = 16 + 2 + \tfrac{1}{2} = 18.5$

  28. Floating Point Representation (example)
      What is the decimal value of the following floating point number?
      11000001000101000000000000000000 = 1 10000010 00101000000000000000000
      • exponent = 128 + 2 = 130
      • $N = (-1)^{1} \times 1.00101_2 \times 2^{130 - 127} = -1.00101_2 \times 2^{3} = -1001.01_2$
      • $N = -\left(2^{3} + 2^{0} + 2^{-2}\right) = -\left(8 + 1 + \tfrac{1}{4}\right) = -9.25$

  29. Floating Point
      What is the largest number that can be represented in 32-bit floating point using the IEEE 754 format above?
      01111111011111111111111111111111 = 0 11111110 11111111111111111111111
      • exponent = 254
      • $\text{fraction} = 1 \times 2^{-1} + 1 \times 2^{-2} + \cdots + 1 \times 2^{-22} + 1 \times 2^{-23}$
      • $\text{fraction} = 1 \times 2^{0} - 1 \times 2^{-23} = 1 - \dfrac{1}{2^{23}} = 1 - \dfrac{1}{8 \times 1024 \times 1024} = 0.99999988079$
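
The transcript stops at the fraction. Plugging it into the normalized formula $N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}$ gives the value itself. The sketch below does that and cross-checks the result against the bit pattern using `struct`; the variable names are chosen only for this example.

```python
import struct

fraction = sum(2.0 ** -k for k in range(1, 24))  # 2^-1 + 2^-2 + ... + 2^-23
largest = (1 + fraction) * 2.0 ** (254 - 127)    # normalized formula with S = 0

# Cross-check: reinterpret the bit pattern 0 11111110 111...1 (hex 7F7FFFFF) as a float
from_bits = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]

print(f"{fraction:.11f}")  # 0.99999988079
print(largest)             # roughly 3.4e38
```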
