Chapter 2: Roundoff Errors (Uri M. Ascher and Chen Greif)




SLIDE 1

September 2, 2013

Chapter 2: Roundoff Errors

Uri M. Ascher and Chen Greif Department of Computer Science The University of British Columbia

{ascher,greif}@cs.ubc.ca

Slides for the book A First Course in Numerical Methods (published by SIAM, 2011) http://www.ec-securehost.com/SIAM/CS07.html

SLIDE 2

Roundoff Errors Goals

Goals of this chapter

  • To describe how numbers are stored in a floating point system;
  • to get a feeling for the almost random nature of rounding error;
  • to identify different sources of roundoff error growth and explain how to dampen their cumulative effect.

Uri Ascher (UBC Computer Science) CS 303 September 2, 2013 1 / 1

SLIDE 3

Roundoff Errors Outline

Outline

  • The essentials (We will do only this)
  • Floating point systems
  • Roundoff error accumulation
  • The IEEE standard

SLIDE 4

Roundoff Errors Motivation

Roundoff Errors

  • Roundoff error is generally inevitable in numerical algorithms involving real numbers.
  • People often like to pretend they work with exact real numbers, ignoring roundoff errors, which may allow concentration on other algorithmic aspects.
  • However, carelessness may lead to disaster!
  • This chapter provides two options for studying roundoff errors:
      • The essentials: just enough to know what issues to expect. In this course we will take this option.
      • The fuller version.

SLIDE 7

Roundoff Errors The essentials

The essentials

We will consider:

  • Real number representation – floating point system
  • Rounding unit
  • IEEE standard
  • Roundoff error accumulation
  • Rough appearance of roundoff errors

SLIDE 8

Roundoff Errors The essentials

Real number representation: decimal

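$$\frac{8}{3} \simeq \left( \frac{2}{10^0} + \frac{6}{10^1} + \frac{6}{10^2} + \frac{6}{10^3} \right) \times 10^0 = 2.666 \times 10^0.$$

An instance of the floating point representation

$$fl(x) = \pm d_0.d_1 \cdots d_{t-1} \times 10^e = \pm \left( \frac{d_0}{10^0} + \frac{d_1}{10^1} + \cdots + \frac{d_{t-2}}{10^{t-2}} + \frac{d_{t-1}}{10^{t-1}} \right) \times 10^e$$

for t = 4, e = 0. Note that $d_0 > 0$: normalized floating point representation.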
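The decimal example can be reproduced with a short Python sketch. It uses Python's standard `decimal` module as a stand-in for a decimal floating point system with t = 4 digits. The helper name `fl_decimal` is made up for illustration. Chopping gives the 2.666 shown on the slide, whereas rounding to nearest would give 2.667.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def fl_decimal(num, den, t=4, rounding=ROUND_DOWN):
    """Hypothetical helper: num/den kept to t significant decimal digits."""
    ctx = Context(prec=t, rounding=rounding)
    return ctx.divide(Decimal(num), Decimal(den))

chopped = fl_decimal(8, 3)                             # digits beyond t are dropped
rounded = fl_decimal(8, 3, rounding=ROUND_HALF_EVEN)   # round to nearest instead

print(chopped, rounded)
```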

SLIDE 11

Roundoff Errors The essentials

Real number representation: binary

The decimal system is convenient for humans; but computers prefer binary.

  • In binary the (normalized) representation of a real number x is

$$x = \pm (1.d_1 d_2 d_3 \cdots d_{t-1} d_t d_{t+1} \cdots) \times 2^e = \pm \left( 1 + \frac{d_1}{2} + \frac{d_2}{4} + \frac{d_3}{8} + \cdots \right) \times 2^e,$$

with binary digits $d_i = 0$ or 1 and exponent e.

  • Floating point representation: with a fixed number of digits t

$$fl(x) = \pm (1.\tilde{d}_1 \tilde{d}_2 \tilde{d}_3 \cdots \tilde{d}_{t-1} \tilde{d}_t) \times 2^e$$
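As a rough illustration of the normalized binary form, the following Python sketch extracts the sign, the t = 52 stored binary digits, and the exponent e of a double. The function names `normalized_binary` and `rebuild` are invented for this example, and it assumes a nonzero, finite, normalized input.

```python
import math

def normalized_binary(x, t=52):
    """Return (sign, digits, e) with |x| = (1.d1 d2 ... dt)_2 * 2**e.

    Assumes x is a nonzero, finite, normalized double.
    """
    sign = -1 if x < 0 else 1
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1       # renormalize so that 1 <= m < 2
    frac = m - 1.0              # the part after the binary point
    digits = []
    for _ in range(t):
        frac *= 2.0
        d = int(frac)           # next binary digit, 0 or 1
        digits.append(d)
        frac -= d
    return sign, digits, e

def rebuild(sign, digits, e):
    """Reassemble the number from its digits: sign * (1 + sum d_i / 2**i) * 2**e."""
    frac = sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))
    return sign * math.ldexp(1.0 + frac, e)

sign, digits, e = normalized_binary(0.1)
print(sign, e, "".join(map(str, digits[:12])) + "...")
```

For 0.1 the digits start with the repeating pattern 1001 1001 ..., which is why 0.1 cannot be stored exactly in binary.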

SLIDE 13

Roundoff Errors The essentials

Determining digits

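How to determine the digits $\tilde{d}_i$? Rounding:

$$fl(x) = \begin{cases} \pm 1.d_1 d_2 d_3 \cdots d_t \times 2^e, & d_{t+1} = 0, \\ \text{rounded to nearest, ties to even}, & \text{otherwise}. \end{cases}$$

Then the relative floating point error is bounded by the rounding unit:

$$\frac{|fl(x) - x|}{|x|} \le \frac{1}{2} \cdot 2^{-t}.$$

Recommendation: prove this important bound!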
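The bound can also be checked numerically (which is no substitute for the proof). The sketch below assumes double precision, t = 52, and uses exact rational arithmetic from Python's `fractions` module. It relies on CPython converting a `Fraction` to the nearest double; the name `relative_rounding_error` and the sample values are arbitrary choices for illustration.

```python
from fractions import Fraction

ETA = Fraction(1, 2**53)   # rounding unit (1/2) * 2**(-t) for t = 52

def relative_rounding_error(x):
    """Exact |fl(x) - x| / |x| for a nonzero rational x (illustrative helper)."""
    fl_x = Fraction(float(x))   # float(x) rounds x to the nearest double
    return abs(fl_x - x) / abs(x)

samples = [Fraction(1, 3), Fraction(8, 3), Fraction(1, 10), Fraction(2, 7)]
errors = [relative_rounding_error(x) for x in samples]

for x, err in zip(samples, errors):
    print(x, float(err), err <= ETA)
```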

SLIDE 17

Roundoff Errors The essentials

IEEE standard word

Double precision (64 bit word):

  s = ±  |  b = 11-bit exponent  |  f = 52-bit fraction

Rounding unit: $\eta = \frac{1}{2} \cdot 2^{-52} \approx 1.1 \times 10^{-16}$.

Can also have single precision (32 bit word). Then t = 23 and $\eta = 2^{-24} \approx 6.0 \times 10^{-8}$.
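To make the word layout concrete, here is a small Python sketch that splits a double into the three fields s, b and f using the standard `struct` module. The function name `ieee_double_fields` is hypothetical. The stored exponent b is biased by 1023, a detail of the standard that the slide does not spell out.

```python
import struct

ETA = 2.0 ** -53   # double precision rounding unit, (1/2) * 2**(-52)

def ieee_double_fields(x):
    """Return (s, b, f): sign bit, 11-bit biased exponent, 52-bit fraction."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))   # the raw 64-bit word
    s = bits >> 63
    b = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    return s, b, f

print(ieee_double_fields(1.0))    # exponent field holds e + 1023
print(ieee_double_fields(-2.0))
print(1.0 + ETA == 1.0)           # adding the rounding unit to 1 is lost to rounding
```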

SLIDE 20

Roundoff Errors The essentials

Comparing single and double precision

If we represent the number 1/3 in IEEE single precision (32 bits), the error will be approximately how many times larger than if we represent the same number in IEEE double precision (64 bits)?

  • A: $2^{29} \approx 5.37 \times 10^8$
  • B: 32
  • C: a little over 2
  • D: 1 (the error will be the same)
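One way to check an answer to this question is to measure both representation errors exactly. The sketch below is only an illustration: it rounds to single precision by packing through the `struct` format `"f"`, and the helper name `to_single` is made up. For 1/3 the ratio of the two errors comes out as exactly $2^{29}$, matching the ratio of the two rounding units.

```python
import struct
from fractions import Fraction

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

third = Fraction(1, 3)
err_double = abs(Fraction(1 / 3) - third)             # exact error of the double
err_single = abs(Fraction(to_single(1 / 3)) - third)  # exact error of the single
ratio = err_single / err_double

print(float(err_single), float(err_double), ratio)
```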

SLIDE 21

Roundoff Errors The essentials

IEEE standard

  • Used by everyone today.
  • Exact rounding: use guard digits to ensure that relative error in each elementary arithmetic operation is bounded by η.

  • NaN
  • Overflow and underflow
  • Subnormal numbers near 0.
  • Many other features...
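Several of these features can be observed directly in Python, whose floats are IEEE doubles on essentially all platforms. The following is a minimal sketch with arbitrary variable names, not a complete tour of the standard.

```python
import math
import sys

largest = sys.float_info.max            # largest finite double
overflowed = largest * 2.0              # overflow produces inf
not_a_number = overflowed - overflowed  # inf - inf produces NaN

smallest_normal = sys.float_info.min    # smallest normalized positive double
subnormal = smallest_normal / 4.0       # still nonzero: a subnormal number
smallest_subnormal = 5e-324             # parses to 2**-1074
underflowed = smallest_subnormal / 2.0  # underflow to zero

print(overflowed, not_a_number, subnormal, underflowed)
```

Note that NaN compares unequal to everything, including itself, which is a common way to detect it.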

SLIDE 22

Roundoff Errors The essentials

Roundoff error accumulation

  • In general, if $E_n$ is the error after n elementary operations, we cannot avoid linear roundoff error accumulation $E_n \simeq c_0 n E_0$.
  • We will not tolerate an exponential error growth such as $E_n \simeq c_1^n E_0$ for some constant $c_1 > 1$: an unstable algorithm.
  • In some situations an individual error contribution is particularly large and occasionally can be made smaller.
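Exponential error growth is easy to provoke. The recursion below is an illustrative choice that does not appear on these slides: the integrals $y_n = \int_0^1 x^n/(x+10)\,dx$ satisfy $y_n = 1/n - 10\,y_{n-1}$ with $y_0 = \ln(11/10)$, so any error in $y_{n-1}$ is multiplied by 10 at each step ($c_1 = 10$). The exact values lie strictly between 0 and 1, yet the computed ones eventually become huge.

```python
import math

def unstable_recursion(n_max):
    """Evaluate y_n = 1/n - 10*y_{n-1} in floating point, from y_0 = ln(11/10)."""
    y = [math.log(11.0 / 10.0)]
    for n in range(1, n_max + 1):
        y.append(1.0 / n - 10.0 * y[-1])   # multiplies the inherited error by 10
    return y

y = unstable_recursion(30)
for n in (5, 10, 15, 20, 30):
    print(n, y[n])
```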

SLIDE 24

Roundoff Errors The essentials

Cancellation error

When two nearby numbers are subtracted, the relative error in the result can be large. This occurs naturally in practice. Instance:

  • If g(·) is a smooth function then g(t) and g(t + h) are close for h small.
  • But rounding errors in g(t) and g(t + h) are unrelated, so they can be of opposing signs!
  • Recall the numerical differentiation example from Chapter 1: if the relative error in the representation is bounded by η then in $|g(t + h) - g(t)|/h$ it is bounded by 2η/h. This (tight) bound is much larger than η when h is small.
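The effect shows up in a few lines of Python. This sketch assumes g = sin and t = 1.2 as a test case and uses a plain forward difference. The function name `forward_difference` and the step sizes are illustrative choices. As h shrinks the error first decreases and then grows again, because cancellation amplifies the rounding errors by roughly 1/h.

```python
import math

def forward_difference(g, t, h):
    """Approximate g'(t) by (g(t + h) - g(t)) / h."""
    return (g(t + h) - g(t)) / h

t0 = 1.2
exact = math.cos(t0)   # derivative of sin at t0
errors = {
    h: abs(forward_difference(math.sin, t0, h) - exact)
    for h in (1e-4, 1e-8, 1e-12, 1e-15)
}

for h, err in errors.items():
    print(h, err)
```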

SLIDE 27

Roundoff Errors The essentials

Example

Compute $y = \sinh(x) = \frac{1}{2}(e^x - e^{-x})$.

  • Naively computing y at an x near 0 may result in a (meaningless) 0.
  • Instead use Taylor's expansion

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots$$

to obtain

$$\sinh(x) = x + \frac{x^3}{6} + \ldots.$$

  • If x is near 0, can use $x + \frac{x^3}{6}$, or even just x, for an effective approximation to sinh(x).

So, a good library function would compute sinh(x) by the regular formula (using exponentials) for |x| not very small, and by taking a term or two of the Taylor expansion for |x| very small.
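A minimal sketch of such a two-branch function follows. The names `sinh_naive` and `sinh_careful` and the switching threshold 1e-4 are assumptions made for illustration; real library implementations choose their branches and approximations more carefully.

```python
import math

def sinh_naive(x):
    """The regular formula: suffers cancellation when x is near 0."""
    return 0.5 * (math.exp(x) - math.exp(-x))

def sinh_careful(x, threshold=1e-4):
    """Two Taylor terms for small |x|, the regular formula otherwise.

    The threshold is an arbitrary illustrative choice.
    """
    if abs(x) < threshold:
        return x + x**3 / 6.0
    return sinh_naive(x)

x = 1e-20
print(sinh_naive(x), sinh_careful(x), math.sinh(x))
```

At x = 1e-20 both exponentials round to exactly 1, so the naive formula returns 0, while the Taylor branch returns x itself.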

SLIDE 32

Roundoff Errors The essentials

Limiting roundoff error accumulation

We are supposed to calculate $\sqrt{x + 1} - \sqrt{x}$ for $x \gg 1$. We realize that

$$\sqrt{x + 1} - \sqrt{x} = \frac{1}{\sqrt{x + 1} + \sqrt{x}}.$$

Which formula should we use for the computation?

  • A: $\sqrt{x + 1} - \sqrt{x}$.
  • B: $\frac{1}{\sqrt{x + 1} + \sqrt{x}}$.
  • C: Neither.
  • D: It does not matter which one: the error will be the same.
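The two formulas can be compared experimentally. In the sketch below the function names and the test value x = 1e16 are arbitrary choices for illustration. At that x the sum x + 1 rounds back to x in double precision, so the subtraction returns exactly 0, while the rewritten form involves no cancellation.

```python
import math

def diff_naive(x):
    """sqrt(x + 1) - sqrt(x): subtracts two nearby numbers."""
    return math.sqrt(x + 1.0) - math.sqrt(x)

def diff_rewritten(x):
    """1 / (sqrt(x + 1) + sqrt(x)): algebraically equal, no subtraction."""
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))

x = 1e16
print(diff_naive(x), diff_rewritten(x))
```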

SLIDE 33

Roundoff Errors The essentials

Example: rough appearance of roundoff errors

Run program Example2_2Figure2_2.m

[Figure: error in sampling exp(−t)(sin(2πt) + 2) in single precision. Horizontal axis: t, from 0 to 1. Vertical axis: roundoff error, in units of 10^−8.]

Note how the sign of the floating point representation error at nearby arguments t fluctuates as if randomly: as a function of t it is a “non-smooth” error.
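For readers without MATLAB, a rough Python analogue of this experiment is sketched below. It is not the authors' program: the sampling step of 0.002 is an assumed value, single precision is emulated by packing through the `struct` format `"f"`, and the names `to_single` and `f` are made up. The relative errors stay within the single precision rounding unit $2^{-24}$ and change sign irregularly from one sample to the next.

```python
import math
import struct

def to_single(x):
    """Round a double to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f(t):
    return math.exp(-t) * (math.sin(2.0 * math.pi * t) + 2.0)

ts = [k / 500 for k in range(501)]   # t = 0, 0.002, ..., 1 (assumed step)
rel_errors = [(f(t) - to_single(f(t))) / f(t) for t in ts]

# Count how often the sign of the error flips between neighbouring samples.
sign_changes = sum(1 for a, b in zip(rel_errors, rel_errors[1:]) if a * b < 0)
print(max(abs(e) for e in rel_errors), sign_changes)
```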
