FLOATING POINT OPERATIONS (Mahdi Nazm Bojnordi, Assistant Professor)

SLIDE 1

FLOATING POINT OPERATIONS

CS/ECE 3810: Computer Organization

Mahdi Nazm Bojnordi

Assistant Professor School of Computing University of Utah

SLIDE 2

Overview

• Notes
  ◦ Homework 6 will be posted tonight
    ▪ Deadline: Mar. 5th
• This lecture
  ◦ Floating point operations
  ◦ Basics of logic design

SLIDE 4

Recall: Floating Point Addition

• Numbers maintain only 4 decimal digits and 2 exponent digits
    ▪ 9.999 × 10^1 + 1.610 × 10^-1
  ◦ Convert to the larger exponent
    ▪ 9.999 × 10^1 + 0.016 × 10^1
  ◦ Add
    ▪ 10.015 × 10^1
  ◦ Normalize
    ▪ 1.0015 × 10^2
  ◦ Check for overflow/underflow
  ◦ Round
    ▪ 1.002 × 10^2
  ◦ Re-normalize

The exact sum is 1.00151 × 10^2. If we had more fraction digits, the errors introduced by alignment and rounding would be smaller.
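The decimal walk-through above can be sketched in C. This is only an illustration under stated assumptions: the `Dec4` type and the `dec4_add` function are hypothetical names that do not appear in the lecture, both operands are positive and normalized, digits shifted out during alignment are dropped as on the slide, and the extra digit produced by a carry is rounded half up.

```c
#include <assert.h>

/* Hypothetical toy format (not from the lecture): value = d.ddd x 10^exp,
   stored as mant = dddd (1000..9999) plus a two-digit exponent. */
typedef struct {
    int mant; /* four decimal digits, 1000..9999 */
    int exp;  /* intended range -99..99 */
} Dec4;

/* Adds two positive, normalized Dec4 values.
   Sets *overflow to 1 if the result exponent no longer fits in two digits. */
Dec4 dec4_add(Dec4 a, Dec4 b, int *overflow) {
    assert(a.mant >= 1000 && a.mant <= 9999);
    assert(b.mant >= 1000 && b.mant <= 9999);

    /* Convert to the larger exponent: after the swap, a holds it. */
    if (a.exp < b.exp) { Dec4 t = a; a = b; b = t; }
    for (int shift = a.exp - b.exp; shift > 0 && b.mant > 0; shift--)
        b.mant /= 10; /* digits shifted out are dropped */

    /* Add the aligned significands. */
    Dec4 r = { a.mant + b.mant, a.exp };

    /* Normalize and round: a fifth digit means the sum reached 10.000,
       so shift the point left by one digit, rounding half up. */
    if (r.mant > 9999) {
        r.mant = (r.mant + 5) / 10;
        r.exp += 1;
    }
    /* r.mant is at most 2000 after rounding, so this toy never needs
       a second normalization step. */

    /* Check for overflow of the two-digit exponent. */
    *overflow = (r.exp > 99);
    return r;
}
```

Calling `dec4_add` with `{9999, 1}` and `{1610, -1}` follows the slide: the second operand becomes 0016 after alignment, the sum 10015 is normalized and rounded to 1002, and the exponent becomes 2.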

SLIDE 6

Floating Point Addition

• Numbers maintain only 4 binary digits and 2 exponent digits
    ▪ 1.010 × 2^1 + 1.100 × 2^3
  ◦ Convert to the larger exponent
    ▪ 0.0101 × 2^3 + 1.100 × 2^3
  ◦ Add
    ▪ 1.1101 × 2^3
  ◦ Normalize
    ▪ 1.1101 × 2^3
  ◦ Check for overflow/underflow
  ◦ IEEE 754 format (32-bit)
    ▪ 0 10000010 11010000000000000000000
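To see how the result 1.1101 × 2^3 maps onto the 32-bit pattern above, the fields can be packed by hand. The sketch below is illustrative only: `pack_single` and `bits_to_float` are hypothetical helper names, and the code assumes that the platform's `float` is an IEEE 754 single-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packs a sign bit, an unbiased exponent, and the 23 fraction bits
   (the bits after the binary point) into one 32-bit word. */
uint32_t pack_single(uint32_t sign, int exponent, uint32_t fraction) {
    assert(sign <= 1);
    assert(exponent >= -126 && exponent <= 127); /* normalized range only */
    assert(fraction < (1u << 23));
    uint32_t biased = (uint32_t)(exponent + 127); /* add the bias of 127 */
    return (sign << 31) | (biased << 23) | fraction;
}

/* Reinterprets a 32-bit pattern as a float (assumes IEEE 754 floats). */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For the slide's result, the fraction bits 1101 sit at the top of the 23-bit field, so `pack_single(0, 3, 0xDu << 19)` yields `0x41680000`, which is the pattern 0 10000010 1101000... and reads back as 14.5 (that is, 2.5 + 12).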

SLIDE 7

Floating Point Addition

• Example: add the following two single-precision floating point numbers.

A: B:

Steps:
  1. Convert to larger exponent
  2. Add
  3. Normalize
  4. Round

SLIDE 8

Floating Point Addition

• Example: add the following two single-precision floating point numbers.

A: E_A = 128, M_A = 1.11_two
B: E_B = 131, M_B = 1.010011_two

  1. Convert to larger exponent: E_A = 131, M_A = 0.00111_two; E_B = 131, M_B = 1.010011_two
  2. Add: E_A = E_B = 131, M_A + M_B = 0.00111_two + 1.010011_two = 1.100001_two

A + B: E = 131, M = 1.100001_two (already normalized)
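The same addition can be carried out on the exponent and significand fields with integer arithmetic. The following is a minimal sketch, not the hardware algorithm: `Fields` and `add_fields` are hypothetical names, both operands are assumed positive and normalized, bits shifted out are truncated instead of being rounded as IEEE 754 requires, and exponent overflow is not checked.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int exp;      /* biased exponent, as on the slide (for example 128) */
    uint32_t sig; /* 24-bit significand with the hidden 1 at bit 23 */
} Fields;

/* Adds two positive, normalized values given as biased exponent
   and 24-bit significand. Truncates instead of rounding. */
Fields add_fields(Fields a, Fields b) {
    assert((a.sig >> 23) == 1 && (b.sig >> 23) == 1);

    /* 1. Convert to the larger exponent: after the swap, a holds it. */
    if (a.exp < b.exp) { Fields t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    b.sig = (shift < 24) ? (b.sig >> shift) : 0;

    /* 2. Add the aligned significands. */
    Fields r = { a.exp, a.sig + b.sig };

    /* 3. Normalize: a carry into bit 24 means the sum reached 2.0. */
    if (r.sig >> 24) {
        r.sig >>= 1; /* the bit shifted out is dropped (truncation) */
        r.exp += 1;
    }
    return r;
}
```

With M_A = 1.11_two stored as `0xE00000` and M_B = 1.010011_two stored as `0xA60000`, the call `add_fields((Fields){128, 0xE00000}, (Fields){131, 0xA60000})` returns exponent 131 and significand `0xC20000`, which is 1.100001_two.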

SLIDE 9

Floating Point Multiplication

• Similar steps are required for multiplication
  ◦ Compute exponent
    ▪ Need to remove bias
  ◦ Multiply significands
    ▪ May end up unnormalized
  ◦ Normalize
    ▪ Shift the point
  ◦ Round
    ▪ Fit in the number of bits
  ◦ Assign sign
    ▪ Compute sign

SLIDE 10

Floating Point Multiplication

• Example: multiply the following two single-precision floating point numbers.

A: B:

Steps:
  1. Compute exponent
  2. Multiply significands
  3. Normalize
  4. Round
  5. Compute sign

SLIDE 11

Floating Point Multiplication

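• Example: multiply the following two single-precision floating point numbers.

A: E_A = 128, M_A = 1.11_two
B: E_B = 131, M_B = 1.010011_two

  1. Compute exponent: E_A×B = 128 + 131 - 127 = 132
  2. Multiply significands: M_A×B = 1.11_two × 1.010011_two = 10.01000101_two
  3. Normalize: E_A×B = 133, M_A×B = 1.001000101_two

A × B: E = 133, M = 1.001000101_two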
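The multiplication steps can be sketched on the same fields. This is an illustration under assumptions, not the lecture's implementation: `SignedFields` and `mul_fields` are hypothetical names, the inputs are normalized, low-order product bits are truncated instead of rounded, and exponent overflow or underflow is not checked. The sign bits of A and B are not recoverable from the slide text, so the usage note assumes both are 0.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sign; /* 0 for positive, 1 for negative */
    int exp;       /* biased exponent */
    uint32_t sig;  /* 24-bit significand with the hidden 1 at bit 23 */
} SignedFields;

/* Multiplies two normalized values field by field. Truncates. */
SignedFields mul_fields(SignedFields a, SignedFields b) {
    assert(a.sign <= 1 && b.sign <= 1);
    assert((a.sig >> 23) == 1 && (b.sig >> 23) == 1);
    SignedFields r;

    /* Compute exponent: each input carries one bias, so remove one. */
    r.exp = a.exp + b.exp - 127;

    /* Multiply significands: 24 x 24 bits gives up to 48 bits,
       with two bits before the binary point. */
    uint64_t prod = (uint64_t)a.sig * b.sig;

    /* Normalize: if bit 47 is set the product is 2.0 or more. */
    if (prod >> 47) {
        prod >>= 1;
        r.exp += 1;
    }

    /* Fit in 24 bits by dropping the low 23 bits (truncation). */
    r.sig = (uint32_t)(prod >> 23);

    /* Compute sign: negative only if exactly one input is negative. */
    r.sign = a.sign ^ b.sign;
    return r;
}
```

With A given as `{0, 128, 0xE00000}` and B as `{0, 131, 0xA60000}`, the result has exponent 133 and significand `0x914000`, which is 1.001000101_two and matches the normalized product on the slide.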

SLIDE 12

Floating Point Instructions

• MIPS employs separate registers for floating point
  ◦ 32-bit registers: $f0, $f1, …, $f31
  ◦ Each register represents a single-precision number
  ◦ Register pairs are used for double-precision
    ▪ Example: for a double-precision value, $f0 refers to the pair {$f0, $f1}
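A double-precision value needs a register pair because it is 64 bits wide while each floating point register holds 32 bits. The sketch below only illustrates that split: `split_double` is a hypothetical helper, the code assumes `double` is a 64-bit IEEE 754 value, and it does not model which register of the pair holds which half, since the slide does not say.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Splits the 64-bit pattern of a double into two 32-bit halves,
   the amount of data one register of a pair can hold each. */
void split_double(double d, uint32_t *high, uint32_t *low) {
    assert(high != 0 && low != 0);
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* assumes sizeof(double) == 8 */
    *high = (uint32_t)(bits >> 32); /* sign, exponent, top fraction bits */
    *low = (uint32_t)bits;          /* remaining 32 fraction bits */
}
```

For example, splitting 1.0 gives a high half of `0x3FF00000` and a low half of 0.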

SLIDE 13

Floating Point Instructions

• Load/Store instructions by coprocessor 1 (c1)
  ◦ Still use integer registers for address computation
• Comparison instructions
  ◦ Set an internal bit (cond) to be inspected by branch instructions

SLIDE 16

Code Example

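• Convert a temperature in Fahrenheit to Celsius
  ◦ Assume that constants are stored in global memory

float f2c(float fahr) {
    return ((5.0/9.0)*(fahr-32.0));
}

Memory (relative to $gp): const5, const9

f2c:
    mtc1  $a0, $f12            # move fahr from $a0 into $f12
    lwc1  $f16, const5($gp)    # $f16 = 5.0
    lwc1  $f18, const9($gp)    # $f18 = 9.0
    div.s $f16, $f16, $f18     # $f16 = 5.0 / 9.0
    lwc1  $f18, const32($gp)   # $f18 = 32.0
    sub.s $f18, $f12, $f18     # $f18 = fahr - 32.0
    mul.s $f0, $f16, $f18      # $f0 = (5.0/9.0) * (fahr - 32.0)
    jr    $ra                  # return, result in $f0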
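The MIPS code uses single-precision instructions (`div.s`, `sub.s`, `mul.s`) throughout, whereas the unsuffixed constants in the C source are of type double. The sketch below is a single-precision variant that follows the instruction order of the assembly. The name `f2c_single` and the absolute-zero precondition are additions for illustration, not part of the lecture.

```c
#include <assert.h>

/* Single-precision Fahrenheit-to-Celsius conversion, written in the
   same order as the MIPS instructions on the slide. */
float f2c_single(float fahr) {
    /* Illustrative precondition: nothing is colder than absolute zero. */
    assert(fahr >= -459.67f);

    float ratio = 5.0f / 9.0f; /* div.s: const5 / const9 */
    float diff = fahr - 32.0f; /* sub.s: fahr - const32 */
    return ratio * diff;       /* mul.s: value returned in $f0 */
}
```

Calling `f2c_single(32.0f)` returns exactly 0, and `f2c_single(212.0f)` returns a value within rounding error of 100, because 5/9 is not exactly representable in binary floating point.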