Faster multiprecision integer division William Hart June 22, 2015 - - PowerPoint PPT Presentation

faster multiprecision integer division
SMART_READER_LITE
LIVE PREVIEW

Faster multiprecision integer division William Hart June 22, 2015 - - PowerPoint PPT Presentation

Outline Multiprecision integer division Faster multiprecision integer division William Hart June 22, 2015 William Hart Faster multiprecision integer division Outline Multiprecision integer division Multiprecision integer division William


slide-1
SLIDE 1

Outline Multiprecision integer division

Faster multiprecision integer division

William Hart June 22, 2015

William Hart Faster multiprecision integer division

slide-2
SLIDE 2

Outline Multiprecision integer division

Multiprecision integer division

William Hart Faster multiprecision integer division

slide-3
SLIDE 3

Outline Multiprecision integer division

Multiprecision integer division

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >

William Hart Faster multiprecision integer division

slide-4
SLIDE 4

Outline Multiprecision integer division

Multiprecision integer division

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >
  • Compute:
  • Approximate quotient Q⋆

William Hart Faster multiprecision integer division

slide-5
SLIDE 5

Outline Multiprecision integer division

Multiprecision integer division

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >
  • Compute:
  • Approximate quotient Q⋆
  • Exact quotient Q

William Hart Faster multiprecision integer division

slide-6
SLIDE 6

Outline Multiprecision integer division

Multiprecision integer division

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >
  • Compute:
  • Approximate quotient Q⋆
  • Exact quotient Q
  • Quotient Q and remainder R

William Hart Faster multiprecision integer division

slide-7
SLIDE 7

Outline Multiprecision integer division

Machine primitive

  • Machine does 2 × 1 division

William Hart Faster multiprecision integer division

slide-8
SLIDE 8

Outline Multiprecision integer division

Machine primitive

  • Machine does 2 × 1 division
  • < u1, u0 > by < v0 >, quotient and remainder both 1 limb

William Hart Faster multiprecision integer division

slide-9
SLIDE 9

Outline Multiprecision integer division

Machine primitive

  • Machine does 2 × 1 division
  • < u1, u0 > by < v0 >, quotient and remainder both 1 limb
  • Require v0 normalised

William Hart Faster multiprecision integer division

slide-10
SLIDE 10

Outline Multiprecision integer division

Machine primitive

  • Machine does 2 × 1 division
  • < u1, u0 > by < v0 >, quotient and remainder both 1 limb
  • Require v0 normalised
  • If B = 232 or 264 then B < v0 ≤ B/2

William Hart Faster multiprecision integer division

slide-11
SLIDE 11

Outline Multiprecision integer division

Machine primitive

  • Machine does 2 × 1 division
  • < u1, u0 > by < v0 >, quotient and remainder both 1 limb
  • Require v0 normalised
  • If B = 232 or 264 then B < v0 ≤ B/2
  • Also require u1 < v0

William Hart Faster multiprecision integer division

slide-12
SLIDE 12

Outline Multiprecision integer division

Schoolbook algorithm

  • Shift A and D left so that D is normalised

William Hart Faster multiprecision integer division

slide-13
SLIDE 13

Outline Multiprecision integer division

Schoolbook algorithm

  • Shift A and D left so that D is normalised
  • Ensure top n limbs of A are less than D

William Hart Faster multiprecision integer division

slide-14
SLIDE 14

Outline Multiprecision integer division

Schoolbook algorithm

  • Shift A and D left so that D is normalised
  • Ensure top n limbs of A are less than D
  • i ← m − n + 1
  • while i ≥ 0
  • qi ← quotient of < an+i, an+i−1 > by < dn−1 >
  • A ← A − qiDBi
  • i ← i − 1

William Hart Faster multiprecision integer division

slide-15
SLIDE 15

Outline Multiprecision integer division

Schoolbook algorithm

  • Shift A and D left so that D is normalised
  • Ensure top n limbs of A are less than D
  • i ← m − n + 1
  • while i ≥ 0
  • qi ← quotient of < an+i, an+i−1 > by < dn−1 >
  • A ← A − qiDBi
  • i ← i − 1
  • Shift remainder right

William Hart Faster multiprecision integer division

slide-16
SLIDE 16

Outline Multiprecision integer division

Problems

  • May have < an+i, an+i−1 > ≥ dn−1B

William Hart Faster multiprecision integer division

slide-17
SLIDE 17

Outline Multiprecision integer division

Problems

  • May have < an+i, an+i−1 > ≥ dn−1B
  • qiD may be too large

William Hart Faster multiprecision integer division

slide-18
SLIDE 18

Outline Multiprecision integer division

Problems

  • May have < an+i, an+i−1 > ≥ dn−1B
  • qiD may be too large
  • Need tests and adjustments for both cases

William Hart Faster multiprecision integer division

slide-19
SLIDE 19

Outline Multiprecision integer division

Standard improvement No. 1

  • Multiply by precomputed inverse ν of dn−1 instead of

division

William Hart Faster multiprecision integer division

slide-20
SLIDE 20

Outline Multiprecision integer division

Standard improvement No. 1

  • Multiply by precomputed inverse ν of dn−1 instead of

division

  • 1986 Barret used B2n/d

William Hart Faster multiprecision integer division

slide-21
SLIDE 21

Outline Multiprecision integer division

Standard improvement No. 1

  • Multiply by precomputed inverse ν of dn−1 instead of

division

  • 1986 Barret used B2n/d
  • 1986 Beame, Cook, Hoover suggested using an under

approximation of the inverse

William Hart Faster multiprecision integer division

slide-22
SLIDE 22

Outline Multiprecision integer division

Standard improvement No. 1

  • Multiply by precomputed inverse ν of dn−1 instead of

division

  • 1986 Barret used B2n/d
  • 1986 Beame, Cook, Hoover suggested using an under

approximation of the inverse

  • 1951 Wallace, diode transistor circuits

William Hart Faster multiprecision integer division

slide-23
SLIDE 23

Outline Multiprecision integer division

Standard improvement No. 1

  • Multiply by precomputed inverse ν of dn−1 instead of

division

  • 1986 Barret used B2n/d
  • 1986 Beame, Cook, Hoover suggested using an under

approximation of the inverse

  • 1951 Wallace, diode transistor circuits
  • 2000 BC, Babylonian “Clay tablet” PC’s

William Hart Faster multiprecision integer division

slide-24
SLIDE 24

Outline Multiprecision integer division

Precomputed inverses

  • 1994, Granlund, Montgomery suggested

ν = ⌊(B2 − 1)/dn−1⌋ − B

William Hart Faster multiprecision integer division

slide-25
SLIDE 25

Outline Multiprecision integer division

Precomputed inverses

  • 1994, Granlund, Montgomery suggested

ν = ⌊(B2 − 1)/dn−1⌋ − B

  • One mulhigh, one mul

William Hart Faster multiprecision integer division

slide-26
SLIDE 26

Outline Multiprecision integer division

Precomputed inverses

  • 1994, Granlund, Montgomery suggested

ν = ⌊(B2 − 1)/dn−1⌋ − B

  • One mulhigh, one mul
  • At most 3 correction steps for 2 × 1 division

William Hart Faster multiprecision integer division

slide-27
SLIDE 27

Outline Multiprecision integer division

Precomputed inverses

  • 1994, Granlund, Montgomery suggested

ν = ⌊(B2 − 1)/dn−1⌋ − B

  • One mulhigh, one mul
  • At most 3 correction steps for 2 × 1 division

Additional problem:

  • Quotient limb may now be too small or too large!!

William Hart Faster multiprecision integer division

slide-28
SLIDE 28

Outline Multiprecision integer division

  • ller-Granlund (2011)
  • Same precomputed inverse

William Hart Faster multiprecision integer division

slide-29
SLIDE 29

Outline Multiprecision integer division

  • ller-Granlund (2011)
  • Same precomputed inverse
  • Algorithm improved to one mul, one mullow

William Hart Faster multiprecision integer division

slide-30
SLIDE 30

Outline Multiprecision integer division

  • ller-Granlund (2011)
  • Same precomputed inverse
  • Algorithm improved to one mul, one mullow
  • One correction with unpredicted branch

William Hart Faster multiprecision integer division

slide-31
SLIDE 31

Outline Multiprecision integer division

  • ller-Granlund (2011)
  • Same precomputed inverse
  • Algorithm improved to one mul, one mullow
  • One correction with unpredicted branch
  • One very unlikely correction

William Hart Faster multiprecision integer division

slide-32
SLIDE 32

Outline Multiprecision integer division

Improvement No. 2

Problem : multiprecision corrections very expensive

William Hart Faster multiprecision integer division

slide-33
SLIDE 33

Outline Multiprecision integer division

Improvement No. 2

Problem : multiprecision corrections very expensive

  • Idea : use 3 × 2 divisions to reduce probability of

MP-correction

William Hart Faster multiprecision integer division

slide-34
SLIDE 34

Outline Multiprecision integer division

Improvement No. 2

Problem : multiprecision corrections very expensive

  • Idea : use 3 × 2 divisions to reduce probability of

MP-correction

  • ller-Granlund give 3 × 2 version of their algorithm

William Hart Faster multiprecision integer division

slide-35
SLIDE 35

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B

William Hart Faster multiprecision integer division

slide-36
SLIDE 36

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B
  • Because we used dn−1 + 1, quotient is never too large

William Hart Faster multiprecision integer division

slide-37
SLIDE 37

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B
  • Because we used dn−1 + 1, quotient is never too large
  • Approximate quotient can be done with no corrections

William Hart Faster multiprecision integer division

slide-38
SLIDE 38

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B
  • Because we used dn−1 + 1, quotient is never too large
  • Approximate quotient can be done with no corrections
  • 3 × 2 divapprox possible with this precomputed inverse

William Hart Faster multiprecision integer division

slide-39
SLIDE 39

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B
  • Because we used dn−1 + 1, quotient is never too large
  • Approximate quotient can be done with no corrections
  • 3 × 2 divapprox possible with this precomputed inverse
  • Multiprecision corrections rare, always in same direction

William Hart Faster multiprecision integer division

slide-40
SLIDE 40

Outline Multiprecision integer division

Our precomputed inverse

  • Use ν = ⌊B2/(dn−1 + 1)⌋ − B
  • Because we used dn−1 + 1, quotient is never too large
  • Approximate quotient can be done with no corrections
  • 3 × 2 divapprox possible with this precomputed inverse
  • Multiprecision corrections rare, always in same direction
  • Moreover, MP corrections in same direction as 3 × 2

corrections

William Hart Faster multiprecision integer division

slide-41
SLIDE 41

Outline Multiprecision integer division

Timings (MP divappr)

limbs GMP 6.0.0a Our code 3 6.7e − 8s 5.7e − 8s 4 8.8e − 8s 7.5e − 8s 6 1.36e − 7s 1.17e − 7s 7 1.85e − 7s 1.41e − 7s 9 2.27e − 7s 1.89e − 7s 11 2.98e − 7s 2.38e − 7s 15 4.35e − 7s 3.64e − 7s 19 6.07e − 7s 4.89e − 7s 21 7.05e − 7s 5.96e − 7s 27 1.05e − 6s 8.7e − 7s 33 1.45e − 6s 1.2e − 6s 37 1.72e − 6s 1.48e − 6s

William Hart Faster multiprecision integer division

slide-42
SLIDE 42

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions

William Hart Faster multiprecision integer division

slide-43
SLIDE 43

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant

William Hart Faster multiprecision integer division

slide-44
SLIDE 44

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder

William Hart Faster multiprecision integer division

slide-45
SLIDE 45

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder
  • Idea : divide by dn−1 + 1 first, then adjust

William Hart Faster multiprecision integer division

slide-46
SLIDE 46

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder
  • Idea : divide by dn−1 + 1 first, then adjust
  • Yields 3 × 2 divrem with same precomputed inverse

William Hart Faster multiprecision integer division

slide-47
SLIDE 47

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder
  • Idea : divide by dn−1 + 1 first, then adjust
  • Yields 3 × 2 divrem with same precomputed inverse
  • One correction with unpredicted branch, one unlikely

correction

William Hart Faster multiprecision integer division

slide-48
SLIDE 48

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder
  • Idea : divide by dn−1 + 1 first, then adjust
  • Yields 3 × 2 divrem with same precomputed inverse
  • One correction with unpredicted branch, one unlikely

correction

  • One additional 2 limb subtraction over M¨
  • ller-Granlund

William Hart Faster multiprecision integer division

slide-49
SLIDE 49

Outline Multiprecision integer division

Problem

  • 3 × 2 divapprox good for small MP divisions
  • ... but MP corrections eventually become significant
  • Need 3 × 2 division with exact quotient and remainder
  • Idea : divide by dn−1 + 1 first, then adjust
  • Yields 3 × 2 divrem with same precomputed inverse
  • One correction with unpredicted branch, one unlikely

correction

  • One additional 2 limb subtraction over M¨
  • ller-Granlund
  • No real time difference to M¨
  • ller-Granlund for MP division

William Hart Faster multiprecision integer division

slide-50
SLIDE 50

Outline Multiprecision integer division

Improvement 3: divapprox

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >

William Hart Faster multiprecision integer division

slide-51
SLIDE 51

Outline Multiprecision integer division

Improvement 3: divapprox

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >

Let s ← m − n + 1

  • Compute:
  • Approx. quotient Q⋆ =< qs−1, . . . , q1, q0 >

William Hart Faster multiprecision integer division

slide-52
SLIDE 52

Outline Multiprecision integer division

Improvement 3: divapprox

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >

Let s ← m − n + 1

  • Compute:
  • Approx. quotient Q⋆ =< qs−1, . . . , q1, q0 >
  • A ← A − n+s−2

i+j=n−2 qidjBn−3

William Hart Faster multiprecision integer division

slide-53
SLIDE 53

Outline Multiprecision integer division

Improvement 3: divapprox

  • Given:
  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • D =< dn−1, . . . , d1, d0 >

Let s ← m − n + 1

  • Compute:
  • Approx. quotient Q⋆ =< qs−1, . . . , q1, q0 >
  • A ← A − n+s−2

i+j=n−2 qidjBn−3

Only affects top s + 2 limbs of A

William Hart Faster multiprecision integer division

slide-54
SLIDE 54

Outline Multiprecision integer division

Exact MP quotient

  • Can compute exact quotient Q from divapprox (same cost)

William Hart Faster multiprecision integer division

slide-55
SLIDE 55

Outline Multiprecision integer division

Exact MP quotient

  • Can compute exact quotient Q from divapprox (same cost)
  • Can compute remainder R from Q using mullo

William Hart Faster multiprecision integer division

slide-56
SLIDE 56

Outline Multiprecision integer division

Exact MP quotient

  • Can compute exact quotient Q from divapprox (same cost)
  • Can compute remainder R from Q using mullo
  • Complexity same as schoolbook divrem
  • Can we compute integer divapprox in subquadratic time

William Hart Faster multiprecision integer division

slide-57
SLIDE 57

Outline Multiprecision integer division

Middle product

2004, Hanrot, Quercia, Zimmermann

William Hart Faster multiprecision integer division

slide-58
SLIDE 58

Outline Multiprecision integer division

Middle product

2004, Hanrot, Quercia, Zimmermann

William Hart Faster multiprecision integer division

slide-59
SLIDE 59

Outline Multiprecision integer division

Middle product

2004, Hanrot, Quercia, Zimmermann

William Hart Faster multiprecision integer division

slide-60
SLIDE 60

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?

William Hart Faster multiprecision integer division

slide-61
SLIDE 61

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?
  • A. Yes. Relies on subquadratic integer mulmid

William Hart Faster multiprecision integer division

slide-62
SLIDE 62

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?
  • A. Yes. Relies on subquadratic integer mulmid
  • (Harvey 2012) Karatsuba =

⇒ mulmid toom42 (Tellegen’s principle)

William Hart Faster multiprecision integer division

slide-63
SLIDE 63

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?
  • A. Yes. Relies on subquadratic integer mulmid
  • (Harvey 2012) Karatsuba =

⇒ mulmid toom42 (Tellegen’s principle)

  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • B =< bn−2, . . . , b1, b0 >

William Hart Faster multiprecision integer division

slide-64
SLIDE 64

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?
  • A. Yes. Relies on subquadratic integer mulmid
  • (Harvey 2012) Karatsuba =

⇒ mulmid toom42 (Tellegen’s principle)

  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • B =< bn−2, . . . , b1, b0 >
  • Mulmidm,n(A, B) =< ov1, ov0, pm−n, . . . , p1, p0 >=

m−1

i+j=n−1 aibjBi+j−n+1

William Hart Faster multiprecision integer division

slide-65
SLIDE 65

Outline Multiprecision integer division

Subquadratic divapprox

  • Q. Can we compute divapprox in subquadratic time?
  • A. Yes. Relies on subquadratic integer mulmid
  • (Harvey 2012) Karatsuba =

⇒ mulmid toom42 (Tellegen’s principle)

  • A =< am−1, am−2, am−3, . . . , a1, a0 >
  • B =< bn−2, . . . , b1, b0 >
  • Mulmidm,n(A, B) =< ov1, ov0, pm−n, . . . , p1, p0 >=

m−1

i+j=n−1 aibjBi+j−n+1

  • Roughly : computes middle third of 2n × n product

William Hart Faster multiprecision integer division

slide-66
SLIDE 66

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1

William Hart Faster multiprecision integer division

slide-67
SLIDE 67

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh

William Hart Faster multiprecision integer division

slide-68
SLIDE 68

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh
  • Recursively compute top sh limbs of Q⋆ using DC

divapprox

William Hart Faster multiprecision integer division

slide-69
SLIDE 69

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh
  • Recursively compute top sh limbs of Q⋆ using DC

divapprox

  • – Affects only top sh + 2 limbs of A

William Hart Faster multiprecision integer division

slide-70
SLIDE 70

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh
  • Recursively compute top sh limbs of Q⋆ using DC

divapprox

  • – Affects only top sh + 2 limbs of A
  • Subtract Mulmid(Q⋆, D) to adjust next sl limbs of A

William Hart Faster multiprecision integer division

slide-71
SLIDE 71

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh
  • Recursively compute top sh limbs of Q⋆ using DC

divapprox

  • – Affects only top sh + 2 limbs of A
  • Subtract Mulmid(Q⋆, D) to adjust next sl limbs of A
  • – In total (sh + 2) + sl = s + 2 limbs of A affected

William Hart Faster multiprecision integer division

slide-72
SLIDE 72

Outline Multiprecision integer division

Divide-and-conquer divapprox

  • s ← m − n + 1
  • sh = ⌊s/2⌋, sl = s − sh
  • Recursively compute top sh limbs of Q⋆ using DC

divapprox

  • – Affects only top sh + 2 limbs of A
  • Subtract Mulmid(Q⋆, D) to adjust next sl limbs of A
  • – In total (sh + 2) + sl = s + 2 limbs of A affected
  • Recursively compute low sl limbs of Q⋆ using DC

divapprox

William Hart Faster multiprecision integer division

slide-73
SLIDE 73

Outline Multiprecision integer division

Problem

  • Algorithm occasionally returns incorrect result!!

William Hart Faster multiprecision integer division

slide-74
SLIDE 74

Outline Multiprecision integer division

Problem

  • Algorithm occasionally returns incorrect result!!
  • Carries can cause each limb of quotient to be wrong

William Hart Faster multiprecision integer division

slide-75
SLIDE 75

Outline Multiprecision integer division

Problem

  • Algorithm occasionally returns incorrect result!!
  • Carries can cause each limb of quotient to be wrong
  • Due to massive cancellation of correction terms, there is a

linear time fixup to yield correct base B arithmetic

William Hart Faster multiprecision integer division

slide-76
SLIDE 76

Outline Multiprecision integer division

Timings (MP divappr)

William Hart Faster multiprecision integer division

slide-77
SLIDE 77

Outline Multiprecision integer division

Thank You

Thank You!

William Hart Faster multiprecision integer division