SLIDE 1/18

Strengths and weaknesses of quantum examples

Srinivasan Arunachalam (MIT)

joint with Ronald de Wolf (CWI, Amsterdam) and others

SLIDE 2/18

Machine learning

Classical machine learning

Grand goal: enable AI systems to improve themselves
Practical goal: learn "something" from given data
Recent success: deep learning is extremely good at image recognition, natural language processing, even the game of Go
Why the recent interest? Flood of available data, increasing computational power, growing progress in algorithms

Quantum machine learning

What can quantum computing do for machine learning? The learner will be quantum, the data may be quantum. Some examples are known of reductions in time complexity:

  • clustering (Aïmeur et al. '13)
  • principal component analysis (Lloyd et al. '13)
  • perceptron learning (Wiebe et al. '16)
  • recommendation systems (Kerenidis & Prakash '16)

SLIDE 3/18

Learning using classical examples

Basic definitions
  • Concept class C: collection of Boolean functions on n bits (known)
  • Target concept c: some function c ∈ C (unknown)
  • Distribution D : {0, 1}^n → [0, 1]
  • Labeled example for c ∈ C: (x, c(x)) where x ∼ D

SLIDE 4/18

Classical learner using classical examples

Given labeled examples (x, c(x)) with x ∼ D, the learner tries to learn the unknown target concept c.

SLIDE 5/18

Quantum learning using quantum examples

Learner is quantum. Data is quantum: Bshouty-Jackson'95 introduced a quantum example as the superposition

  Σ_{x∈{0,1}^n} √D(x) |x, c(x)⟩

Measuring this state gives a pair (x, c(x)) with probability D(x), so quantum examples are at least as powerful as classical examples.
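To make the object concrete, here is a minimal NumPy sketch (not from the slides; the distribution D and concept c below are randomly generated placeholders) that builds the quantum-example state vector and checks that a computational-basis measurement reproduces the classical example distribution:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

# Hypothetical distribution D over {0,1}^n and target concept c (illustrative).
D = rng.random(2**n)
D /= D.sum()
c = rng.integers(0, 2, size=2**n)  # c(x) in {0,1}

# Quantum example: sum over x of sqrt(D(x)) |x, c(x)>.
# Encode the (n+1)-qubit basis state |x, b> at index 2*x + b.
psi = np.zeros(2 ** (n + 1))
for x in range(2**n):
    psi[2 * x + c[x]] = np.sqrt(D[x])

assert np.isclose(np.linalg.norm(psi), 1.0)  # a valid quantum state

# Measuring in the computational basis yields (x, c(x)) with probability D(x).
probs = psi**2
for x in range(2**n):
    assert np.isclose(probs[2 * x + c[x]], D[x])
print("computational-basis measurement reproduces the classical example distribution")
```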

SLIDE 6/18

Motivating question for this talk

Fix a concept class C and a distribution D : {0, 1}^n → [0, 1].

Question: understand the concept classes C and distributions D for which fewer quantum examples suffice for a quantum learner.

SLIDE 7/18

Distribution (in)dependent PAC learning

Focus on the Probably Approximately Correct (PAC) model of learning.

Fix C ⊆ {c : {0, 1}^n → {0, 1}} and D : {0, 1}^n → [0, 1].

Using i.i.d. labeled examples, a learner for C should output a hypothesis h that is close to c w.r.t. D, i.e., err_D(c, h) = Pr_{x∼D}[c(x) ≠ h(x)] should be small.

Distribution-dependent learning (for a fixed D)
An algorithm (ε, δ)-learns C under D if: ∀c ∈ C : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
(the event err_D(c, h) ≤ ε is the "Approximately Correct" part; the 1 − δ guarantee is the "Probably" part)

PAC learning (distribution-independent learning, for every D)
An algorithm (ε, δ)-PAC-learns C if: ∀D ∀c ∈ C : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
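As a small aside (not on the slides), err_D(c, h) is exactly the quantity a learner can estimate from held-out labeled examples; a quick Monte Carlo sketch with a hypothetical target c and hypothesis h:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 10_000

def c(x):  # hypothetical target concept: XOR of the first two bits
    return x[0] ^ x[1]

def h(x):  # hypothetical hypothesis: the first bit alone
    return x[0]

# Draw m i.i.d. examples x ~ D (here D is uniform over {0,1}^n) and
# estimate err_D(c, h) = Pr_{x ~ D}[c(x) != h(x)] empirically.
xs = rng.integers(0, 2, size=(m, n))
err_hat = np.mean([c(x) != h(x) for x in xs])
print(f"estimated err_D(c, h) = {err_hat:.3f}  (true value 0.5 for these choices)")
```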

slide-34
SLIDE 34

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner?

slide-35
SLIDE 35

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner

slide-36
SLIDE 36

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner

slide-37
SLIDE 37

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner In this talk Strengths of quantum examples

slide-38
SLIDE 38

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner In this talk Strengths of quantum examples ACLW’18: Sample complexity of learning Fourier-sparse Boolean functions under uniform D

slide-39
SLIDE 39

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner In this talk Strengths of quantum examples ACLW’18: Sample complexity of learning Fourier-sparse Boolean functions under uniform D Bshouty-Jackson’95: Quantum polynomial time learnability of DNFs under uniform D

slide-40
SLIDE 40

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner In this talk Strengths of quantum examples ACLW’18: Sample complexity of learning Fourier-sparse Boolean functions under uniform D Bshouty-Jackson’95: Quantum polynomial time learnability of DNFs under uniform D ACKW’18: Quantum examples can help the coupon collector

slide-41
SLIDE 41

8/ 18

Complexity of learning

How to measure the efficiency of the classical or quantum learner? Sample complexity: number of labeled examples used by learner Time complexity: number of time-steps used by learner In this talk Strengths of quantum examples ACLW’18: Sample complexity of learning Fourier-sparse Boolean functions under uniform D Bshouty-Jackson’95: Quantum polynomial time learnability of DNFs under uniform D ACKW’18: Quantum examples can help the coupon collector Weaknesses of quantum examples AW’17: Quantum examples are not more powerful than classical examples for PAC learning

SLIDE 9/18

Fourier sampling: a useful trick under uniform D

Let c : {0, 1}^n → {−1, 1}. Then the Fourier coefficients are

  ĉ(S) = (1/2^n) Σ_{x∈{0,1}^n} c(x)(−1)^{S·x}  for all S ∈ {0, 1}^n

Parseval's identity: Σ_S ĉ(S)^2 = E_x[c(x)^2] = 1

So {ĉ(S)^2}_S forms a probability distribution.

Given a quantum example under uniform D:

  (1/√(2^n)) Σ_x |x, c(x)⟩  → (Hadamard) →  Σ_S ĉ(S) |S⟩

Measuring then allows the learner to sample from the Fourier distribution {ĉ(S)^2}_S.
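A minimal classical simulation of this trick (not from the talk): compute the Fourier coefficients of a small Boolean function with the formula above and sample from {ĉ(S)^2}_S, just as measuring the Hadamard-transformed quantum example would. The example function is a hypothetical 2-junta.

```python
import numpy as np
from itertools import product

n = 4
# Hypothetical concept c : {0,1}^n -> {-1,1}; here a 2-junta on bits 0 and 1.
def c(x):
    return -1 if (x[0] & x[1]) else 1

xs = list(product([0, 1], repeat=n))
vals = np.array([c(x) for x in xs], dtype=float)

# Fourier coefficient c_hat(S) = (1/2^n) * sum_x c(x) * (-1)^(S.x).
def chat(S):
    signs = np.array([(-1) ** sum(s * x for s, x in zip(S, xv)) for xv in xs])
    return (vals * signs).mean()

Ss = list(product([0, 1], repeat=n))
coeffs = np.array([chat(S) for S in Ss])

# Parseval: the squared coefficients sum to 1, so they form a distribution.
assert np.isclose((coeffs ** 2).sum(), 1.0)

# "Fourier sampling": draw S with probability c_hat(S)^2.
rng = np.random.default_rng(2)
idx = rng.choice(len(Ss), size=5, p=coeffs ** 2)
print("samples S ~ {c_hat(S)^2}:", [Ss[i] for i in idx])
```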

SLIDE 10/18

Applications of Fourier sampling

Consider the concept class of linear functions C1 = {c_S(x) = S · x}_{S∈{0,1}^n}
  • Classical: Ω(n) classical examples are needed
  • Quantum: 1 quantum example suffices to learn C1 (Bernstein-Vazirani'93)

Consider C2 = {c is an ℓ-junta}, i.e., c(x) depends only on ℓ bits of x
  • Classical: efficient learning is notoriously hard for ℓ = O(log n) and uniform D
  • Quantum: C2 can be learnt exactly using O(2^ℓ) quantum examples and in time O(n2^ℓ + 2^{2ℓ}) (Atıcı-Servedio'09)

Generalizing both these concept classes?

Definition: We say c is k-Fourier-sparse if |{S : ĉ(S) ≠ 0}| ≤ k.
Note that every function in C1 is 1-Fourier-sparse and every function in C2 is 2^ℓ-Fourier-sparse.

Consider the concept class C = {c : {0, 1}^n → {−1, 1} : c is k-Fourier-sparse}
  • Observe that C1 ⊆ C: C contains the linear functions
  • Observe that C2 ⊆ C: C contains the (log k)-juntas

SLIDE 11/18

Learning C = {c is k-Fourier-sparse}

Exact learning of C under the uniform distribution D
  • Classically (Haviv-Regev'15): Θ(nk) classical examples (x, c(x)) are necessary and sufficient to learn the concept class C
  • Quantumly (ACLW'18): O(k^{1.5}) quantum examples (1/√(2^n)) Σ_x |x, c(x)⟩ are sufficient to learn C (independent of the universe size n); Ω(k) examples are necessary

Sketch of upper bound
  • Use Fourier sampling to sample S ∼ {ĉ(S)^2}_S
  • Collect the S's until the learner knows the Fourier span of c, V = span{S : ĉ(S) ≠ 0}
  • If dim(V) = r, then O(rk) quantum examples suffice to find V
  • Use the result of [HR'15] to learn c completely using O(rk) classical examples
  • Since r ≤ O(√k) for every c ∈ C [Sanyal'15], we get the O(k^{1.5}) upper bound

slide-75
SLIDE 75

12/ 18

Learning Disjunctive normal Forms (DNF)

slide-76
SLIDE 76

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables.

slide-77
SLIDE 77

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8)

slide-78
SLIDE 78

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s

slide-79
SLIDE 79

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D

slide-80
SLIDE 80

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90]

slide-81
SLIDE 81

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm!

slide-82
SLIDE 82

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound

slide-83
SLIDE 83

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound Structural property: if c is an s-term DNF, then there exists U s.t. | c(U)| ≥ 1

s

slide-84
SLIDE 84

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound Structural property: if c is an s-term DNF, then there exists U s.t. | c(U)| ≥ 1

s

Fourier sampling! Sample T ∼ { c(T)2}T , poly(s) many times to see such a U

slide-85
SLIDE 85

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound Structural property: if c is an s-term DNF, then there exists U s.t. | c(U)| ≥ 1

s

Fourier sampling! Sample T ∼ { c(T)2}T , poly(s) many times to see such a U Construct a “weak learner” who outputs χU s.t. Pr[χU(x) = c(x)] = 1

2 + 1 s

slide-86
SLIDE 86

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound Structural property: if c is an s-term DNF, then there exists U s.t. | c(U)| ≥ 1

s

Fourier sampling! Sample T ∼ { c(T)2}T , poly(s) many times to see such a U Construct a “weak learner” who outputs χU s.t. Pr[χU(x) = c(x)] = 1

2 + 1 s

Not good enough! Want an hypothesis that agrees with c on most inputs x’s

slide-87
SLIDE 87

12/ 18

Learning Disjunctive normal Forms (DNF)

DNFs Simply an OR of AND of variables. For example, (x1 ∧ x4 ∧ x3) ∨ (x4 ∧ x6 ∧ x7 ∧ x8) We say a DNF on n variables is an s-term DNF if number of clauses is ≤ s Learning C = {c is an s-term DNF in n variables} under uniform D Classically: Efficient learning using examples is a longstanding open question. Best known upper bound is nO(log n) [Verbeurgt’90] Quantumly: Bshouty-Jackson’95 gave a polynomial-time quantum algorithm! Proof sketch of quantum upper bound Structural property: if c is an s-term DNF, then there exists U s.t. | c(U)| ≥ 1

s

Fourier sampling! Sample T ∼ { c(T)2}T , poly(s) many times to see such a U Construct a “weak learner” who outputs χU s.t. Pr[χU(x) = c(x)] = 1

2 + 1 s

Not good enough! Want an hypothesis that agrees with c on most inputs x’s Boosting: Run weak learner many times in some manner to obtain a strong learner who outputs h satisfying Pr[h(x) = c(x)] ≥ 2/3
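A quick numeric check of the structural property on a toy DNF (illustrative; the formula below is a made-up example, and in general the guaranteed magnitude is on the order of 1/s rather than exactly 1/s):

```python
import numpy as np
from itertools import product

n, s = 5, 2
# Hypothetical 2-term DNF: (x0 AND x1) OR (x0 AND x2), as a {-1,1}-valued function.
def c(x):
    return -1.0 if ((x[0] & x[1]) | (x[0] & x[2])) else 1.0

xs = list(product([0, 1], repeat=n))
vals = np.array([c(x) for x in xs])

# Scan all Fourier coefficients and find the heaviest one.
best_U, best = None, 0.0
for U in product([0, 1], repeat=n):
    signs = np.array([(-1) ** sum(u * x for u, x in zip(U, xv)) for xv in xs])
    chat = float((vals * signs).mean())
    if abs(chat) > best:
        best_U, best = U, abs(chat)

print(f"heaviest coefficient: |c_hat({best_U})| = {best:.3f}, vs 1/s = {1 / s:.3f}")
assert best >= 1 / s  # the structural property holds for this toy DNF
```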

slide-88
SLIDE 88

13/ 18

Pretty good measurement for state identification

slide-89
SLIDE 89

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution

slide-90
SLIDE 90

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)
slide-91
SLIDE 91

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c

slide-92
SLIDE 92

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated,

slide-93
SLIDE 93

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM)

slide-94
SLIDE 94

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM) If Popt is the success probability of the optimal measurement,

slide-95
SLIDE 95

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM) If Popt is the success probability of the optimal measurement, Ppgm is the success probability of the PGM,

slide-96
SLIDE 96

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM) If Popt is the success probability of the optimal measurement, Ppgm is the success probability of the PGM, then Popt ≥ Ppgm

slide-97
SLIDE 97

13/ 18

Pretty good measurement for state identification

Consider a concept class C consisting of n-bit Boolean functions. Let D : {0, 1}n → [0, 1] be a distribution For c ∈ C, a quantum example is |ψc =

x∈{0,1}n

  • D(x) |x, c(x)

State identification: For uniform c ∈ C (unknown), given |ψc ⊗T , identify c Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM) If Popt is the success probability of the optimal measurement, Ppgm is the success probability of the PGM, then Popt ≥ Ppgm ≥ P2

  • pt (Barnum-Knill’02)
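A numeric sketch of the PGM (not from the talk): for quantum examples under uniform D, ⟨ψ_c|ψ_{c'}⟩ = Pr_x[c(x) = c'(x)], and T copies raise each overlap to the T-th power. For uniform priors the PGM success probability has the standard Gram-matrix form P_pgm = Σ_i ((√G)_{ii})^2 with G_{ij} = (1/m)⟨Ψ_i|Ψ_j⟩. The four concepts below are hypothetical.

```python
import numpy as np
from itertools import product

# Hypothetical concept class on 2 bits: AND, OR, XOR, and "first bit".
concepts = [
    lambda x: x[0] & x[1],
    lambda x: x[0] | x[1],
    lambda x: x[0] ^ x[1],
    lambda x: x[0],
]
m = len(concepts)
xs = list(product([0, 1], repeat=2))

# Pairwise overlaps <psi_c | psi_c'> = Pr_x[c(x) = c'(x)] under uniform D.
overlap = np.array([[np.mean([c(x) == d(x) for x in xs]) for d in concepts]
                    for c in concepts])

def p_pgm(T):
    """PGM success probability: sum_i (sqrt(G)_ii)^2, G_ij = (1/m) <Psi_i|Psi_j>."""
    G = (overlap ** T) / m          # T copies: overlaps raised to the T-th power
    w, V = np.linalg.eigh(G)
    sqrtG = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    return float(np.sum(np.diag(sqrtG) ** 2))

for T in [1, 2, 5, 10, 20]:
    print(f"T = {T:2d}: P_pgm = {p_pgm(T):.3f}")  # approaches 1 as T grows
```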
slide-98
SLIDE 98

14/ 18

Quantum examples help the coupon collector

slide-99
SLIDE 99

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons.

slide-100
SLIDE 100

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once?

slide-101
SLIDE 101

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N)

slide-102
SLIDE 102

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons.

slide-103
SLIDE 103

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}.

slide-104
SLIDE 104

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗}

slide-105
SLIDE 105

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗?

slide-106
SLIDE 106

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N)

slide-107
SLIDE 107

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples”

slide-108
SLIDE 108

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples” Suppose a quantum learner obtains quantum examples

1 √N−1

  • i∈({1,...,N}\{i∗}) |i.
slide-109
SLIDE 109

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples” Suppose a quantum learner obtains quantum examples

1 √N−1

  • i∈({1,...,N}\{i∗}) |i.

How many quantum examples before learning i∗?

slide-110
SLIDE 110

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples” Suppose a quantum learner obtains quantum examples

1 √N−1

  • i∈({1,...,N}\{i∗}) |i.

How many quantum examples before learning i∗? Answer [ACKW’..]: Can learn i∗ using Θ(N) quantum examples

slide-111
SLIDE 111

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples” Suppose a quantum learner obtains quantum examples

1 √N−1

  • i∈({1,...,N}\{i∗}) |i.

How many quantum examples before learning i∗? Answer [ACKW’..]: Can learn i∗ using Θ(N) quantum examples Proof idea: Analyze the success probability using the pretty good measurement.

slide-112
SLIDE 112

14/ 18

Quantum examples help the coupon collector

Standard coupon collector Problem: Suppose there are N coupons. How many coupons to draw (with replacement) before having seen each coupon at least once? Answer: Simple probability analysis shows Θ(N log N) Variation to coupon collector Problem: Suppose there are N coupons. Fix unknown i∗ ∈ {1, . . . , N}. How many coupons to draw (with replacement) from {1, . . . , N}\{i∗} before learning i∗? Answer: Same analysis as earlier shows Θ(N log N) What if we are given “quantum examples” Suppose a quantum learner obtains quantum examples

1 √N−1

  • i∈({1,...,N}\{i∗}) |i.

How many quantum examples before learning i∗? Answer [ACKW’..]: Can learn i∗ using Θ(N) quantum examples Proof idea: Analyze the success probability using the pretty good measurement. If T = O(N), then Popt ≥ Ppgm ≥ 2/3
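A quick Monte Carlo check of the classical Θ(N log N) behavior for the variant (illustrative, not from the talk): the learner can name the missing coupon i* only once every other coupon has been seen.

```python
import numpy as np

rng = np.random.default_rng(4)

def draws_to_learn_missing(N):
    """Draw uniformly from {0,...,N-1} minus {i*} until all N-1 other
    coupons have been seen; only then is the missing coupon i* determined."""
    i_star = rng.integers(N)
    others = np.delete(np.arange(N), i_star)
    seen, draws = set(), 0
    while len(seen) < N - 1:
        seen.add(rng.choice(others))
        draws += 1
    return draws

N = 200
trials = [draws_to_learn_missing(N) for _ in range(200)]
print(f"N = {N}: mean draws ≈ {np.mean(trials):.0f}, N ln N ≈ {N * np.log(N):.0f}")
```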

SLIDE 15/18

Distribution-independent learning

Recall: PAC learning. Given examples (x, c(x)) where x ∼ D, a learner (ε, δ)-PAC-learns C if: ∀D ∀c ∈ C : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ

Complexity measure: number of labeled examples.

To a concept class C we associate a combinatorial parameter, the VC-dimension of C. Classical PAC learning sample complexity is characterized by the VC-dimension of C.

Fundamental theorem of PAC learning
Suppose VC-dim(C) = d.
  • Blumer-Ehrenfeucht-Haussler-Warmuth'86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples
  • Hanneke'16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples
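To ground the combinatorial parameter, here is a brute-force VC-dimension computation on a tiny hypothetical class (threshold functions on a small domain), not from the talk:

```python
from itertools import combinations

# Hypothetical concept class: thresholds c_t(x) = [x >= t] on the domain {0,...,7}.
domain = range(8)
concepts = [tuple(int(x >= t) for x in domain) for t in range(9)]

def shattered(points):
    """points is shattered if the concepts realize all 2^|points| labelings on it."""
    labelings = {tuple(c[p] for p in points) for c in concepts}
    return len(labelings) == 2 ** len(points)

vc = 0
for size in range(1, len(domain) + 1):
    if any(shattered(pts) for pts in combinations(domain, size)):
        vc = size
    else:
        break
print(f"VC-dimension of thresholds on 8 points: {vc}")  # thresholds have VC-dim 1
```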
SLIDE 16/18

VC-dimension and quantum sample complexity

Quantum bounds
  • The classical upper bound O(d/ε + log(1/δ)/ε) carries over to quantum
  • Atıcı-Servedio'04: lower bound Ω(√d/ε + log(1/δ)/ε)
  • AW'17: showed that Ω(d/ε + log(1/δ)/ε) quantum examples are necessary

Proof idea: Reduce to state identification. For a good learner P_opt ≥ 2/3, so P_pgm ≥ P_opt^2 ≥ 4/9. If P_pgm ≥ 4/9, then T = Ω(d/ε).

Quantum examples are no better than classical examples for PAC learning.

Let's get real! In computational learning theory, agnostic learning and learning under classification noise are theoretical ways to model noise in data. In these more realistic models too, we show that quantum sample complexity equals classical sample complexity.

SLIDE 17/18

Future directions

More mileage out of Fourier sampling?
  • Extend the result of Bshouty-Jackson from depth-2 circuits (i.e., DNFs) to depth-3?
  • Can we PAC-learn DNFs? If so, then we could possibly learn depth-3 circuits under the uniform distribution
  • Scott Aaronson: Can AC0 be learnt in quantum polynomial time? (One of his ten semi-grand challenges for quantum computing!)
  • Can TC0 be learnt in quantum polynomial time? A theoretical way to understand neural networks
  • Can we learn constant-depth quantum circuits?

More open questions!
  • Can we learn the concept class of k-Fourier-sparse Boolean functions using O(k log k) samples, matching our lower bound?
  • Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed for all c ∈ C and all D
  • Find more distributions (other than uniform) where quantum provides a speedup

SLIDE 18/18

Conclusion

For PAC learning, quantum examples are no better than classical examples.

Under the uniform distribution D, quantum examples seem to help tremendously in some cases.

Quantum machine learning is still in its infancy! There are not yet many strong examples where quantum significantly improves machine learning.

Many recent surveys on quantum machine learning are available.