


Optimal Quantum Sample Complexity of Learning Algorithms

Srinivasan Arunachalam
(Joint work with Ronald de Wolf)

Machine learning

Classical machine learning
• Grand goal: enable AI systems to improve themselves
• Practical goal: learn "something" from given data
• Recent success: deep learning is extremely good at image recognition, natural language processing, even the game of Go
• Why the recent interest? Flood of available data, increasing computational power, growing progress in algorithms

Quantum machine learning
• What can quantum computing do for machine learning?
• The learner will be quantum, the data may be quantum
• Some examples are known of reductions in time complexity: clustering (Aïmeur et al. '06), principal component analysis (Lloyd et al. '13), perceptron learning (Wiebe et al. '16), recommendation systems (Kerenidis & Prakash '16)

Probably Approximately Correct (PAC) learning

Basic definitions
• Concept class C: a collection of Boolean functions on n bits (Known)
• Target concept c: some function c ∈ C (Unknown)
• Distribution D : {0, 1}^n → [0, 1] (Unknown)
• Labeled example for c ∈ C: (x, c(x)) where x ∼ D

Formally: A theory of the learnable (L.G. Valiant '84)
• Using i.i.d. labeled examples, a learner for C should output a hypothesis h that is Probably Approximately Correct
• Error of h w.r.t. the target c: err_D(c, h) = Pr_{x∼D}[c(x) ≠ h(x)]
• An algorithm (ε, δ)-PAC-learns C if: ∀c ∈ C, ∀D : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
  ("Approximately Correct": err_D(c, h) ≤ ε; "Probably": with probability ≥ 1 − δ over the examples)

Complexity of learning

Recap
• Concept: some function c : {0, 1}^n → {0, 1}
• Concept class C: a set of concepts
• An algorithm (ε, δ)-PAC-learns C if: ∀c ∈ C, ∀D : Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ (Probably Approximately Correct)

How to measure the efficiency of the learning algorithm?
• Sample complexity: number of labeled examples used by the learner
• Time complexity: number of time-steps used by the learner

This talk: focus on sample complexity
• No need for complexity-theoretic assumptions
• No need to worry about the format of the hypothesis h
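
To make the definitions concrete, here is a minimal sketch (not from the talk; the name erm_pac_learn and the toy threshold class are illustrative) of the kind of learner the sample-complexity question is about: draw T i.i.d. labeled examples from D and output a concept in C with the fewest disagreements on the sample (empirical risk minimization).

    import random

    def erm_pac_learn(concepts, sample):
        """Empirical risk minimization: return the concept in `concepts`
        that disagrees with the labeled sample on the fewest points."""
        def empirical_error(c):
            return sum(1 for (x, y) in sample if c(x) != y)
        return min(concepts, key=empirical_error)

    # Toy run (illustrative): threshold concepts on 3-bit strings, uniform D.
    n = 3
    concepts = [lambda x, t=t: int(int(x, 2) >= t) for t in range(2 ** n + 1)]
    target = concepts[5]                               # unknown target concept c
    T = 50                                             # number of labeled examples used
    sample = []
    for _ in range(T):
        x = format(random.randrange(2 ** n), "03b")    # x ~ D (uniform here)
        sample.append((x, target(x)))                  # labeled example (x, c(x))
    h = erm_pac_learn(concepts, sample)
    err = sum(h(format(x, "03b")) != target(format(x, "03b")) for x in range(2 ** n)) / 2 ** n
    print("err_D(c, h) =", err)

With enough examples (on the order of the bounds discussed below), the output hypothesis is probably approximately correct.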

Vapnik and Chervonenkis (VC) dimension

VC dimension of C ⊆ {c : {0, 1}^n → {0, 1}}
• Let M be the |C| × 2^n Boolean matrix whose c-th row is the truth table of the concept c : {0, 1}^n → {0, 1}
• VC-dim(C): the largest d such that some |C| × d subrectangle of M contains all of {0, 1}^d among its rows
• These d column indices are said to be shattered by C

[Table: a small concept class c_1, ..., c_9 with its truth tables, illustrating VC-dim(C) = 2]

[Table: a second concept class c_1, ..., c_9 with its truth tables, illustrating VC-dim(C) = 3]
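
The shattering condition can be checked by brute force for small classes. The sketch below (illustrative only, exponential time) represents each concept by its truth table, as in the matrix M above, and searches for the largest shattered set of columns.

    from itertools import combinations, product

    def vc_dimension(truth_tables):
        """truth_tables: list of 0/1 tuples (rows of the matrix M).
        Returns the largest d such that some d columns are shattered."""
        num_cols = len(truth_tables[0])
        def shattered(cols):
            patterns = {tuple(row[i] for i in cols) for row in truth_tables}
            return len(patterns) == 2 ** len(cols)
        vc = 0
        for d in range(1, num_cols + 1):
            if any(shattered(cols) for cols in combinations(range(num_cols), d)):
                vc = d
            else:
                break   # if no d-set is shattered, no larger set can be
        return vc

    # Illustrative example: monotone conjunctions ("AND of a subset of bits") on n = 3 bits.
    n = 3
    inputs = list(product([0, 1], repeat=n))
    concepts = [tuple(int(all(x[i] for i in range(n) if s[i])) for x in inputs)
                for s in product([0, 1], repeat=n)]
    print("VC-dim:", vc_dimension(concepts))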

VC dimension characterizes PAC sample complexity

VC dimension of C
• M is the |C| × 2^n Boolean matrix whose c-th row is the truth table of c
• VC-dim(C): the largest d such that some |C| × d subrectangle of M contains {0, 1}^d; these d column indices are shattered by C

Fundamental theorem of PAC learning
Suppose VC-dim(C) = d.
• Blumer-Ehrenfeucht-Haussler-Warmuth '86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples
• Hanneke '16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples
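
As a back-of-the-envelope helper (not from the talk; the hidden constant factor is omitted, so this gives only the order of magnitude):

    import math

    def pac_sample_order(d, eps, delta):
        """Order of the optimal PAC sample complexity Theta(d/eps + log(1/delta)/eps);
        constant factors omitted (illustrative only)."""
        return d / eps + math.log(1 / delta) / eps

    # e.g. d = 100, eps = delta = 0.01  ->  roughly 1e4 examples, up to constants
    print(pac_sample_order(100, 0.01, 0.01))
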
Quantum PAC learning

Bshouty-Jackson '95: a quantum generalization of classical PAC
• The learner is quantum
• The data is quantum: a quantum example is the superposition |E_{c,D}⟩ = Σ_{x∈{0,1}^n} √(D(x)) |x, c(x)⟩
• Measuring this state in the computational basis gives (x, c(x)) with probability D(x), so quantum examples are at least as powerful as classical examples
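
A quantum example is simply an (n+1)-qubit state whose amplitude on the basis state |x, c(x)⟩ is √(D(x)). The numpy sketch below (illustrative, not from the talk) builds this vector and simulates a computational-basis measurement, which reproduces a classical labeled example.

    import numpy as np

    def quantum_example_state(c, D, n):
        """State vector |E_{c,D}> = sum_x sqrt(D(x)) |x, c(x)> in dimension 2^(n+1),
        with the label stored in the last qubit."""
        psi = np.zeros(2 ** (n + 1))
        for x in range(2 ** n):
            psi[2 * x + c(x)] = np.sqrt(D[x])
        return psi

    # Illustrative: n = 2, uniform D, target concept c(x) = parity of x.
    n = 2
    D = np.full(2 ** n, 1 / 2 ** n)
    c = lambda x: bin(x).count("1") % 2
    psi = quantum_example_state(c, D, n)

    # Measuring in the computational basis yields (x, c(x)) with probability D(x).
    probs = psi ** 2
    outcome = np.random.choice(len(probs), p=probs / probs.sum())
    print("sampled labeled example:", (outcome >> 1, outcome & 1))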


Classical vs. quantum PAC learning algorithms

Question: Can quantum sample complexity be significantly smaller than classical?

Quantum PAC learning

Quantum data
• Quantum example: |E_{c,D}⟩ = Σ_{x∈{0,1}^n} √(D(x)) |x, c(x)⟩
• Quantum examples are at least as powerful as classical examples
• Quantum is indeed more powerful for learning! (for a fixed distribution)

Sample complexity: learning the class of linear functions under uniform D
• Classical: Ω(n) classical examples needed
• Quantum: O(1) quantum examples suffice (Bernstein-Vazirani '93)

Time complexity: learning DNF under uniform D
• Classical: best known upper bound is quasi-polynomial time (Verbeurgt '90)
• Quantum: polynomial time (Bshouty-Jackson '95)

But in the PAC model, the learner has to succeed for all D!
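
The linear-functions claim can be simulated directly. Under the uniform distribution, a quantum example for c(x) = a·x mod 2 becomes, after Hadamard transforms on all n+1 qubits, an equal superposition of |0^n, 0⟩ and |a, 1⟩, so a constant expected number of quantum examples reveals a. A small numpy sketch (illustrative n and a, not from the talk):

    import numpy as np
    from functools import reduce

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    n, a = 4, 0b1011                      # unknown string a defining c(x) = a.x mod 2

    # One quantum example under uniform D: sum_x 2^{-n/2} |x, a.x>, label in the last qubit.
    psi = np.zeros(2 ** (n + 1))
    for x in range(2 ** n):
        psi[2 * x + bin(x & a).count("1") % 2] = 2 ** (-n / 2)

    # Hadamard on every qubit, then measure in the computational basis.
    psi = reduce(np.kron, [H] * (n + 1)) @ psi
    probs = psi ** 2
    outcome = np.random.choice(len(probs), p=probs / probs.sum())
    x_part, label_bit = outcome >> 1, outcome & 1

    # With probability 1/2 the label bit is 1 and the first register equals a;
    # otherwise we retry with a fresh example (still O(1) examples in expectation).
    if label_bit:
        print("recovered a =", bin(x_part))
    else:
        print("label bit was 0: retry with a fresh quantum example")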

Quantum sample complexity = Classical sample complexity

Quantum upper bound
• The classical upper bound O(d/ε + log(1/δ)/ε) carries over to quantum

Best known quantum lower bounds
• Atici & Servedio '04: lower bound Ω(√d/ε + d + log(1/δ)/ε)
• Zhang '10: improved the first term to d^{1−η}/ε for all η > 0

Our result: tight lower bound
• We show: Ω(d/ε + log(1/δ)/ε) quantum examples are necessary

Two proof approaches
• Information theory: conceptually simple, nearly-tight bounds
• Optimal measurement: tight bounds, some messy calculations

Proof approach

1. First, we consider the problem of probably exactly learning: the quantum learner should identify the target concept.
2. Here, the quantum learner is given one out of |C| quantum states and must identify the target concept using copies of that quantum state.
3. Quantum state identification has been well studied.
4. We'll get to probably approximately learning soon!

Proof sketch: Quantum sample complexity T ≥ VC-dim(C)/ε

State identification
• Ensemble E = {(p_z, |ψ_z⟩)}_{z∈[m]}: given the state |ψ_z⟩ with probability p_z, the goal is to identify z
• The optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM)
• Crucial property: if P_opt is the optimal success probability, then P_opt ≥ P_pgm ≥ P_opt² (Barnum-Knill '02)

How does learning relate to identification?
• Quantum PAC: given |ψ_c⟩ = |E_{c,D}⟩^{⊗T}, learn c approximately
• Let VC-dim(C) = d and suppose {s_0, . . . , s_d} is shattered by C
• Fix the distribution D(s_0) = 1 − ε, D(s_i) = ε/d on {s_1, . . . , s_d}
• Let k = Ω(d) and let E : {0, 1}^k → {0, 1}^d be an error-correcting code
• Pick 2^k codeword concepts {c_z}_{z∈{0,1}^k} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i ∈ [d]


Pick concepts {c_z} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i

Suppose VC-dim(C) = d + 1 and {s_0, . . . , s_d} is shattered by C, i.e., the |C| × (d + 1) rectangle of M on the columns {s_0, . . . , s_d} contains {0, 1}^{d+1}

[Table: truth tables of the concepts restricted to {s_0, . . . , s_d}; the 2^d concepts c_1, . . . , c_{2^d} with c(s_0) = 0 realize every pattern on {s_1, . . . , s_d}]

Among {c_1, . . . , c_{2^d}}, pick the 2^k concepts that correspond to codewords of E : {0, 1}^k → {0, 1}^d on {s_1, . . . , s_d}
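
A small sketch of this construction (illustrative, not from the talk; the proof only needs E to be any good error-correcting code, and here a random linear generator matrix stands in for it):

    import numpy as np

    d, k, eps = 16, 8, 0.1                       # illustrative; the proof takes k = Omega(d)
    rng = np.random.default_rng(0)

    # Hard distribution on the shattered points {s_0, ..., s_d}:
    # D(s_0) = 1 - eps and D(s_i) = eps/d for i = 1..d.
    D = np.concatenate(([1 - eps], np.full(d, eps / d)))

    # A code E : {0,1}^k -> {0,1}^d via a random k x d generator matrix over GF(2);
    # with high probability such a code has constant relative distance.
    G = rng.integers(0, 2, size=(k, d))
    def E(z_bits):
        return tuple(np.asarray(z_bits) @ G % 2)

    # Codeword concepts restricted to {s_0, ..., s_d}: c_z(s_0) = 0, c_z(s_i) = E(z)_i.
    codeword_concepts = {
        z: (0,) + E([(z >> j) & 1 for j in range(k)]) for z in range(2 ** k)
    }
    print(len(codeword_concepts), "codeword concepts on", d + 1, "shattered points")

Roughly speaking, because distinct codewords differ on a constant fraction of {s_1, . . . , s_d}, a hypothesis that approximates c_z under D pins down z after decoding.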

Sample complexity lower bound via PGM

Recap
• Learning c_z approximately (w.r.t. D) is equivalent to identifying z!
• If the sample complexity is T, then there is a good learner that identifies z from |ψ_{c_z}⟩ = |E_{c_z,D}⟩^{⊗T} with probability ≥ 1 − δ
• Goal: show T ≥ d/ε

Analysis of the PGM
• For the ensemble {|ψ_{c_z}⟩ : z ∈ {0, 1}^k} with uniform probabilities p_z = 1/2^k, we have P_pgm ≥ P_opt² ≥ (1 − δ)²
• Recall k = Ω(d) because we used a good error-correcting code
• P_pgm ≤ · · · 4-page calculation · · · ≤ exp(T²ε²/d + √(Tdε) − d − Tε)
• This implies T = Ω(d/ε)
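
One way to read off the last implication (a sketch with an illustrative constant, not the talk's exact calculation): since P_pgm ≥ (1 − δ)², the exponent cannot be very negative. If T ≤ d/(9ε), then

    \[
      \frac{T^2\varepsilon^2}{d} + \sqrt{Td\varepsilon} - d - T\varepsilon
      \;\le\; \frac{d}{81} + \frac{d}{3} - d
      \;\le\; -\frac{d}{2},
    \]

so P_pgm ≤ e^{−d/2}, contradicting P_pgm ≥ (1 − δ)² once d exceeds a constant; hence T = Ω(d/ε).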



Conclusion and future work

Further results
• Agnostic learning: no quantum bounds were known before (unlike the PAC model); we showed that quantum examples do not reduce the sample complexity
• We also studied the model with random classification noise and showed that quantum examples are no better than classical examples

Future work
• Quantum machine learning is still young!
• Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed for all c ∈ C and all D
• Efficient quantum PAC learnability of AC^0 under uniform D?


Buffer 1: Proof approach via Information theory

• Suppose {s_0, . . . , s_d} is shattered by C. By definition: ∀a ∈ {0, 1}^d ∃c ∈ C s.t. c(s_0) = 0 and c(s_i) = a_i ∀ i ∈ [d]
• Fix a nasty distribution D: D(s_0) = 1 − 4ε, D(s_i) = 4ε/d on {s_1, . . . , s_d}
• A good learner produces a hypothesis h s.t. h(s_i) = c(s_i) = a_i for ≥ 3/4 of the i's
• Think of c as a uniform d-bit string A, approximated by h ∈ {0, 1}^d that depends on the examples B = (B_1, . . . , B_T)

1. I(A : B) ≥ I(A : h(B)) ≥ Ω(d)   [because h ≈ A]
2. I(A : B) ≤ Σ_{i=1}^T I(A : B_i) = T · I(A : B_1)   [subadditivity]
3. I(A : B_1) ≤ 4ε   [because the probability of a useful example is 4ε]

This implies Ω(d) ≤ I(A : B) ≤ 4Tε, hence T = Ω(d/ε).

For analyzing quantum examples, only step 3 changes: I(A : B_1) ≤ O(ε log(d/ε)), which gives T = Ω(d/(ε log(d/ε))).
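
One way to fill in step 3 (a sketch of the standard argument): let Z be the indicator that the example hits one of s_1, . . . , s_d. Then Z is determined by B_1, is independent of A, and B_1 is the constant (s_0, 0) when Z = 0, so

    \[
      I(A : B_1) \;=\; I(A : Z) + I(A : B_1 \mid Z)
                 \;=\; 0 + \Pr[Z = 1]\cdot I(A : B_1 \mid Z = 1)
                 \;\le\; 4\varepsilon \cdot 1,
    \]

since conditioned on Z = 1 the example is (s_i, A_i) for a uniformly random i, which reveals at most one bit of A.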


Buffer 2: Proof approach in detail

• Suppose we're given the state |ψ_i⟩ with probability p_i, i = 1, . . . , m. Goal: learn i
• The optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement. This has POVM operators M_i = p_i ρ^{−1/2} |ψ_i⟩⟨ψ_i| ρ^{−1/2}, where ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|
• Success probability of the PGM: P_PGM = Σ_i p_i Tr(M_i |ψ_i⟩⟨ψ_i|)
• Crucial property (Barnum-Knill '02): if P_OPT is the success probability of the optimal POVM, then P_OPT ≥ P_PGM ≥ P_OPT²
• Let G be the m × m Gram matrix of the vectors √p_i |ψ_i⟩; then P_PGM = Σ_i (√G)(i, i)²
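
A direct numpy sketch of these formulas (illustrative, pure states only), which also checks the Gram-matrix identity P_PGM = Σ_i (√G)(i, i)² on a small random ensemble:

    import numpy as np

    def pgm_success_probability(states, probs):
        """P_PGM = sum_i p_i <psi_i| M_i |psi_i> for the Pretty Good Measurement
        M_i = p_i rho^{-1/2} |psi_i><psi_i| rho^{-1/2}, rho = sum_i p_i |psi_i><psi_i|."""
        rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, states))
        w, U = np.linalg.eigh(rho)                      # rho^{-1/2} on the support of rho
        inv_sqrt = np.array([1 / np.sqrt(x) if x > 1e-12 else 0.0 for x in w])
        rho_inv_sqrt = U @ np.diag(inv_sqrt) @ U.conj().T
        return sum(p ** 2 * abs(np.vdot(v, rho_inv_sqrt @ v)) ** 2
                   for p, v in zip(probs, states))

    # Small random (real) pure-state ensemble with uniform priors.
    rng = np.random.default_rng(1)
    states = [v / np.linalg.norm(v) for v in rng.normal(size=(3, 4))]
    probs = [1 / 3] * 3

    # Gram-matrix formula: G(i, j) = sqrt(p_i p_j) <psi_i|psi_j>, P_PGM = sum_i sqrt(G)(i, i)^2.
    V = np.array([np.sqrt(p) * v for p, v in zip(probs, states)])
    G = V @ V.T
    w, U = np.linalg.eigh(G)
    sqrtG = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T
    print(pgm_success_probability(states, probs), np.sum(np.diag(sqrtG) ** 2))

The two printed numbers agree, matching the identity stated above.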


Buffer 3: Analysis of PGM

• For the ensemble {|ψ_{c_z}⟩ : z ∈ {0, 1}^k} with uniform probabilities p_z = 1/2^k, we have P_PGM ≥ (1 − δ)²
• Let G be the 2^k × 2^k Gram matrix of the vectors √p_z |ψ_{c_z}⟩; then P_PGM = Σ_z (√G)(z, z)²
• G_{xy} = g(x ⊕ y) for some function g, so G can be diagonalized using the Hadamard transform, and its eigenvalues are 2^k ĝ(s) (where ĝ is the Fourier transform of g). This gives √G.
• Σ_z (√G)(z, z)² ≤ · · · 4-page calculation · · · ≤ exp(T²ε²/d + √(Tdε) − d − Tε). This implies T = Ω(d/ε).
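
The Hadamard-diagonalization fact is easy to verify numerically (illustrative sketch, small k): for any matrix with G(x, y) = g(x ⊕ y), conjugating by the normalized Hadamard transform diagonalizes G, and the eigenvalue indexed by s is Σ_w g(w)(−1)^{s·w} = 2^k ĝ(s).

    import numpy as np
    from functools import reduce

    k = 3
    N = 2 ** k
    rng = np.random.default_rng(2)

    g = rng.normal(size=N)
    G = np.array([[g[x ^ y] for y in range(N)] for x in range(N)])   # G(x, y) = g(x XOR y)

    H2 = np.array([[1, 1], [1, -1]])
    HN = reduce(np.kron, [H2] * k)        # unnormalized Hadamard transform, entries (-1)^{x.s}

    diag = HN @ G @ HN.T / N              # (H_N / sqrt(N)) G (H_N / sqrt(N))^T
    eigvals = np.diag(diag)
    print(np.allclose(diag, np.diag(eigvals)))    # True: G is diagonal in the Hadamard basis
    print(np.allclose(eigvals, HN @ g))           # eigenvalue at s equals sum_w g(w)(-1)^{s.w}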