Optimal Quantum Sample Complexity of Learning Algorithms

Srinivasan Arunachalam (joint work with Ronald de Wolf)

Machine learning
Classical machine learning. Grand goal: enable AI.


Complexity of learning: recap
Concept: a function c : {0,1}^n → {0,1}. Concept class C: a set of concepts.
An algorithm (ε, δ)-PAC-learns C if for all c ∈ C and all distributions D: Pr[err_D(c, h) ≤ ε] ≥ 1 − δ, where err_D(c, h) = Pr_{x∼D}[h(x) ≠ c(x)] (Probably Approximately Correct).
How do we measure the efficiency of the learning algorithm?
Sample complexity: number of labeled examples used by the learner.
Time complexity: number of time-steps used by the learner.
This talk focuses on sample complexity: no complexity-theoretic assumptions are needed, and we need not worry about the format of the hypothesis h.
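As a small illustration of the definition (mine, not from the slides), the true error err_D(c, h) can be computed exactly when D, c, and h are known; the 2-bit concept, hypothesis, and distribution below are toy choices:

```python
# Toy illustration of err_D(c, h) = Pr_{x ~ D}[h(x) != c(x)] for n = 2.
from itertools import product

def err(D, c, h):
    """Exact distributional error: total D-mass of the points where h disagrees with c."""
    return sum(px for x, px in D.items() if c(x) != h(x))

c = lambda x: x[0] & x[1]                            # target concept (toy choice)
h = lambda x: x[0]                                   # hypothesis (toy choice)
D = {x: 0.25 for x in product([0, 1], repeat=2)}     # uniform distribution on {0,1}^2

print(err(D, c, h))                                  # 0.25: they disagree only on x = (1, 0)
```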

Vapnik and Chervonenkis (VC) dimension
VC dimension of C ⊆ {c : {0,1}^n → {0,1}}:
Let M be the |C| × 2^n Boolean matrix whose c-th row is the truth table of the concept c : {0,1}^n → {0,1}.
VC-dim(C): the largest d such that some |C| × d rectangle in M (a choice of d columns) contains all of {0,1}^d among its rows.
These d column indices are shattered by C.

Two example classes of nine concepts on {0,1}^2 (columns = the four inputs in {0,1}^2):

Table: VC-dim(C) = 2       Table: VC-dim(C) = 3
c1   0 1 0 1               c1   0 1 1 0
c2   0 1 1 0               c2   1 0 0 1
c3   1 0 0 1               c3   0 0 0 0
c4   1 0 1 0               c4   1 1 0 1
c5   1 1 0 1               c5   1 0 1 0
c6   0 1 1 1               c6   0 1 1 1
c7   0 0 1 1               c7   0 0 1 1
c8   0 1 0 0               c8   0 1 0 1
c9   1 1 1 1               c9   0 1 0 0
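A brute-force check of these two numbers (my own illustration; the truth tables are the ones reconstructed in the tables above):

```python
from itertools import combinations

def vc_dim(concepts):
    """Brute-force VC dimension of a finite concept class,
    given as a list of equal-length 0/1 truth-table rows."""
    n_cols = len(concepts[0])
    best = 0
    for d in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), d):
            patterns = {tuple(row[i] for i in cols) for row in concepts}
            if len(patterns) == 2 ** d:      # all of {0,1}^d appears: these columns are shattered
                best = d
    return best

# The two example classes from the tables above (rows = truth tables).
C_a = ["0101", "0110", "1001", "1010", "1101", "0111", "0011", "0100", "1111"]
C_b = ["0110", "1001", "0000", "1101", "1010", "0111", "0011", "0101", "0100"]
to_rows = lambda C: [[int(b) for b in row] for row in C]
print(vc_dim(to_rows(C_a)))  # 2
print(vc_dim(to_rows(C_b)))  # 3
```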

VC dimension characterizes PAC sample complexity
Recall: M is the |C| × 2^n Boolean matrix whose c-th row is the truth table of c; VC-dim(C) is the largest d such that some |C| × d rectangle in M contains {0,1}^d; these d column indices are shattered by C.

Fundamental theorem of PAC learning
Suppose VC-dim(C) = d.
Blumer-Ehrenfeucht-Haussler-Warmuth'86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples.
Hanneke'16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples.
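For concreteness, a tiny helper (mine, not the speaker's) that evaluates the Θ(d/ε + log(1/δ)/ε) sample bound; the constant C is unspecified, since the theorem only fixes the bound up to constant factors:

```python
import math

def pac_sample_bound(d, eps, delta, C=1.0):
    """Order-of-magnitude PAC sample complexity Theta(d/eps + log(1/delta)/eps).
    C is an unspecified constant: the theorem only pins the bound down up to constants."""
    return math.ceil(C * (d / eps + math.log(1 / delta) / eps))

print(pac_sample_bound(d=10, eps=0.1, delta=0.05))   # ~130 examples, up to constants
```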

Quantum PAC learning (Bshouty-Jackson'95): quantum generalization of classical PAC
Learner is quantum.
Data is quantum: a quantum example is the superposition ∑_{x ∈ {0,1}^n} √D(x) |x, c(x)⟩.
Measuring this state gives (x, c(x)) with probability D(x), so quantum examples are at least as powerful as classical examples.
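A minimal numpy sketch (my illustration; the 2-bit concept and distribution are toy choices) of the quantum-example state and the measurement fact just stated:

```python
import numpy as np

def quantum_example(D, c):
    """Amplitude vector of sum_x sqrt(D(x)) |x, c(x)> for n-bit inputs x.
    D: dict mapping n-bit tuples to probabilities; c: function {0,1}^n -> {0,1}."""
    n = len(next(iter(D)))
    state = np.zeros(2 ** (n + 1))                       # registers |x>|c(x)>, label qubit last
    for x, px in D.items():
        idx = int("".join(map(str, x)), 2) * 2 + c(x)    # index of the basis state |x, c(x)>
        state[idx] = np.sqrt(px)
    return state

# Measuring in the computational basis gives (x, c(x)) with probability D(x):
c = lambda x: x[0] ^ x[1]                                # a toy concept on 2 bits
D = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.0}
psi = quantum_example(D, c)
probs = psi ** 2                                         # Born-rule outcome probabilities
print(probs[probs > 0])                                  # [0.5, 0.25, 0.25], matching D
```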

Classical vs. quantum PAC learning
Question: can quantum sample complexity be significantly smaller than classical?

Quantum PAC learning: quantum data
Quantum example: |E_{c,D}⟩ = ∑_{x ∈ {0,1}^n} √D(x) |x, c(x)⟩.
Quantum examples are at least as powerful as classical examples.
Quantum is indeed more powerful for learning, for a fixed distribution:
Sample complexity, learning the class of linear functions under uniform D: classically Ω(n) examples are needed, while O(1) quantum examples suffice (Bernstein-Vazirani'93; see the simulation sketch below).
Time complexity, learning DNF under uniform D: the best known classical upper bound is quasi-polynomial time (Verbeurgt'90), while quantum learners run in polynomial time (Bshouty-Jackson'95).
But in the PAC model, the learner has to succeed for all D!
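A self-contained simulation (my own illustration) of the Bernstein-Vazirani observation mentioned above: from a single uniform-distribution quantum example of a linear function c(x) = a·x mod 2, Hadamard transforms on all qubits reveal a with probability 1/2, so O(1) quantum examples suffice; the particular n and a are arbitrary.

```python
import numpy as np
from itertools import product

n = 4
a = np.array([1, 0, 1, 1])                   # hidden linear function c(x) = a.x mod 2

# Uniform-distribution quantum example: (1/sqrt(2^n)) sum_x |x, a.x>
state = np.zeros(2 ** (n + 1))
for bits in product([0, 1], repeat=n):
    x = np.array(bits)
    idx = int("".join(map(str, bits)), 2) * 2 + (int(x @ a) % 2)
    state[idx] = 1 / np.sqrt(2 ** n)

# Apply a Hadamard on every qubit (Walsh-Hadamard transform on n+1 qubits).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U = np.array([[1.0]])
for _ in range(n + 1):
    U = np.kron(U, H)
probs = (U @ state) ** 2

# Only |0^n, 0> and |a, 1> survive, each with probability 1/2,
# so a constant number of quantum examples recovers a.
for i in np.flatnonzero(probs > 1e-9):
    print(format(int(i) >> 1, f"0{n}b"), int(i) & 1, round(float(probs[i]), 3))
```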

Quantum sample complexity = classical sample complexity

Quantum upper bound
The classical upper bound O(d/ε + log(1/δ)/ε) carries over to quantum examples.

Best known quantum lower bounds
Atici & Servedio'04: lower bound Ω(√d/ε + d + log(1/δ)/ε).
Zhang'10: improved the first term to d^{1−η}/ε for all η > 0.

Our result: a tight lower bound
We show: Ω(d/ε + log(1/δ)/ε) quantum examples are necessary.

Two proof approaches
Information theory: conceptually simple, nearly tight bounds.
Optimal measurement: tight bounds, some messy calculations.

Proof approach
First, we consider the problem of probably exactly learning:
1. The quantum learner should identify the concept exactly.
2. Here the quantum learner is given one out of |C| quantum states, and must identify the target concept using copies of that state.
3. Quantum state identification has been well studied.
4. We'll get to probably approximately learning soon!

Proof sketch: quantum sample complexity T ≥ VC-dim(C)/ε
State identification: ensemble E = {(p_z, |ψ_z⟩)}_{z ∈ [m]}.
Given the state |ψ_z⟩ ∈ E with probability p_z; goal: identify z.
The optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM).
Crucial property: if P_opt is the optimal success probability, then P_opt ≥ P_pgm ≥ P_opt^2 (Barnum-Knill'02).
How does learning relate to identification?
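For concreteness, here is a small numpy sketch (mine, not from the talk) of the Pretty Good Measurement success probability for an ensemble of pure states, compared with the Helstrom optimum on a toy two-state ensemble so that the sandwich P_opt ≥ P_pgm ≥ P_opt^2 is visible:

```python
import numpy as np

def pgm_success(states, priors):
    """Success probability of the Pretty Good Measurement for an ensemble
    of pure states (rows of `states`) with prior probabilities `priors`."""
    states = np.asarray(states, dtype=complex)
    p = np.asarray(priors, dtype=float)
    rho = sum(pz * np.outer(s, s.conj()) for pz, s in zip(p, states))
    # Inverse square root of rho on its support (pseudo-inverse of sqrt(rho)).
    w, V = np.linalg.eigh(rho)
    vals = np.where(w > 1e-12, 1 / np.sqrt(np.where(w > 1e-12, w, 1)), 0)
    inv_sqrt = (V * vals) @ V.conj().T
    # PGM operator for outcome z is rho^{-1/2} p_z |psi_z><psi_z| rho^{-1/2};
    # its success probability on |psi_z> is p_z |<psi_z| rho^{-1/2} |psi_z>|^2.
    return float(sum(pz ** 2 * abs(s.conj() @ inv_sqrt @ s) ** 2
                     for pz, s in zip(p, states)))

# Toy ensemble: two non-orthogonal single-qubit states with unequal priors.
s0 = np.array([1.0, 0.0])
s1 = np.array([np.cos(0.3), np.sin(0.3)])
p = np.array([0.7, 0.3])

# Helstrom optimum for two states: 1/2 (1 + trace-norm of p0 rho0 - p1 rho1).
M = p[0] * np.outer(s0, s0) - p[1] * np.outer(s1, s1)
p_opt = 0.5 * (1 + np.abs(np.linalg.eigvalsh(M)).sum())
p_pgm = pgm_success([s0, s1], p)
print(p_opt, p_pgm, p_opt ** 2)   # expect p_opt >= p_pgm >= p_opt^2
```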

Quantum PAC: given |ψ_c⟩ = |E_{c,D}⟩^{⊗T}, learn c approximately.
Let VC-dim(C) = d, and suppose {s_0, ..., s_d} is shattered by C.
Fix the distribution D: D(s_0) = 1 − ε and D(s_i) = ε/d on {s_1, ..., s_d}.
Let k = Ω(d) and let E : {0,1}^k → {0,1}^d be an error-correcting code.
Pick 2^k codeword concepts {c_z}_{z ∈ {0,1}^k} ⊆ C with c_z(s_0) = 0 and c_z(s_i) = E(z)_i for all i ∈ [d].
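A toy rendering of this setup (my own sketch): the hard distribution D and a set of codeword concepts on the shattered set, with a random linear code standing in for the explicit good code E used in the proof; the sizes d, k, ε are arbitrary.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d, k, eps = 16, 4, 0.1                        # toy sizes; the proof takes k = Omega(d)

# Hard distribution on the shattered set {s_0, ..., s_d} (points indexed 0..d):
D = {0: 1 - eps, **{i: eps / d for i in range(1, d + 1)}}

# A random linear code E : {0,1}^k -> {0,1}^d; with k << d its minimum distance
# is typically Omega(d). Any explicit good code works in the actual proof.
G = rng.integers(0, 2, size=(k, d))
codewords = {z: tuple(np.array(z) @ G % 2) for z in product([0, 1], repeat=k)}

# Codeword concepts restricted to the shattered set: c_z(s_0) = 0, c_z(s_i) = E(z)_i.
concepts = {z: (0,) + cw for z, cw in codewords.items()}

min_dist = min(sum(a != b for a, b in zip(codewords[z], codewords[w]))
               for z in codewords for w in codewords if z < w)
print("number of codeword concepts:", len(concepts), "minimum distance:", min_dist)
```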

Pick concepts {c_z} ⊆ C with c_z(s_0) = 0 and c_z(s_i) = E(z)_i for all i.
Suppose VC-dim(C) = d + 1 and {s_0, ..., s_d} is shattered by C, i.e., the |C| × (d + 1) rectangle on the columns {s_0, ..., s_d} contains all of {0,1}^{d+1}.

[Table, schematic: rows are truth tables of concepts restricted to the columns s_0, s_1, ..., s_{d-1}, s_d. The concepts c_1, ..., c_{2^d} have c(s_0) = 0 and between them realize every pattern in {0,1}^d on {s_1, ..., s_d}; the concepts listed after them have c(s_0) = 1.]

Among {c_1, ..., c_{2^d}}, pick the 2^k concepts whose restrictions to {s_1, ..., s_d} are codewords of E : {0,1}^k → {0,1}^d.

Proof sketch, recap
Quantum PAC: given |ψ_c⟩ = |E_{c,D}⟩^{⊗T}, learn c approximately, where D(s_0) = 1 − ε, D(s_i) = ε/d on the shattered set {s_1, ..., s_d}, and the 2^k codeword concepts satisfy c_z(s_0) = 0, c_z(s_i) = E(z)_i.
Learning c_z approximately (w.r.t. D) is equivalent to identifying z: distinct codeword concepts differ on a constant fraction of {s_1, ..., s_d}, so an ε-accurate hypothesis can be decoded to the nearest codeword, up to the constant factors hidden in the choice of D.

Sample complexity lower bound via PGM

Recap
Learning c_z approximately (w.r.t. D) is equivalent to identifying z.
If the sample complexity is T, then there is a good learner that identifies z from |ψ_{c_z}⟩ = |E_{c_z,D}⟩^{⊗T} with probability ≥ 1 − δ.
Goal: show T ≥ d/ε.

Analysis of PGM
For the ensemble {|ψ_{c_z}⟩ : z ∈ {0,1}^k} with uniform probabilities p_z = 1/2^k, we have P_pgm ≥ P_opt^2 ≥ (1 − δ)^2.
Recall k = Ω(d) because we used a good error-correcting code.
P_pgm ≤ ... (a 4-page calculation) ... ≤ exp(T^2 ε^2/d + √(Tdε) − d − Tε).
This implies T = Ω(d/ε).
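As a quick numeric sanity check (my own, and it assumes the exponential bound exactly as reconstructed above), one can solve for the smallest T at which the upper bound on P_pgm is still compatible with P_pgm ≥ (1 − δ)^2; the resulting T is essentially d/ε:

```python
import math

def min_T(d, eps, delta):
    """Smallest T for which exp(T^2 eps^2/d + sqrt(T d eps) - d - T eps) >= (1 - delta)^2,
    i.e. for which the PGM upper bound above does not contradict a (1 - delta)-successful learner."""
    target = 2 * math.log(1 - delta)
    T = 1
    while T * T * eps * eps / d + math.sqrt(T * d * eps) - d - T * eps < target:
        T += 1
    return T

for d in (50, 200, 800):
    eps, delta = 0.05, 0.1
    T = min_T(d, eps, delta)
    print(d, T, round(T / (d / eps), 2))   # the ratio T / (d/eps) stays close to 1
```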


Conclusion and future work

Further results
Agnostic learning: no quantum bounds were known before (unlike the PAC model); we showed that quantum examples do not reduce the sample complexity.
We also studied the model with random classification noise and showed that quantum examples are no better than classical examples.

Future work
Quantum machine learning is still young!
Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed for all c ∈ C and all D.
Efficient quantum PAC learnability of AC^0 under the uniform distribution?
