1/ 23
Optimal Quantum Sample Complexity of Learning Algorithms
Srinivasan Arunachalam
(Joint work with Ronald de Wolf)
2/ 23
Machine learning
Classical machine learning
Grand goal: enable AI systems to improve themselves
Practical goal: learn "something" from given data
Recent success: deep learning is extremely good at image recognition, natural language processing, even the game of Go
Why the recent interest? Flood of available data, increasing computational power, growing progress in algorithms

Quantum machine learning
What can quantum computing do for machine learning?
The learner will be quantum, the data may be quantum
Some examples are known of reductions in time complexity: clustering (Aïmeur et al. '06), principal component analysis (Lloyd et al. '13), perceptron learning (Wiebe et al. '16), recommendation systems (Kerenidis & Prakash '16)
3/ 23
Basic definitions
Concept class C: collection of Boolean functions on n bits (Known)
Target concept c: some function c ∈ C (Unknown)
Distribution D : {0,1}^n → [0,1] (Unknown)
Labeled example for c ∈ C: (x, c(x)) where x ∼ D

5/ 23
Formally: A theory of the learnable (L. G. Valiant '84)
Using i.i.d. labeled examples, a learner for C should output a hypothesis h that is Probably Approximately Correct
Error of h w.r.t. target c: err_D(c, h) = Pr_{x∼D}[c(x) ≠ h(x)]
An algorithm (ε, δ)-PAC-learns C if: ∀c ∈ C, ∀D: Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
("Approximately": the error is at most ε; "Probably": this holds with probability at least 1 − δ)
6/ 23
Recap
Concept: some function c : {0,1}^n → {0,1}
Concept class C: set of concepts
An algorithm (ε, δ)-PAC-learns C if: ∀c ∈ C, ∀D: Pr[ err_D(c, h) ≤ ε ] ≥ 1 − δ
How do we measure the efficiency of a learning algorithm?
Sample complexity: number of labeled examples used by the learner
Time complexity: number of time-steps used by the learner
This talk: focus on sample complexity
No need for complexity-theoretic assumptions
No need to worry about the format of the hypothesis h
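To make these definitions concrete, here is a minimal Python sketch (my illustration, not the talk's; the parity concept class, the random distribution D, and the consistent-hypothesis learner are all assumptions) that estimates how often a learner meets the (ε, δ)-PAC condition:

```python
import itertools, random

# Toy check of the PAC definition. Concepts are parities c_a(x) = a.x mod 2.
n = 4
points = list(itertools.product([0, 1], repeat=n))
concepts = list(itertools.product([0, 1], repeat=n))   # one concept per a in {0,1}^n

def evaluate(a, x):
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def err(D, c, h):
    # err_D(c, h) = Pr_{x~D}[c(x) != h(x)], computed exactly from D
    return sum(p for p, x in zip(D, points) if evaluate(c, x) != evaluate(h, x))

random.seed(1)
weights = [random.random() for _ in points]
D = [w / sum(weights) for w in weights]    # the (unknown) distribution
target = random.choice(concepts)           # the (unknown) target concept
eps, delta, T, trials = 0.1, 0.1, 50, 1000

good = 0
for _ in range(trials):
    xs = random.choices(points, weights=D, k=T)        # T i.i.d. labeled examples
    sample = [(x, evaluate(target, x)) for x in xs]
    # learner: output any concept consistent with the sample
    h = next(a for a in concepts if all(evaluate(a, x) == y for x, y in sample))
    good += err(D, target, h) <= eps
print(f"empirical Pr[err_D(c,h) <= eps] = {good/trials:.3f}; PAC demands >= {1 - delta}")
```

Increasing T pushes the empirical probability toward 1, matching the intuition that more examples buy both accuracy and confidence.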
7/ 23
VC dimension of C ⊆ {c : {0,1}^n → {0,1}}
Let M be the |C| × 2^n Boolean matrix whose c-th row is the truth table of c
VC-dim(C): largest d s.t. the |C| × d rectangle of M on some d columns contains all of {0,1}^d among its rows
These d column indices are shattered by C
8/ 23
[Table: VC-dim(C) = 2 — example truth table of nine concepts c1, …, c9; the entries did not survive the transcript]

9/ 23
[Table: VC-dim(C) = 3 — a second example truth table of nine concepts c1, …, c9; entries likewise not preserved]
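The definition of VC-dim(C) translates directly into a brute-force check; a Python sketch (my illustration; exponential time, so only for toy classes):

```python
from itertools import combinations, product

def vc_dim(truth_tables):
    """VC dimension of C, given the rows of the |C| x 2^n truth-table matrix M."""
    m = len(truth_tables[0])                 # number of columns (inputs)
    best = 0
    for d in range(1, m + 1):
        for cols in combinations(range(m), d):
            # the |C| x d rectangle of M on these columns
            patterns = {tuple(row[i] for i in cols) for row in truth_tables}
            if len(patterns) == 2 ** d:      # contains all of {0,1}^d: shattered
                best = d
                break
        else:
            return best                      # no d-subset shattered
    return best

# Example: the 2^n parity functions c_a(x) = a.x mod 2 have VC dimension n
n = 2
points = list(product([0, 1], repeat=n))
parities = [tuple(sum(a * x for a, x in zip(av, xv)) % 2 for xv in points)
            for av in product([0, 1], repeat=n)]
print(vc_dim(parities))                      # -> 2
```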
10/ 23
Fundamental theorem of PAC learning
Suppose VC-dim(C) = d
Blumer-Ehrenfeucht-Haussler-Warmuth'86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples
Hanneke'16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples
11/ 23
Quantum PAC learning (Bshouty-Jackson'95): quantum generalization of classical PAC
Learner is quantum
Data is quantum: a quantum example is a superposition
|E_{c,D}⟩ = Σ_{x∈{0,1}^n} √D(x) |x, c(x)⟩
Measuring this state gives (x, c(x)) with probability D(x), so quantum examples are at least as powerful as classical ones

12/ 23
Question: Can quantum sample complexity be significantly smaller than classical?
13/ 23
Quantum data
Quantum example: |E_{c,D}⟩ = Σ_{x∈{0,1}^n} √D(x) |x, c(x)⟩
Quantum examples are at least as powerful as classical examples
Quantum is indeed more powerful for learning! (for a fixed distribution)
Sample complexity: learning the class of linear functions under uniform D
Classical: Ω(n) classical examples needed
Quantum: O(1) quantum examples suffice (Bernstein-Vazirani'93)
Time complexity: learning DNF under uniform D
Classical: best known upper bound is quasi-polynomial time (Verbeurgt'90)
Quantum: polynomial time (Bshouty-Jackson'95)

14/ 23
But in the PAC model, the learner has to succeed for all D!
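Here is a small numpy simulation (my sketch, with my own bit-ordering conventions, not code from the talk) of the Bernstein-Vazirani phenomenon behind the O(1) claim: Hadamard-transforming one uniform-distribution quantum example of a linear function c(x) = a·x and measuring yields the hidden string a with probability 1/2, so an expected two quantum examples identify c.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
a = rng.integers(0, 2, n)                 # hidden string of the target c(x) = a.x mod 2

# Quantum example |E_{c,U}> = 2^{-n/2} sum_x |x, c(x)> as a 2^(n+1)-dim amplitude vector
state = np.zeros(2 ** (n + 1))
for x in range(2 ** n):
    bits = [(x >> i) & 1 for i in range(n - 1, -1, -1)]   # bits of x, MSB first
    label = int(np.dot(a, bits)) % 2
    state[(x << 1) | label] = 2 ** (-n / 2)

# Apply a Hadamard gate to every one of the n+1 qubits
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Hn = H
for _ in range(n):
    Hn = np.kron(Hn, H)
out = Hn @ state

# Only (y, b) = (0, 0) and (a, 1) have nonzero amplitude, each with probability 1/2
probs = out ** 2
for idx in np.flatnonzero(probs > 1e-12):
    y, b = idx >> 1, idx & 1
    print(f"outcome y = {y:0{n}b}, label bit = {b}, prob = {probs[idx]:.2f}")
print("hidden a =", "".join(map(str, a)))
```

Whenever the measured label bit is 1, the first register holds a exactly; classically Ω(n) examples are needed, since each labeled example reveals at most one bit of information about a.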
15/ 23
Quantum upper bound
The classical upper bound O(d/ε + log(1/δ)/ε) carries over, since measuring each quantum example yields a classical example
Best known quantum lower bounds
Atici & Servedio'04: lower bound Ω(√d/ε + d + log(1/δ)/ε)
Zhang'10: lower bound Ω(d^{1−η}/ε) for all η > 0
16/ 23
Our result: tight lower bound
We show: every quantum (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) quantum examples
Two proof approaches
Information theory: conceptually simple, nearly-tight bounds
Optimal measurement: tight bounds, some messy calculations
17/ 23
1. First, we consider the problem of probably exactly learning: the quantum learner should identify the target concept
2. Here, the quantum learner is given one out of |C| quantum states, and must identify the target concept using copies of that quantum state
3. Quantum state identification has been well-studied
4. We'll get to probably approximately learning soon!
18/ 23
State identification: Ensemble E = {(p_z, |ψ_z⟩)}_{z∈[m]}
Given state |ψ_z⟩ ∈ E with probability p_z. Goal: identify z
Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement
Crucial property: if P_opt is the optimal success probability, then P_opt ≥ P_pgm ≥ P_opt²

How does learning relate to identification?
Quantum PAC: Given |ψ_c⟩ = |E_{c,D}⟩^⊗T, learn c approximately
Let VC-dim(C) = d + 1. Suppose {s_0, …, s_d} is shattered by C.
Fix D: D(s_0) = 1 − ε, D(s_i) = ε/d on {s_1, …, s_d}
Let k = Ω(d) and E : {0,1}^k → {0,1}^d be an error-correcting code
Pick 2^k codeword concepts {c_z}_{z∈{0,1}^k} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i ∀ i ∈ [d]
19/ 23
Suppose VC-dim(C) = d + 1 and {s_0, …, s_d} is shattered by C, i.e., the |C| × (d + 1) rectangle on columns {s_0, …, s_d} contains {0,1}^{d+1}
[Table: truth table of C restricted to s_0, …, s_d; the rows c_1, …, c_{2^d} with c(s_0) = 0 realize every pattern in {0,1}^d on {s_1, …, s_d}]
Among {c_1, …, c_{2^d}}, pick the 2^k concepts that correspond to codewords of E : {0,1}^k → {0,1}^d on {s_1, …, s_d}

20/ 23
Learning c_z approximately (w.r.t. D) is equivalent to identifying z: a hypothesis with small error under D must agree with the codeword E(z) on most of {s_1, …, s_d}, and the code's distance then determines z uniquely (a decoding sketch follows below)
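To see the equivalence concretely, here is a toy Python sketch (my illustration with hypothetical parameters; a random linear code stands in for the good code E): corrupt a codeword on fewer than half of the code's minimum distance of positions, as a low-error hypothesis effectively does, and minimum-distance decoding still recovers z exactly.

```python
import itertools, random

random.seed(2)
k, d = 6, 24                                  # toy sizes; the proof needs k = Omega(d)
G = [[random.randint(0, 1) for _ in range(d)] for _ in range(k)]

def encode(z):                                # linear code E(z) = z G over GF(2)
    return tuple(sum(zi * gi for zi, gi in zip(z, col)) % 2 for col in zip(*G))

codewords = {z: encode(z) for z in itertools.product([0, 1], repeat=k)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

dmin = min(dist(cu, cv) for u, cu in codewords.items()
           for v, cv in codewords.items() if u < v)
assert dmin > 0                               # re-seed if this random code is degenerate

z = random.choice(list(codewords))            # the index the learner must identify
h = list(codewords[z])                        # hypothesis restricted to {s_1,...,s_d}
for i in random.sample(range(d), (dmin - 1) // 2):
    h[i] ^= 1                                 # a low-error hypothesis errs on few s_i

decoded = min(codewords, key=lambda w: dist(codewords[w], h))
print(f"dmin = {dmin}, corrupted {(dmin - 1) // 2} positions, recovered z: {decoded == z}")
```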
21/ 23
Recap
Learning c_z approximately (w.r.t. D) is equivalent to identifying z!
If the sample complexity is T, then there is a good learner that identifies z from |ψ_{c_z}⟩ = |E_{c_z,D}⟩^⊗T w.p. ≥ 1 − δ
Goal: show T ≥ d/ε

Analysis of PGM
For the ensemble {|ψ_{c_z}⟩ : z ∈ {0,1}^k} with uniform probabilities p_z = 1/2^k, we have P_pgm ≥ P_opt² ≥ (1 − δ)²
Recall k = Ω(d) because we used a good ECC
P_pgm ≤ ⋯ 4-page calculation ⋯ ≤ exp(T²ε²/d + √(Tdε) − d − Tε)
This implies T = Ω(d/ε)
23/ 23
Further results
Agnostic learning: no quantum bounds were known before (unlike the PAC model). We showed that quantum examples do not reduce the sample complexity
We also studied the model with random classification noise and showed that quantum examples are no better than classical examples

Future work
Quantum machine learning is still young!
Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed ∀c ∈ C and ∀D
Efficient quantum PAC learnability of AC0 under uniform D?
24/ 23
Suppose {s0, . . . , sd} is shattered by C. By definition: ∀a ∈ {0, 1}d ∃c ∈ C s.t. c(s0) = 0, and c(si) = ai ∀ i ∈ [d] Fix a nasty distribution D: D(s0) = 1 − 4ε, D(si) = 4ε/d on {s1, . . . , sd}. Good learner produces hypothesis h s.t. h(si) = c(si) = ai for ≥ 3
4 of is
Think of c as uniform d-bit string A, approximated by h ∈ {0, 1}d that depends on examples B = (B1, . . . , BT)
1
I(A : B) ≥ I(A : h(B)) ≥ Ω(d) [because h ≈ A]
2
I(A : B) ≤ T
i=1 I(A : Bi) = T · I(A : B1)
[subadditivity]
3
I(A : B1) ≤ 4ε [because prob of useful example is 4ε]
This implies Ω(d) ≤ I(A : B) ≤ 4Tε, hence T = Ω( d
ε )
For analyzing quantum examples, only step 3 changes: I(A : B1) ≤ O(ε log(d/ε)) ⇒ T = Ω( d
ε 1 log(d/ε))
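Step 3 can be checked numerically. The following Python sketch (my illustration) computes I(A : B_1) exactly for small d by enumerating A; under this distribution it comes out to exactly 4ε bits:

```python
import itertools, math

d, eps = 6, 0.05
A_vals = list(itertools.product([0, 1], repeat=d))     # A uniform over {0,1}^d

def pB1_given_A(a):
    # B1 = (s_0, 0) w.p. 1 - 4*eps, and (s_i, a_i) w.p. 4*eps/d for each i
    p = {("s0", 0): 1 - 4 * eps}
    for i in range(d):
        p[("s", i, a[i])] = 4 * eps / d
    return p

marg = {}                                              # marginal distribution of B1
for a in A_vals:
    for outcome, pr in pB1_given_A(a).items():
        marg[outcome] = marg.get(outcome, 0.0) + pr / len(A_vals)

H = lambda ps: -sum(p * math.log2(p) for p in ps if p > 0)
I = H(marg.values()) - sum(H(pB1_given_A(a).values()) for a in A_vals) / len(A_vals)
print(f"I(A:B1) = {I:.6f} bits, 4*eps = {4 * eps}")    # 0.200000 vs 0.2
```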
25/ 23
Suppose we’re given state |ψi with prob pi, i = 1, . . . , m. Goal: learn i Optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement. This has POVM operators Mi = piρ−1/2|ψiψi|ρ−1/2, where ρ =
i pi|ψiψi|
Success probability of PGM: PPGM =
i piTr(Mi|ψiψi|)
Crucial property (BK’02): if POPT is the success probablity of the
OPT
Let G be the m × m Gram matrix of the vectors √pi |ψi, then PPGM =
i
√ G(i, i)2
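A numpy sketch (my illustration on a random toy ensemble) implementing these formulas and confirming numerically that Σ_i p_i Tr(M_i |ψ_i⟩⟨ψ_i|) coincides with Σ_i (√G)(i, i)²:

```python
import numpy as np

rng = np.random.default_rng(3)
m, dim = 4, 3                                  # four random pure states in C^3

def rand_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

psis = [rand_state(dim) for _ in range(m)]
p = np.full(m, 1 / m)

# rho = sum_i p_i |psi_i><psi_i|, and rho^{-1/2} on its support
rho = sum(pi * np.outer(v, v.conj()) for pi, v in zip(p, psis))
w, U = np.linalg.eigh(rho)
inv_sqrt = U @ np.diag([x ** -0.5 if x > 1e-12 else 0.0 for x in w]) @ U.conj().T

# P_PGM = sum_i p_i Tr(M_i |psi_i><psi_i|), M_i = p_i rho^{-1/2}|psi_i><psi_i|rho^{-1/2}
P_pgm = sum(p[i] ** 2 * np.real(psis[i].conj() @ inv_sqrt @ psis[i]) ** 2
            for i in range(m))

# Gram matrix of the vectors sqrt(p_i)|psi_i>, and the diagonal of its square root
S = np.column_stack([np.sqrt(pi) * v for pi, v in zip(p, psis)])
G = S.conj().T @ S
wg, Ug = np.linalg.eigh(G)
sqrtG = Ug @ np.diag(np.sqrt(np.clip(wg, 0, None))) @ Ug.conj().T
print(P_pgm, sum(np.real(sqrtG[i, i]) ** 2 for i in range(m)))   # the two agree
```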
26/ 23
For the ensemble {|ψ_{c_z}⟩ : z ∈ {0,1}^k} with uniform probabilities p_z = 1/2^k, we have P_PGM ≥ (1 − δ)²
Let G be the 2^k × 2^k Gram matrix of the vectors √p_z |ψ_{c_z}⟩; then P_PGM = Σ_z (√G)(z, z)²
G satisfies G(x, y) = g(x ⊕ y) for some function g, so we can diagonalize G using the Hadamard transform; its eigenvalues are 2^k ĝ(s). This gives √G explicitly
Σ_z (√G)(z, z)² ≤ ⋯ 4-page calculation ⋯ ≤ exp(T²ε²/d + √(Tdε) − d − Tε)
This implies T = Ω(d/ε)
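The diagonalization step can be verified directly; a numpy sketch (my illustration with a random g):

```python
import numpy as np

# A matrix with G[x, y] = g(x XOR y) is diagonalized by the normalized Hadamard
# transform; the eigenvalue for character s is sum_x g(x)(-1)^{s.x} = 2^k ghat(s).
k = 4
rng = np.random.default_rng(4)
g = rng.random(2 ** k)
G = np.array([[g[x ^ y] for y in range(2 ** k)] for x in range(2 ** k)])

H = np.array([[1.0, 1.0], [1.0, -1.0]])
Hk = H
for _ in range(k - 1):
    Hk = np.kron(Hk, H)
Hk /= 2 ** (k / 2)                        # normalized: Hk @ Hk = identity

D = Hk @ G @ Hk                           # diagonal in the Hadamard basis
fourier = np.array([sum(g[x] * (-1) ** bin(s & x).count("1") for x in range(2 ** k))
                    for s in range(2 ** k)])
print(np.allclose(D, np.diag(fourier)))   # True: eigenvalues are 2^k * ghat(s)
```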