

  1. An Alphabet-Size Bound for the Information Bottleneck Function, ISIT 2020. Christoph Hirche, Andreas Winter.

  2. What for? DNNs, video processing, clustering.

  3. Sufficient Statistics. Sufficient statistics are maps or partitions of X, S(X), that capture all the information that X has about Y. Namely, I(S(X); Y) = I(X; Y).

  4. Sufficient Statistics (cont.). Minimal sufficient statistics, T(X), are the simplest sufficient statistics: T(X) = argmin I(S(X); X), where the minimum is over all S(X) with I(S(X); Y) = I(X; Y).

  5. Sufficient Statistics (cont.). Approximate minimal sufficient statistics ⇔ Information Bottleneck: minimize I(S(X); X) over all S(X) with I(S(X); Y) ≥ a.
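To make these definitions concrete, here is a small, self-contained Python sketch (our own illustration, not from the talk): for a made-up joint distribution P_XY it enumerates all deterministic maps S(X), keeps the sufficient ones (those with I(S(X); Y) = I(X; Y)), and reports the one minimizing I(S(X); X). The toy distribution and helper names are assumptions for illustration only.

```python
# Toy illustration (not from the talk): brute-force search over deterministic
# maps S(X) on a small alphabet to find a (minimal) sufficient statistic.
import itertools
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits for a joint distribution given as a 2-D array."""
    p_ab = p_ab / p_ab.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

# Toy joint distribution P_XY with |X| = 4, |Y| = 2; rows x=0,1 and x=2,3
# have identical conditionals, so a 2-cell partition is already sufficient.
P_XY = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])

I_XY = mutual_information(P_XY)
nX = P_XY.shape[0]

best = None
for labels in itertools.product(range(nX), repeat=nX):   # all maps S: X -> [|X|]
    # Joint distributions of (S(X), Y) and (S(X), X) induced by the map.
    P_SY = np.zeros((nX, P_XY.shape[1]))
    P_SX = np.zeros((nX, nX))
    for x, s in enumerate(labels):
        P_SY[s] += P_XY[x]
        P_SX[s, x] = P_XY[x].sum()
    if np.isclose(mutual_information(P_SY), I_XY):        # sufficient: I(S;Y) = I(X;Y)
        c = mutual_information(P_SX)                       # complexity I(S;X)
        if best is None or c < best[0]:
            best = (c, labels)

print(f"I(X;Y) = {I_XY:.4f} bits")
print(f"minimal sufficient statistic: S = {best[1]}, I(S;X) = {best[0]:.4f} bits")
```

For this toy P_XY the two pairs of x-values with identical conditionals collapse into two cells, so the reported minimal sufficient statistic should cost I(S; X) = 1 bit while preserving all of I(X; Y).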

  6. Application in ML.

  7. IB optimality? From Shwartz-Ziv and Tishby: the DNN layers converge to fixed points of the IB equations.

  8. Dimension Bounds. Generally known: |W| ≤ |X| + 1.

  9. Dimension Bounds (cont.). But can we get bounds in terms of |Y|?

  10. Dimension Bounds (cont.). Maybe approximate? I_XY(R, N) ≤ I_XY(R) ≤ I_XY(R, N) + δ(ε, |Y|) for some δ(ε, |Y|), with |W| ≤ N(ε, |Y|).

  11. Recoverability. Lemma: Given a joint distribution P_XY of two random variables X and Y, assume that there exist N probability distributions Q_1, ..., Q_N on Y and a function f: X → [N] with the property that (1/2)‖P_{Y|X=x} − Q_{f(x)}‖_1 ≤ ε for all x, for some ε > 0. Then there exists a recovery channel S: [N] → X such that the Markov chain Y - X - X' - X̂, defined by X' = f(X) and P_{X̂|X'} = S, satisfies P_X = P_X̂ and (1/2)‖P_XY − P_{X̂Y}‖_1 ≤ ε' = 2ε.
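The slide states existence only; one natural construction (our assumption, not necessarily the channel used in the paper) is the Bayes-like recovery channel S(x | n) ∝ P_X(x)·1[f(x) = n], which reproduces P_X exactly. The Python sketch below builds such an f and S for a toy P_XY and numerically checks the guarantee (1/2)‖P_XY − P_X̂Y‖_1 ≤ 2ε; the toy data and all names are ours.

```python
# Numerical check of the recoverability lemma on a toy example (our own
# construction; the recovery channel below is one natural choice, not
# necessarily the one used in the paper).
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P_XY with |X| = 6, |Y| = 3.
P_XY = rng.random((6, 3))
P_XY /= P_XY.sum()
P_X = P_XY.sum(axis=1)
P_Y_given_X = P_XY / P_X[:, None]

# Toy quantizer f: X -> [N]: assign each conditional to the closer of N = 2
# reference distributions Q_1, Q_2 (here simply the conditionals of x=0 and x=3).
Q = np.stack([P_Y_given_X[0], P_Y_given_X[3]])
f = np.array([np.argmin(0.5 * np.abs(P_Y_given_X[x] - Q).sum(axis=1))
              for x in range(6)])
eps = max(0.5 * np.abs(P_Y_given_X[x] - Q[f[x]]).sum() for x in range(6))

# "Bayes-like" recovery channel S: [N] -> X with S(x|n) proportional to P_X(x)*1[f(x)=n].
N = Q.shape[0]
S = np.zeros((N, 6))
for n in range(N):
    S[n, f == n] = P_X[f == n]
    S[n] /= S[n].sum()

# Markov chain Y - X - X' - X_hat: push P_XY through f and then through S.
P_Xprime_Y = np.zeros((N, 3))
for x in range(6):
    P_Xprime_Y[f[x]] += P_XY[x]
P_Xhat_Y = S.T @ P_Xprime_Y          # joint distribution of (X_hat, Y)

tv = 0.5 * np.abs(P_XY - P_Xhat_Y).sum()
print(f"eps = {eps:.4f}")
print(f"P_X == P_Xhat: {np.allclose(P_X, P_Xhat_Y.sum(axis=1))}")
print(f"(1/2)||P_XY - P_XhatY||_1 = {tv:.4f}  (bound: 2*eps = {2 * eps:.4f})")
```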

  12. Bounds on N? How large does N need to be?

  13. Bounds on N? (cont.) Easy: N ≤ |X|, but that's still too big.

  14. Bounds on N? (cont.) In the worst case, we need to choose an ε-net of the probability simplex P(Y) of all probability distributions on Y with respect to the total variation distance, which results in N ≤ (2/ε)^{|Y|}. Generally, one can do much better (e.g., for deterministic data sets).
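To see how loose the worst-case net bound typically is, the following Python sketch (our own toy experiment, not from the talk) greedily ε-covers the conditionals P_{Y|X=x} of a random P_XY in total variation distance and compares the number of centers it needs with (2/ε)^{|Y|}.

```python
# Greedy epsilon-covering of the conditionals P(Y|X=x) in total variation,
# compared against the worst-case epsilon-net bound N <= (2/eps)^|Y|.
# Toy data and greedy strategy are our own illustration, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1
nX, nY = 1000, 3

P_XY = rng.random((nX, nY))
P_XY /= P_XY.sum()
cond = P_XY / P_XY.sum(axis=1, keepdims=True)    # rows are P(Y|X=x)

centers = []                                     # the distributions Q_1, ..., Q_N
for row in cond:
    if not any(0.5 * np.abs(row - c).sum() <= eps for c in centers):
        centers.append(row)                      # open a new cluster center

print(f"greedy covering uses N = {len(centers)} centers")
print(f"worst-case bound:  N <= (2/eps)^|Y| = {(2 / eps) ** nY:.0f}")
```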

  15. IB Bound. Lemma: Let Y - X - X̂ be a Markov chain. Then the IB function of P_XY dominates the IB function of P_{X̂Y} pointwise: I_XY(R) ≥ I_{X̂Y}(R) for all R.
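A proof sketch, paraphrasing the standard data-processing argument (our wording, assuming the usual convention I_XY(R) = sup{ I(Y;W) : W obtained from X with I(X;W) ≤ R }):

```latex
% Proof sketch (our paraphrase of the standard data-processing argument).
\begin{align*}
  &\text{Let } W \text{ be feasible for } P_{\hat X Y}:\ \hat X \to W,\quad I(\hat X; W) \le R.\\
  &\text{Compose with } X \to \hat X \text{ to obtain a test channel for } X.
    \text{ By data processing along } X - \hat X - W,\\
  &\qquad I(X; W) \;\le\; I(\hat X; W) \;\le\; R,\\
  &\text{while } I(Y; W) \text{ is unchanged, since } Y - X - \hat X - W
    \text{ fixes the joint law of } (Y, W).\\
  &\text{Taking the supremum over feasible } W:\qquad
    I_{XY}(R) \;\ge\; I_{\hat X Y}(R) \quad \forall R.
\end{align*}
```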

  16. Alphabet-Size Bounds. Corollary: Under the assumptions of our main lemma, I_{X'Y}(R) ≤ I_XY(R) ≤ I_{X'Y}(R) + δ(ε, |Y|), where δ(ε, |Y|) := ε' log|Y| + (1 + ε') h(ε'/(1 + ε')). Corollary: Under the assumptions of our main lemma, I_XY(R, N) ≤ I_XY(R) ≤ I_XY(R, N) + δ(ε, |Y|), with the same δ(ε, |Y|) and N ≤ (2/ε)^{|Y|}.
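To get a feel for the size of the penalty term, the short Python snippet below evaluates δ(ε, |Y|) = ε' log|Y| + (1 + ε') h(ε'/(1 + ε')) with ε' = 2ε for a few example values; h is the binary entropy, we use log base 2, and the parameter choices are ours.

```python
# Evaluate the penalty term delta(eps, |Y|) from the corollaries for a few
# example values (parameter choices are our own; log base 2 throughout).
import math

def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def delta(eps, Y):
    eps_p = 2 * eps                  # eps' = 2 * eps, as in the recoverability lemma
    return eps_p * math.log2(Y) + (1 + eps_p) * h(eps_p / (1 + eps_p))

for eps in (0.1, 0.01, 0.001):
    for Y in (2, 8):
        print(f"eps = {eps:<6}  |Y| = {Y}:  delta = {delta(eps, Y):.4f} bits")
```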

  17. Quantum IB.

  18. QIB. For a quantum state ρ_XY, we define R_q(a) = inf I(YR; W)_σ over channels N_{X→W} with I(Y; W)_σ ≥ a, where σ_WYR := (N_{X→W} ⊗ id_YR) Ψ_XYR and Ψ_XYR is a purification of ρ_XY.
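To unpack the entropic quantities in this definition, here is a minimal NumPy sketch (generic helpers of our own, not code from the talk) that computes the quantum mutual information I(A;B)_ρ = S(ρ_A) + S(ρ_B) − S(ρ_AB) for a small two-qubit example; in the talk these quantities are evaluated on σ_WYR.

```python
# Minimal helpers (our own) for the entropic quantities appearing in the QIB
# definition: von Neumann entropy, partial trace, and quantum mutual
# information, illustrated on a two-qubit example state.
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

def partial_trace(rho, dims, keep):
    """Trace out the other subsystem of a bipartite state; keep is 0 or 1."""
    dA, dB = dims
    rho = rho.reshape(dA, dB, dA, dB)
    if keep == 0:
        return np.trace(rho, axis1=1, axis2=3)
    return np.trace(rho, axis1=0, axis2=2)

def mutual_information(rho_AB, dims):
    S = von_neumann_entropy
    return (S(partial_trace(rho_AB, dims, 0))
            + S(partial_trace(rho_AB, dims, 1))
            - S(rho_AB))

# Example: a noisy Bell state rho = p*|Phi+><Phi+| + (1-p)*I/4.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
p = 0.8
rho = p * np.outer(phi, phi) + (1 - p) * np.eye(4) / 4

print(f"I(A;B)_rho = {mutual_information(rho, (2, 2)):.4f} bits")
```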

  19. QIB. Lemma: For X and Y quantum and W classical, an optimal solution for the quantum information bottleneck can be achieved with |W| ≤ |Y|²|R|² + 1. Lemma: For Y quantum but X and W classical, an optimal solution for the quantum information bottleneck can be achieved with |W| ≤ |X| + 1.

  20. QIB. Lemma: Given a classical-quantum state ρ_XY = Σ_x p(x) |x⟩⟨x| ⊗ ρ^x_Y, assume that there exist N quantum states σ^1_Y, ..., σ^N_Y and a function f: X → [N] with the property that (1/2)‖ρ^x_Y − σ^{f(x)}_Y‖_1 ≤ ε for all x, for a given ε > 0. Then there exists a recovery channel S: [N] → X such that the Markov chain Y - X - X' - X̂, defined by X' = f(X) and P_{X̂|X'} = S, satisfies P_X = P_X̂ and (1/2)‖ρ_XY − ρ_{X̂Y}‖_1 ≤ ε' = 2ε.

  21. QIB. For Y quantum but X and W classical: Corollary: Under the assumptions of the previous lemma, I^cq_{X'Y}(R) ≤ I^cq_XY(R) ≤ I^cq_{X'Y}(R) + δ(ε, |Y|), where δ(ε, |Y|) := ε' log|Y| + (1 + ε') h(ε'/(1 + ε')). Corollary: Under the assumptions of the previous lemma, I^cq_XY(R, N) ≤ I^cq_XY(R) ≤ I^cq_XY(R, N) + δ(ε, |Y|), where δ(ε, |Y|) is as before and N ≤ (3/ε)^{2|Y|²}.

  22. The End. Summary: a new approach to alphabet-size bounds via recoverability; new bounds on approximating the IB with an alphabet size limited by |Y| (instead of |X|). Open questions: other applications of the recoverability approach; the fully quantum case (stay tuned for more on this soon [1]). Thanks!! [1] M. Christandl, C. Hirche, A. Winter, in preparation, 2020.
