

  1. Learning Fast Requires Good Memory: Time-Space Tradeoff Lower Bounds for Learning Ran Raz Princeton University Based on joint works with: Sumegha Garg, Gillat Kol, Avishay Tal [R16, KRT17, R17, GRT18]

  2. This Talk: A line of recent works studies time-space (memory-samples) lower bounds for learning [S14, SVW16, R16, VV16, KRT17, MM17, R17, MM18, BOGY18, GRT18, DS18, AS18, DKS19, SSV19, GRT19, GKR19] Main Message: For some learning problems, access to a relatively large memory is crucial. In other words, in some cases, learning is infeasible due to memory constraints

  3. Original Motivation: Online Learning Theory: Initiated by [Shamir 2014] and [Steinhardt-Valiant-Wager 2015]: Can one prove unconditional lower bounds on the number of samples needed for learning, under memory constraints? (when each sample is viewed only once, also known as online learning)

  4. Example: Parity Learning: x = (x₁, …, xₙ) ∈ {0,1}ⁿ is unknown. A learner gets a stream of random linear equations (mod 2) in x₁, …, xₙ, one by one, and tries to learn x. Formally: the learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (inner product mod 2). The learner needs to solve the equations and find x (no noise)
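The sampling process just defined is easy to simulate. A minimal Python sketch (the function name and seed are my own, not from the talk):

```python
import random

def parity_sample_stream(x, seed=0):
    """Yield samples (a_t, b_t) where a_t is uniform in {0,1}^n
    and b_t = a_t . x (inner product mod 2)."""
    rng = random.Random(seed)
    n = len(x)
    while True:
        a = [rng.randrange(2) for _ in range(n)]
        b = sum(ai * xi for ai, xi in zip(a, x)) % 2
        yield a, b

# Draw a few samples for a hidden x of length 5:
hidden_x = [0, 1, 1, 0, 1]
stream = parity_sample_stream(hidden_x)
for _, (a, b) in zip(range(3), stream):
    # Each sample is one random linear equation (mod 2) in x_1..x_n.
    assert b == sum(ai * xi for ai, xi in zip(a, hidden_x)) % 2
```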

  5. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown

  6. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₁ = (1, 1, 0, 1, 1), b₁ = 0 x₁ + x₂ + x₄ + x₅ = 0 (mod 2)

  7. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₂ = (0, 1, 1, 0, 0), b₂ = 0 x₂ + x₃ = 0 (mod 2)

  8. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₃ = (0, 0, 1, 1, 1), b₃ = 0 x₃ + x₄ + x₅ = 0 (mod 2)

  9. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₄ = (0, 1, 1, 1, 0), b₄ = 0 x₂ + x₃ + x₄ = 0 (mod 2)

  10. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₅ = (1, 1, 0, 0, 1), b₅ = 0 x₁ + x₂ + x₅ = 0 (mod 2)

  11. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₆ = (0, 0, 1, 1, 0), b₆ = 1 x₃ + x₄ = 1 (mod 2)

  12. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₇ = (0, 1, 0, 1, 1), b₇ = 0 x₂ + x₄ + x₅ = 0 (mod 2)

  13. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₈ = (1, 0, 0, 0, 1), b₈ = 1 x₁ + x₅ = 1 (mod 2)

  14. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₉ = (1, 1, 1, 1, 0), b₉ = 0 x₁ + x₂ + x₃ + x₄ = 0 (mod 2)

  15. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₁₀ = (0, 1, 1, 1, 1), b₁₀ = 1 x₂ + x₃ + x₄ + x₅ = 1 (mod 2)

  16. Ready to Play? x = (x₁, x₂, x₃, x₄, x₅) is unknown a₁₁ = (0, 0, 0, 0, 0), b₁₁ = 0 0 = 0 (mod 2)

  17. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x

  18. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. By solving linear equations: O(n) samples, O(n²) memory bits

  19. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. By solving linear equations: O(n) samples, O(n²) memory bits. By trying all possibilities: O(n) memory bits, exponential number of samples
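The second extreme can be sketched as well: keep only the current candidate and a counter (O(n) bits of state) and test each candidate against fresh samples. A wrong candidate survives one random check with probability 1/2, so the learner consumes roughly 2·2ⁿ samples in expectation. Function names and the check count below are my own choices:

```python
import random

def sample_stream(x, rng):
    """Uniform samples (a_t, b_t) with b_t = a_t . x (mod 2)."""
    n = len(x)
    while True:
        a = [rng.randrange(2) for _ in range(n)]
        yield a, sum(ai * xi for ai, xi in zip(a, x)) % 2

def brute_force_learn(stream, n, checks=40):
    """O(n)-bit memory learner: enumerate candidates, test each against
    `checks` fresh samples; a wrong candidate passes all with prob 2^-checks."""
    for cand_int in range(2 ** n):
        cand = [(cand_int >> i) & 1 for i in range(n)]
        if all(b == sum(ai * ci for ai, ci in zip(a, cand)) % 2
               for _, (a, b) in zip(range(checks), stream)):
            return cand
    return None

x = [0, 1, 1, 0, 1]
# Recovers x with overwhelming probability, using exponentially many samples:
print(brute_force_learn(sample_stream(x, random.Random(1)), len(x)))
```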

  20. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x

  21. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. [R 2016]: Any algorithm for parity learning requires either n²/10 memory bits or an exponential number of samples (conjectured by Steinhardt, Valiant and Wager [2015])

  22. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. [R 2016]: Any algorithm for parity learning requires either n²/10 memory bits or an exponential number of samples (conjectured by Steinhardt, Valiant and Wager [2015]). Previously: no lower bound on the number of samples, even if the memory size is n (for any learning problem) (for memory of size < n, it is relatively easy to prove lower bounds, since inner product is a good two-source extractor)

  23. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. [R 2016]: Any algorithm for parity learning requires either n²/10 memory bits or an exponential number of samples (conjectured by Steinhardt, Valiant and Wager [2015]). Previously: no lower bound on the number of samples, even if the memory size is n (for any learning problem). I will focus on super-linear lower bounds on the memory size

  24. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. [R 2016]: Any algorithm for parity learning requires either n²/10 memory bits or an exponential number of samples (conjectured by Steinhardt, Valiant and Wager [2015])

  25. Parity Learning: x ∈ {0,1}ⁿ is unknown. A learner gets a stream of samples (a₁, b₁), (a₂, b₂), …, where ∀t: aₜ ∈ {0,1}ⁿ and bₜ = aₜ · x (mod 2), and needs to solve the equations and find x. [R 2016]: Any algorithm for parity learning requires either n²/10 memory bits or an exponential number of samples (conjectured by Steinhardt, Valiant and Wager [2015]). Best upper bound on the memory size: ≈ n²/4 (when the number of samples is sub-exponential)

  26. Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once)

  27. Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once). It is very interesting to understand how much memory is needed for learning. Our result gives a concept class that can be efficiently learnt if and only if the learner has a quadratic-size memory

  28. Motivation: Machine Learning Theory: For some online learning problems, access to a relatively large memory is crucial. In some cases, learning is infeasible due to memory constraints (if each sample is viewed only once). It is very interesting to understand how much memory is needed for learning. Our result gives a concept class that can be efficiently learnt if and only if the learner has a quadratic-size memory. “Good” memory may be crucial in learning processes

  29. Example: Neural Networks: Many learning algorithms try to learn a concept by modeling it as a neural network. The algorithm keeps a neural network in memory and updates its weights when new samples arrive. The memory used is the size of the network
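For a fully connected network, "the size of the network" is simply its parameter count (times the bits per weight). A toy illustration; the layer sizes and the 32-bit weight assumption are my own, not from the talk:

```python
def network_memory_bits(layer_sizes, bits_per_weight=32):
    """Memory held by a learner that stores a fully connected network:
    for each consecutive layer pair (m, n), m*n weights plus n biases."""
    params = sum(m * n + n for m, n in zip(layer_sizes, layer_sizes[1:]))
    return params * bits_per_weight

# A 784-100-10 network stores 784*100 + 100 + 100*10 + 10 = 79,510 parameters:
print(network_memory_bits([784, 100, 10]) // 32)  # → 79510
```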

