Nonparametric inference of interaction laws in particle/agent systems




  1. Nonparametric inference of interaction laws in particle/agent systems
Fei Lu, Department of Mathematics, Johns Hopkins University
Joint with: Mauro Maggioni, Sui Tang, Ming Zhong
July 11, 2019, Applied Math and Comp Sci Colloquium, University of Pennsylvania
FL acknowledges support from JHU and NSF.

  2. Outline
1. Motivation and problem statement
2. Learning via nonparametric regression
3. Numerical examples
4. Ongoing work and open problems

  3. Motivation
Q: What is the law of interaction between particles/agents?

  4. Motivation
Q: What is the law of interaction between particles/agents?
$$ m\ddot x_i(t) = -\nu\,\dot x_i(t) + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N} K(x_i, x_j) $$
Newton's law of gravitation: $K(x,y) = \dfrac{G m_1 m_2}{r^2}$, with $r = |x-y|$
Molecular fluids: $K(x,y) = \nabla_x\big[\Phi(|x-y|)\big]$; Lennard-Jones potential: $\Phi(r) = \dfrac{c_1}{r^{12}} - \dfrac{c_2}{r^{6}}$
Flocking birds / schools of fish: $K(x,y) = \phi(|x-y|)\,\dfrac{x-y}{|x-y|}$
Opinion/voter models, bacteria/cells, ...
(1) Cucker, Smale: On the mathematics of emergence. 2007. (2) Vicsek, Zafeiris: Collective motion. 2012. (3) Motsch, Tadmor: Heterophilious dynamics enhances consensus. 2014.

  5. An inference problem
Infer the rule of interaction in the system
$$ m\ddot x_i(t) = -\nu\,\dot x_i(t) + \frac{1}{N}\sum_{j=1,\,j\neq i}^{N} K(x_i - x_j), \qquad i = 1,\dots,N, \quad x_i(t)\in\mathbb{R}^d, $$
from observations of trajectories; $x_i$ is the position of the i-th particle/agent.
Data: many independent trajectories $\{x^m(t) : t\in\mathcal{T}\}_{m=1}^{M}$
Goal: infer $\phi:\mathbb{R}^+\to\mathbb{R}$ in $K(x) = -\nabla\Phi(|x|) = -\phi(|x|)\,\frac{x}{|x|}$
For simplicity, we consider only first-order systems ($m = 0$).

  6.
$$ \dot x_i(t) = \frac{1}{N}\sum_{j=1,\,j\neq i}^{N} \phi_{\mathrm{true}}(|x_i - x_j|)\,\frac{x_j - x_i}{|x_j - x_i|} \quad\Longleftrightarrow\quad \dot x = f_{\phi_{\mathrm{true}}}(x(t)) $$
Least squares regression: with $\mathcal{H}_n = \mathrm{span}\{e_i\}_{i=1}^{n}$,
$$ \widehat\phi_n = \arg\min_{\phi\in\mathcal{H}_n} \mathcal{E}_M(\phi) := \sum_{m=1}^{M} \|\dot x^m - f_\phi(x^m)\|^2 $$
Questions:
- Choice of $\mathcal{H}_n$ and function space of learning?
- Is the inverse problem well-posed / identifiable?
- Consistency and rate of "convergence"? → hypothesis testing and model selection
A simulation sketch of the data-generating system appears below.
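A minimal sketch of how trajectory data of this form could be generated, assuming a hypothetical kernel `phi_true` and a uniform initial distribution for $\mu_0$ (neither is specified on the slides), with forward Euler used purely for illustration:

```python
import numpy as np

def rhs(x, phi):
    """Right-hand side dx_i/dt = (1/N) * sum_{j != i} phi(|x_i-x_j|) (x_j-x_i)/|x_j-x_i|; x has shape (N, d)."""
    N = x.shape[0]
    dx = np.zeros_like(x)
    for i in range(N):
        diff = x - x[i]                           # rows x_j - x_i, shape (N, d)
        r = np.linalg.norm(diff, axis=1)          # pairwise distances
        mask = r > 0                              # exclude j = i
        dx[i] = (phi(r[mask])[:, None] * diff[mask] / r[mask][:, None]).sum(0) / N
    return dx

def simulate(x0, phi, dt=1e-3, L=100):
    """Forward Euler trajectory x(t_1), ..., x(t_L) from initial condition x0 of shape (N, d)."""
    traj, x = [x0.copy()], x0.copy()
    for _ in range(L - 1):
        x = x + dt * rhs(x, phi)
        traj.append(x.copy())
    return np.stack(traj)                         # shape (L, N, d)

# Example: M trajectories with i.i.d. initial conditions (hypothetical mu_0 = Unif([0,5]^d))
phi_true = lambda r: 1.0 / (1.0 + r**2)           # a hypothetical interaction kernel
M, N, d = 50, 10, 2
data = [simulate(np.random.uniform(0, 5, (N, d)), phi_true) for _ in range(M)]
```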

  7. Outline
1. Motivation and problem statement
2. Learning via nonparametric regression:
   - Function space of regression
   - Identifiability: a coercivity condition
   - Consistency and rate of convergence
3. Numerical examples
4. Ongoing work and open problems

  8. Learning via nonparametric regression
The dynamical system: $\dot x = f_{\phi_{\mathrm{true}}}(x(t))$
Data: $M$ trajectories $\{x^m(t) : t\in\mathcal{T}\}_{m=1}^{M}$, with $x^m(0)\overset{\mathrm{i.i.d.}}{\sim}\mu_0\in\mathcal{P}(\mathbb{R}^{dN})$ and $\mathcal{T} = [0,T]$ or $\{t_1,\dots,t_L\}$ (with $\dot x(t_i)$ observed)
Goal: nonparametric inference of $\phi_{\mathrm{true}}$
(1) Bongini, Fornasier, Hansen, Maggioni: Inferring interaction rules for mean-field equations, M3AS, 2017. (2) Binev, Cohen, Dahmen, DeVore, Temlyakov: Universal algorithms for learning theory, JMLR 2005. (3) Cucker, Smale: On the mathematical foundations of learning. Bulletin of the AMS, 2002.

  9.
$$ \widehat\phi_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}} \mathcal{E}_M(\phi) := \frac{1}{ML}\sum_{l,m=1}^{L,M} \big\|f_\phi(X^m(t_l)) - \dot X^m(t_l)\big\|^2 $$
$\mathcal{E}_M(\phi)$ is quadratic in $\phi$, and $\mathcal{E}_M(\phi) \ge \mathcal{E}_M(\phi_{\mathrm{true}}) = 0$.
The minimizer exists for any $\mathcal{H} = \mathcal{H}_n = \mathrm{span}\{e_1,\dots,e_n\}$.
Tasks (as $M\to\infty$, $\mathcal{E}_M(\cdot)\to\mathcal{E}_\infty(\cdot)$; one hopes $\widehat\phi_{M,\mathcal{H}}\to\widehat\phi_{\infty,\mathcal{H}}$, and $\widehat\phi_{\infty,\mathcal{H}}\to\phi_{\mathrm{true}}$ as $\mathrm{dist}(\mathcal{H},\phi_{\mathrm{true}})\to 0$):
- Choice of $\mathcal{H}_n$ and function space of learning?
- Is the inverse problem well-posed / identifiable?
- Consistency and rate of "convergence"?

  10. Review of classical nonparametric regression
Estimate $y = \phi(z)$, $\phi:\mathbb{R}^D\to\mathbb{R}$, from data $\{z_i, y_i\}_{i=1}^{M}$, where the $\{z_i, y_i\}$ are i.i.d. samples:
$$ \widehat\phi_n := \arg\min_{f\in\mathcal{H}_n} \mathcal{E}_M(f) := \sum_{i=1}^{M} \|y_i - f(z_i)\|^2 \;\longrightarrow\; \mathbb{E}[Y\mid Z = z] $$
Optimal rate: if $\mathrm{dist}(\mathcal{H}_n, \phi_{\mathrm{true}}) \lesssim n^{-s}$ and $n_* = (M/\log M)^{\frac{1}{2s+1}}$, then
$$ \|\widehat\phi_{n_*} - \phi\|_{L^2(\rho_Z)} \lesssim M^{-\frac{s}{2s+D}} $$
(1) Cucker, Smale: On the mathematical foundations of learning. Bulletin of the AMS, 2002. (2) Györfi, Kohler, Krzyżak, Walk: A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.

  11. Review of classical nonparametric regression (continued)
Classical setting (recap): estimate $y = \phi(z)$ from i.i.d. samples $\{z_i, y_i\}_{i=1}^{M}$; with $n_* = (M/\log M)^{\frac{1}{2s+1}}$ the optimal rate is $\|\widehat\phi_{n_*} - \phi\|_{L^2(\rho_Z)} \lesssim M^{-\frac{s}{2s+D}}$.
Our case: learn the kernel $\phi:\mathbb{R}^+\to\mathbb{R}$ from trajectory data $\{x^m(t)\}$ of
$$ \dot x_i(t) = \frac{1}{N}\sum_{j=1,\,j\neq i}^{N} \phi(|x_i - x_j|)\,\frac{x_j - x_i}{|x_j - x_i|} $$
- the pairwise distances $\{r^m_{ij}(t) := |x^m_i(t) - x^m_j(t)|\}$ are not i.i.d.
- the values $\phi(r^m_{ij}(t))$ are not observed

  12. Regression measure
Distribution of pairwise distances $\rho_T:\mathbb{R}^+\to\mathbb{R}$:
$$ \rho_T(r) = \frac{1}{\binom{N}{2}\,L} \sum_{l=1}^{L}\;\sum_{1\le i<i'\le N} \mathbb{E}_{\mu_0}\big[\delta_{r_{ii'}(t_l)}(r)\big] $$
- unknown, estimated by the empirical distribution $\rho_T^M \to \rho_T$ as $M\to\infty$ (LLN); see the sketch below
- intrinsic to the dynamics
Regression function space $L^2(\rho_T)$:
- the admissible set $\subset L^2(\rho_T)$
- $\mathcal{H}$ = piecewise polynomials $\subset L^2(\rho_T)$
- singular kernels $\subset L^2(\rho_T)$
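As an illustration (not the authors' implementation; `pairwise_distances` and `empirical_rho` are names introduced here), the empirical measure $\rho_T^M$ can be formed by pooling all pairwise distances over times, pairs, and trajectories and binning them:

```python
import numpy as np

def pairwise_distances(traj):
    """All distances r_{ii'}(t_l) for i < i' from one trajectory of shape (L, N, d)."""
    L, N, _ = traj.shape
    iu = np.triu_indices(N, k=1)
    out = []
    for l in range(L):
        diff = traj[l][:, None, :] - traj[l][None, :, :]   # (N, N, d)
        out.append(np.linalg.norm(diff, axis=-1)[iu])      # keep i < i'
    return np.concatenate(out)

def empirical_rho(trajectories, n_bins=100):
    """Histogram estimate of rho_T from M trajectories; also returns a support estimate."""
    r = np.concatenate([pairwise_distances(t) for t in trajectories])
    density, edges = np.histogram(r, bins=n_bins, density=True)
    return density, edges, r.max()
```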

  13. Identifiability: a coercivity condition
$$ \widehat\phi_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}} \mathcal{E}_M(\phi) $$
$$ \mathcal{E}_\infty(\widehat\phi) - \mathcal{E}_\infty(\phi_{\mathrm{true}}) = \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\big\|f_{\widehat\phi-\phi_{\mathrm{true}}}(X(t))\big\|^2\,dt \;\ge\; c\,\big\|\widehat\phi - \phi_{\mathrm{true}}\big\|^2_{L^2(\rho_T)} $$
Coercivity condition: there exists $c_{T,\mathcal{H}} > 0$ such that for all $\varphi\in\mathcal{H}\subset L^2(\rho_T)$,
$$ \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\|f_\varphi(x(t))\|^2\,dt = \langle\!\langle\varphi,\varphi\rangle\!\rangle \;\ge\; c_{T,\mathcal{H}}\,\|\varphi\|^2_{L^2(\rho_T)} $$
where the bilinear functional is $\langle\!\langle\varphi,\psi\rangle\!\rangle := \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\langle f_\varphi, f_\psi\rangle(x(t))\,dt$.
Coercivity controls the condition number of the regression matrix.

  14. Consistency of estimator
Theorem (L., Maggioni, Tang, Zhong). Assume the coercivity condition. Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subsets of $L^\infty([0,R])$ such that $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{\mathrm{true}}\|_\infty \to 0$ as $n\to\infty$. Then
$$ \lim_{n\to\infty}\lim_{M\to\infty} \big\|\widehat\phi_{M,\mathcal{H}_n} - \phi_{\mathrm{true}}\big\|_{L^2(\rho_T)} = 0 \quad \text{almost surely.} $$
- For each $n$, compactness of $\{\widehat\phi_{M,\mathcal{H}_n}\}$ and coercivity imply $\widehat\phi_{M,\mathcal{H}_n} \to \widehat\phi_{\infty,\mathcal{H}_n}$ in $L^2$.
- Increasing $\mathcal{H}_n$ and coercivity imply consistency.
- In general, truncation is used to make $\mathcal{H}_n$ compact.

  15. Optimal rate of convergence
Theorem (L., Maggioni, Tang, Zhong). Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subspaces of $L^\infty[0,R]$ such that $\dim(\mathcal{H}_n) \le c_0 n$ and $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{\mathrm{true}}\|_\infty \le c_1 n^{-s}$. Assume the coercivity condition. Choose $n_* = (M/\log M)^{\frac{1}{2s+1}}$; then
$$ \mathbb{E}_{\mu_0}\big[\|\widehat\phi_{T,M,\mathcal{H}_{n_*}} - \phi_{\mathrm{true}}\|_{L^2(\rho_T)}\big] \;\le\; C\left(\frac{\log M}{M}\right)^{\frac{s}{2s+1}} $$
- The second condition is a regularity assumption: $\phi\in C^s$.
- The choice of $\dim(\mathcal{H}_n)$ is adaptive to $s$ and $M$; a worked instance follows below.
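For a concrete, purely hypothetical instance of the dimension choice (the values $M = 10^4$ and $s = 1$ are not from the talk):
$$ n_* = \left(\frac{M}{\log M}\right)^{\frac{1}{2s+1}} = \left(\frac{10^4}{\log 10^4}\right)^{1/3} \approx 1086^{1/3} \approx 10, \qquad \mathbb{E}_{\mu_0}\big[\|\widehat\phi_{T,M,\mathcal{H}_{n_*}} - \phi_{\mathrm{true}}\|_{L^2(\rho_T)}\big] \;\lesssim\; \left(\frac{\log M}{M}\right)^{1/3} \approx 0.10, $$
so roughly ten basis functions would be used, and the expected error decays like $0.1$ (up to the constant $C$).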

  16. Prediction of future evolution
Theorem (L., Maggioni, Tang, Zhong). Denote by $\widehat X(t)$ and $X(t)$ the solutions of the systems with kernels $\widehat\phi$ and $\phi$ respectively, starting from the same initial conditions drawn i.i.d. from $\mu_0$. Then
$$ \mathbb{E}_{\mu_0}\Big[\sup_{t\in[0,T]} \|\widehat X(t) - X(t)\|^2\Big] \;\lesssim\; \|\widehat\phi - \phi_{\mathrm{true}}\|^2_{L^2(\rho_T)}, $$
with the implied constant depending on $N$ and $T$. Follows from Gronwall's inequality.

  17. Outline
1. Motivation and problem statement
2. Learning via nonparametric regression:
   - A regression measure and function space
   - Learnability: a coercivity condition
   - Consistency and rate of convergence
3. Numerical examples
   - A general algorithm
   - Lennard-Jones model
   - Opinion dynamics and multiple-agent systems
4. Ongoing work and open problems

  18. Numerical examples: the regression algorithm
$$ \mathcal{E}_M(\varphi) = \frac{1}{LMN}\sum_{l,m=1}^{L,M}\sum_{i=1}^{N}\Big\| \dot x_i^{(m)}(t_l) - \frac{1}{N}\sum_{i'=1}^{N}\varphi\big(r^m_{i,i'}(t_l)\big)\,\widehat r^{\,m}_{i,i'}(t_l) \Big\|^2, $$
where $r^m_{i,i'}$ is the pairwise distance and $\widehat r^{\,m}_{i,i'}$ the unit vector from $x_i^{(m)}$ to $x_{i'}^{(m)}$,
$$ \mathcal{H}_n := \Big\{ \varphi = \sum_{p=1}^{n} a_p\,\psi_p(r) : a = (a_1,\dots,a_n)\in\mathbb{R}^n \Big\}, \qquad \mathcal{E}_{L,M}(\varphi) = \mathcal{E}_{L,M}(a) = \frac{1}{M}\sum_{m=1}^{M}\|d^m_L - \Psi^m_L a\|^2_{\mathbb{R}^{LNd}}. $$
Rewrite as $A_M a = b_M$ with $A_M = \frac{1}{M}\sum_{m=1}^{M} A^m_L$ and $b_M = \frac{1}{M}\sum_{m=1}^{M} b^m_L$; the per-trajectory arrays can be computed in parallel (see the sketch below).
Caution: the choice of $\{\psi_p\}$ affects the condition number of $A_M$.
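A sketch of the per-trajectory assembly and solve, assuming piecewise-constant basis functions on a partition of $\mathrm{support}(\rho_T)$ and that velocities are observed or finite-differenced; `basis_matrix`, `assemble_one`, and `fit_kernel` are illustrative names, not the authors' code:

```python
import numpy as np

def basis_matrix(r, edges):
    """Indicator basis psi_p at distances r: entry [k, p] = 1 if r[k] lies in bin p of the partition."""
    idx = np.clip(np.searchsorted(edges, r, side="right") - 1, 0, len(edges) - 2)
    Psi = np.zeros((r.size, len(edges) - 1))
    Psi[np.arange(r.size), idx] = 1.0
    return Psi

def assemble_one(traj, vel, edges):
    """Build A^m_L (n x n) and b^m_L (n,) for one trajectory; traj and vel have shape (L, N, d)."""
    L, N, d = traj.shape
    n = len(edges) - 1
    A, b = np.zeros((n, n)), np.zeros(n)
    for l in range(L):
        for i in range(N):
            diff = traj[l] - traj[l, i]                          # x_j - x_i, shape (N, d)
            r = np.linalg.norm(diff, axis=1)
            mask = r > 0
            dirs = diff[mask] / r[mask][:, None]                 # unit vectors r_hat_ij
            Psi = basis_matrix(r[mask], edges)                   # (N-1, n)
            G = (Psi[:, :, None] * dirs[:, None, :]).sum(0) / N  # rows g_p = (1/N) sum_j psi_p(r_ij) r_hat_ij
            A += G @ G.T                                         # normal-equation contributions
            b += G @ vel[l, i]
    return A / (L * N), b / (L * N)

def fit_kernel(trajs, vels, edges):
    """Average the per-trajectory arrays (parallelizable) and solve A_M a = b_M."""
    pairs = [assemble_one(t, v, edges) for t, v in zip(trajs, vels)]
    A_M = sum(A for A, _ in pairs) / len(pairs)
    b_M = sum(b for _, b in pairs) / len(pairs)
    return np.linalg.lstsq(A_M, b_M, rcond=None)[0]              # coefficients a_1, ..., a_n
```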

  19. Assume the coercivity condition: $\langle\!\langle\varphi,\varphi\rangle\!\rangle \ge c_{T,\mathcal{H}}\,\|\varphi\|^2_{L^2(\rho_T)}$.
Proposition (lower bound on the smallest singular value of $A_M$). Let $\{\psi_1,\dots,\psi_n\}$ be a basis of $\mathcal{H}_n$ such that $\langle\psi_p,\psi_{p'}\rangle_{L^2(\rho_T^L)} = \delta_{p,p'}$ and $\|\psi_p\|_\infty \le S_0$. Let $A_\infty = \big(\langle\!\langle\psi_p,\psi_{p'}\rangle\!\rangle\big)_{p,p'}\in\mathbb{R}^{n\times n}$. Then $\sigma_{\min}(A_\infty) \ge c_{T,\mathcal{H}}$. Moreover, $A_\infty$ is the a.s. limit of $A_M$; therefore, for large $M$, the smallest singular value of $A_M$ satisfies, with high probability, $\sigma_{\min}(A_M) \ge (1-\epsilon)\,c_{T,\mathcal{H}}$.
- Choose $\{\psi_p\}$ linearly independent in $L^2(\rho_T)$.
- Piecewise polynomials: on a partition of $\mathrm{support}(\rho_T)$.
- Finite differences $\approx$ derivatives $\Rightarrow$ an $O(\Delta t)$ error in the estimator.

  20. Implementation
1. Approximate the regression measure:
   - estimate $\rho_T$ with large datasets
   - partition $\mathrm{support}(\rho_T)$
2. Construct the hypothesis space $\mathcal{H}$:
   - choose the degree of the piecewise polynomials
   - set the dimension of $\mathcal{H}$ according to the sample size
3. Regression:
   - assemble the arrays (in parallel)
   - solve the normal equations
A minimal end-to-end sketch of these steps follows below.
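Putting the three steps together on the simulated data from the earlier sketches (reusing the hypothetical `data`, `empirical_rho`, and `fit_kernel` defined above; the partition size and the finite-difference velocities, with their $O(\Delta t)$ bias noted on slide 19, are illustrative choices):

```python
import numpy as np

# 1. Approximate the regression measure and place a partition on its support
density, hist_edges, r_max = empirical_rho(data, n_bins=200)
n_star = 10                                    # dimension chosen from M and the presumed regularity s
edges = np.linspace(0.0, r_max, n_star + 1)    # uniform partition of support(rho_T)

# 2/3. Finite-difference velocities, assemble the arrays, solve the normal equations
dt = 1e-3
vels = [(traj[1:] - traj[:-1]) / dt for traj in data]   # forward differences
trajs = [traj[:-1] for traj in data]
a_hat = fit_kernel(trajs, vels, edges)

# The estimator is piecewise constant: phi_hat(r) = a_hat[p] on [edges[p], edges[p+1])
phi_hat = lambda r: a_hat[np.clip(np.searchsorted(edges, r, side="right") - 1, 0, n_star - 1)]
```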
