Nonparametric inference of interaction laws in particle/agent systems



  1. Nonparametric inference of interaction laws in particle/agent systems. Fei Lu, Department of Mathematics, Johns Hopkins University. Joint with: Mauro Maggioni, Sui Tang and Ming Zhong. February 13, 2019, CSCAMM Seminar, UMD.

  2. Motivation. Q: What is the law of interaction between particles/agents?

  3. Motivation. Q: What is the law of interaction between particles/agents?
  $m\,\ddot x_i(t) = -\nu\,\dot x_i(t) + \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} K(x_i, x_j)$
  Newton's law of gravitation: $K(x,y) = \frac{G m_1 m_2}{r^2}$, $r = |x-y|$
  Molecular fluid: $K(x,y) = \nabla_x[\Phi(|x-y|)]$, Lennard-Jones potential: $\Phi(r) = \frac{c_1}{r^{12}} - \frac{c_2}{r^{6}}$
  Flocking birds / schools of fish: $K(x,y) = \phi(|x-y|)\,\psi(\langle x, y\rangle)$
  Opinion/voter models, bacteria models, ...
  (1) Cucker, Smale: On the mathematics of emergence, 2007. (2) Vicsek, Zafeiris: Collective motion, 2012. (3) Motsch, Tadmor: Heterophilious dynamics enhances consensus, 2014.

  4. An inference problem: infer the rule of interaction in the system
  $m\,\ddot x_i(t) = -\nu\,\dot x_i(t) + \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} K(x_i - x_j), \qquad i = 1,\dots,N, \quad x_i(t)\in\mathbb{R}^d,$
  from observations of trajectories.
  $x_i$ is the position of the $i$-th particle/agent
  Data: many independent trajectories $\{x^m(t): t\in\mathcal{T}\}_{m=1}^{M}$
  Goal: infer $\phi$ in $K(x) = -\nabla\Phi(|x|) = -\phi(|x|)\,x$
  $m = 0$ ⇒ a first-order system
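A minimal Python sketch (not from the talk) of generating trajectory data from the first-order system with a known kernel; the names simulate_first_order and phi_true, the forward-Euler time stepping, and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_first_order(phi, x0, dt, n_steps):
    """Forward-Euler simulation of dx_i/dt = (1/N) sum_{j != i} phi(|x_i-x_j|)(x_j-x_i).

    x0: (N, d) initial positions; returns positions (n_steps+1, N, d)
    and velocities (n_steps+1, N, d) along the trajectory."""
    N, d = x0.shape
    X = np.zeros((n_steps + 1, N, d))
    V = np.zeros((n_steps + 1, N, d))
    X[0] = x0
    for k in range(n_steps + 1):
        x = X[k]
        diff = x[None, :, :] - x[:, None, :]           # diff[i, j] = x_j - x_i
        r = np.linalg.norm(diff, axis=-1)              # pairwise distances
        w = np.where(r > 0, phi(np.maximum(r, 1e-12)), 0.0)   # exclude self-interaction
        V[k] = (w[:, :, None] * diff).sum(axis=1) / N  # right-hand side [f_phi(x)]_i
        if k < n_steps:
            X[k + 1] = x + dt * V[k]
    return X, V

# Example run with a smooth, bounded kernel (purely illustrative).
phi_true = lambda r: np.exp(-r)
rng = np.random.default_rng(0)
X, V = simulate_first_order(phi_true, rng.normal(size=(10, 2)), dt=1e-3, n_steps=100)
```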

  5. $\dot x_i(t) = \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} \phi_{\mathrm{true}}(|x_i - x_j|)\,(x_j - x_i) =: [f_\phi(x(t))]_i$
  Least squares regression: with $\mathcal{H}_n = \mathrm{span}\{e_i\}_{i=1}^{n}$,
  $\hat\phi_n = \arg\min_{\phi\in\mathcal{H}_n} \mathcal{E}_M(\phi) := \sum_{m=1}^{M} \|\dot x^m - f_\phi(x^m)\|^2$
  How to choose the hypothesis space $\mathcal{H}_n$?
  Is the inverse problem well-posed / identifiable?
  Consistency and rate of "convergence"?

  6. Outline
  1 Learning via nonparametric regression:
  ◮ A regression measure and function space
  ◮ Identifiability: a coercivity condition
  ◮ Consistency and rate of convergence
  2 Numerical examples
  ◮ A general algorithm
  ◮ Lennard-Jones model
  ◮ Opinion dynamics and multiple-agent systems
  3 Open problems

  7. Learning via nonparametric regression
  The dynamical system: $\dot x_i(t) = \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} \phi(|x_i - x_j|)\,(x_j - x_i) =: [f_\phi(x(t))]_i$
  Admissible set (≈ globally Lipschitz): $\mathcal{K}_{R,S} := \{\varphi\in W^{1,\infty}: \mathrm{supp}\,\varphi\subset[0,R],\ \sup_{r\in[0,R]}[\,|\varphi(r)|+|\varphi'(r)|\,]\le S\}$
  Data: $M$ trajectories $\{x^m(t): t\in\mathcal{T}\}_{m=1}^{M}$ with $x^m(0)\overset{i.i.d.}{\sim}\mu_0\in\mathcal{P}(\mathbb{R}^{dN})$, where $\mathcal{T}=[0,T]$ or $\{t_1,\dots,t_L\}$ with $\dot x(t_i)$ observed
  Goal: nonparametric inference of $\phi$
  (1) Bongini, Fornasier, Hansen, Maggioni: Inferring interaction rules for mean field equations, M3AS, 2017. (2) Binev, Cohen, Dahmen, DeVore, Temlyakov: Universal algorithms for learning theory, JMLR, 2005. (3) Cucker, Smale: On the mathematical foundations of learning, Bulletin of the AMS, 2001.

  8. $\hat\phi_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}} \mathcal{E}_M(\phi) := \frac{1}{ML}\sum_{l,m=1}^{L,M} \|f_\phi(X^m(t_l)) - \dot X^m(t_l)\|^2$
  $\mathcal{E}_M(\phi)$ is quadratic in $\phi$, and $\mathcal{E}_M(\phi) \ge \mathcal{E}_M(\phi_{\mathrm{true}}) = 0$
  The minimizer exists for any $\mathcal{H} = \mathcal{H}_n = \mathrm{span}\{\phi_1,\dots,\phi_n\}$
  Agenda: as $M\to\infty$, $\mathcal{E}_M(\cdot)\to\mathcal{E}_\infty(\cdot)$ and $\hat\phi_{M,\mathcal{H}}\to\hat\phi_{\infty,\mathcal{H}}$; as $\mathrm{dist}(\mathcal{H},\phi_{\mathrm{true}})\to 0$, does $\hat\phi_{\infty,\mathcal{H}}\to\phi_{\mathrm{true}}$?
  Which function space, with which metric $\mathrm{dist}(\hat\phi,\phi_{\mathrm{true}})$?
  Learnability:
  ◮ Convergence of estimators?
  ◮ Convergence rate?

  9. Review of classical nonparametric regression
  Estimate $\phi(z) = \mathbb{E}[Y\,|\,Z=z]: \mathbb{R}^D\to\mathbb{R}$ from data $\{z_i, y_i\}_{i=1}^{M}$
  $\{z_i, y_i\}$ are i.i.d. samples; $\hat\phi_n := \arg\min_{f\in\mathcal{H}_n} \mathcal{E}_M(f) := \sum_{i=1}^{M}\|y_i - f(z_i)\|^2$
  Optimal rate: if $\mathrm{dist}(\mathcal{H}_n,\phi_{\mathrm{true}}) \lesssim n^{-s}$ and $n_* = (M/\log M)^{\frac{1}{2s+1}}$, then $\|\hat\phi_{n_*} - \phi\|_{L^2(\rho_Z)} \lesssim M^{-\frac{s}{2s+D}}$

  10. Our learning of the kernel $\phi: \mathbb{R}_+\to\mathbb{R}$ from data $\{x^m(t)\}$ (contrast with the classical regression setting above):
  $\dot x_i(t) = \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} \phi(|x_i - x_j|)\,(x_j - x_i)$
  $\{r^m_{ij}(t) := |x^m_i(t) - x^m_j(t)|\}$ are not i.i.d.
  The values $\phi(r^m_{ij}(t))$ are unknown
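To make the contrast concrete, here is a small sketch of the classical setting, assuming an i.i.d. sample, a piecewise-constant hypothesis space on [0, 1], and the slide's choice $n_* \approx (M/\log M)^{1/(2s+1)}$; the test function and noise level are made up for illustration.

```python
import numpy as np

# Classical 1-D nonparametric regression with a piecewise-constant hypothesis space H_n.
rng = np.random.default_rng(1)
M, s = 2000, 1.0                          # sample size, assumed smoothness
phi = lambda z: np.sin(2 * np.pi * z)     # "true" regression function (illustration only)
z = rng.uniform(0, 1, M)
y = phi(z) + 0.1 * rng.normal(size=M)

n_star = int((M / np.log(M)) ** (1.0 / (2 * s + 1)))          # dimension from the rate rule
bins = np.minimum((z * n_star).astype(int), n_star - 1)
# Least squares over the indicator basis reduces to bin-wise averages of y.
phi_hat = np.array([y[bins == p].mean() if np.any(bins == p) else 0.0
                    for p in range(n_star)])

z_query = 0.3
print(phi_hat[min(int(z_query * n_star), n_star - 1)], phi(z_query))
```

In the particle setting neither ingredient is available directly: the sample points (pairwise distances) are dependent, and the responses $\phi(r^m_{ij}(t))$ are never observed, only the velocities.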

  11. Regression measure
  Distribution of pairwise distances $\rho_T: \mathbb{R}_+\to\mathbb{R}$:
  $\rho_T(r) = \frac{1}{L\binom{N}{2}}\sum_{l=1}^{L}\sum_{i,i'=1,\ i<i'}^{N} \mathbb{E}_{\mu_0}\big[\delta_{r_{ii'}(t_l)}(r)\big]$
  unknown, estimated by the empirical distribution $\rho^M_T \to \rho_T$ as $M\to\infty$ (LLN)
  intrinsic to the dynamics
  Regression function space $L^2(\rho_T)$:
  the admissible set ⊂ $L^2(\rho_T)$
  $\mathcal{H}$ = piecewise polynomials ⊂ $L^2(\rho_T)$
  singular kernels ⊂ $L^2(\rho_T)$
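One practical way to approximate the regression measure is a histogram of pooled pairwise distances; the sketch below assumes trajectory data stored as an array of shape (M, L, N, d), and the function name empirical_rho is hypothetical.

```python
import numpy as np

def empirical_rho(trajectories, n_bins=100):
    """Empirical approximation of rho_T: a histogram of all pairwise distances
    |x_i(t_l) - x_j(t_l)|, i < j, pooled over times t_l and over the M trajectories.
    `trajectories` has shape (M, L, N, d)."""
    M, L, N, d = trajectories.shape
    iu = np.triu_indices(N, k=1)
    dists = []
    for m in range(M):
        for l in range(L):
            diff = trajectories[m, l][:, None, :] - trajectories[m, l][None, :, :]
            dists.append(np.linalg.norm(diff, axis=-1)[iu])
    dists = np.concatenate(dists)
    density, edges = np.histogram(dists, bins=n_bins, density=True)
    return density, edges   # [edges[0], edges[-1]] approximates support(rho_T)
```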

  12. Identifiability: a coercivity condition
  (Recall: as $M\to\infty$, $\mathcal{E}_M(\cdot)\to\mathcal{E}_\infty(\cdot)$ and $\hat\phi_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}}\mathcal{E}_M(\phi) \to \hat\phi_{\infty,\mathcal{H}}$; does $\hat\phi_{\infty,\mathcal{H}} \to \phi_{\mathrm{true}}$?)
  $\mathcal{E}_\infty(\hat\phi) - \mathcal{E}_\infty(\phi) = \frac{1}{NT}\,\mathbb{E}_{\mu_0}\int_0^T \|f_{\hat\phi-\phi}(X(t))\|^2\,dt \ \ge\ c\,\|(\hat\phi-\phi)(\cdot)\,\cdot\|^2_{L^2(\rho_T)}$
  Coercivity condition. There exists $c_T>0$ s.t. for all $\varphi$ with $\varphi(\cdot)\,\cdot \in L^2(\rho_T)$,
  $\frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\|f_\varphi(x(t))\|^2\,dt = \langle\!\langle\varphi,\varphi\rangle\!\rangle \ \ge\ c_T\,\|\varphi(\cdot)\,\cdot\|^2_{L^2(\rho_T)}$
  coercivity: the bilinear functional $\langle\!\langle\varphi,\psi\rangle\!\rangle := \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\langle f_\varphi, f_\psi\rangle(x(t))\,dt$
  controls the condition number of the regression matrix
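A short worked step (not on the slide) making the last point explicit: restricted to a finite-dimensional hypothesis space, coercivity gives a lower bound on the quadratic form behind the normal equations.

```latex
% For \varphi = \sum_{p=1}^{n} a_p \psi_p \in \mathcal{H}_n the expected error is a quadratic form:
\begin{aligned}
\frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\,\| f_\varphi(x(t))\|^2\,dt
  &= \sum_{p,p'=1}^{n} a_p a_{p'} \,\langle\!\langle \psi_p,\psi_{p'}\rangle\!\rangle
   = a^\top A_\infty\, a,\\
a^\top A_\infty\, a
  &\ge c_T \,\Big\| \textstyle\sum_p a_p\,\psi_p(\cdot)\,\cdot \Big\|_{L^2(\rho_T)}^2
   = c_T\, a^\top B\, a,
\end{aligned}
% with B_{pp'} = \langle \psi_p(\cdot)\cdot,\ \psi_{p'}(\cdot)\cdot\rangle_{L^2(\rho_T)}.
% If the basis is orthonormal in L^2(\rho_T) (B = I), then \sigma_{\min}(A_\infty) \ge c_T.
```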

  13. Consistency of the estimator
  Theorem (Lu, Maggioni, Tang, Zhong). Assume the coercivity condition. Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subsets of $L^\infty([0,R])$ such that $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{\mathrm{true}}\|_\infty \to 0$ as $n\to\infty$. Then
  $\lim_{n\to\infty}\lim_{M\to\infty}\|\hat\phi_{M,\mathcal{H}_n}(\cdot)\,\cdot - \phi_{\mathrm{true}}(\cdot)\,\cdot\|_{L^2(\rho_T)} = 0$, almost surely.
  For each $n$, compactness of $\{\hat\phi_{M,\mathcal{H}_n}\}$ and coercivity imply $\hat\phi_{M,\mathcal{H}_n}\to\hat\phi_{\infty,\mathcal{H}_n}$ in $L^2$
  Increasing $\mathcal{H}_n$ and coercivity imply consistency
  In general, truncation is used to make $\mathcal{H}_n$ compact

  14. Optimal rate of convergence
  Theorem (Lu, Maggioni, Tang, Zhong). Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subspaces of $L^\infty([0,R])$ s.t. $\dim(\mathcal{H}_n)\le c_0 n$ and $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{\mathrm{true}}\|_\infty \le c_1 n^{-s}$. Assume the coercivity condition. Choose $n_* = (M/\log M)^{\frac{1}{2s+1}}$; then
  $\mathbb{E}_{\mu_0}\big[\|\hat\phi_{T,M,\mathcal{H}_{n_*}}(\cdot)\,\cdot - \phi_{\mathrm{true}}(\cdot)\,\cdot\|_{L^2(\rho_T)}\big] \le C\Big(\frac{\log M}{M}\Big)^{\frac{s}{2s+1}}.$
  The second condition is about regularity: $\phi\in C^s$
  Choose $\mathcal{H}_n$ according to $s$ and $M$
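A worked instance of the rule, with illustrative numbers (a Lipschitz kernel, so s = 1, and M = 10^4 trajectories):

```latex
n_* = \Big(\tfrac{M}{\log M}\Big)^{\frac{1}{2s+1}}
    = \Big(\tfrac{10^4}{\log 10^4}\Big)^{1/3} \approx 10,
\qquad
\Big(\tfrac{\log M}{M}\Big)^{\frac{s}{2s+1}} \approx 0.097,
% i.e. roughly 10 basis functions and an expected L^2(\rho_T) error of order 10^{-1}.
```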

  15. Prediction of future evolution
  Theorem (Lu, Maggioni, Tang, Zhong). Denote by $\hat X(t)$ and $X(t)$ the solutions of the systems with kernels $\hat\phi$ and $\phi$ respectively, starting from the same initial conditions drawn i.i.d. from $\mu_0$. Then
  $\mathbb{E}_{\mu_0}\Big[\sup_{t\in[0,T]}\|\hat X(t) - X(t)\|^2\Big] \ \lesssim\ N\,\|\hat\phi(\cdot)\,\cdot - \phi(\cdot)\,\cdot\|^2_{L^2(\rho_T)}$
  Follows from Gronwall's inequality

  16. Outline
  1 Learning via nonparametric regression:
  ◮ A regression measure and function space
  ◮ Learnability: a coercivity condition
  ◮ Consistency and rate of convergence
  2 Numerical examples
  ◮ A general algorithm
  ◮ Lennard-Jones model
  ◮ Opinion dynamics and multiple-agent systems
  3 Open problems

  17. Numerical examples: the regression algorithm
  $\mathcal{E}_{L,M}(\varphi) = \frac{1}{LMN}\sum_{l,m,i=1}^{L,M,N}\Big\|\dot x^{(m)}_i(t_l) - \frac{1}{N}\sum_{i'=1}^{N}\varphi\big(r^m_{i,i'}(t_l)\big)\,\boldsymbol{r}^m_{i,i'}(t_l)\Big\|^2$
  (here $\boldsymbol{r}^m_{i,i'} = x^m_{i'} - x^m_i$ and $r^m_{i,i'} = |\boldsymbol{r}^m_{i,i'}|$, as in the earlier slides)
  $\mathcal{H}_n := \{\varphi = \sum_{p=1}^{n} a_p\,\psi_p(r): a = (a_1,\dots,a_n)\in\mathbb{R}^n\}$,
  $\mathcal{E}_{L,M}(\varphi) = \mathcal{E}_{L,M}(a) = \frac{1}{M}\sum_{m=1}^{M}\|d^m_L - \Psi^m_L a\|^2_{\mathbb{R}^{LNd}}$
  Rewrite the normal equations as $A_M a = b_M$ with $A_M = \frac{1}{M}\sum_{m=1}^{M} A^m_L$, $b_M = \frac{1}{M}\sum_{m=1}^{M} b^m_L$
  can be computed in parallel
  Caution: the choice of $\{\psi_p\}$ affects cond($A_M$)
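A minimal sketch of the resulting least-squares solver, assuming position and velocity data of shape (M, L, N, d) and a list of basis functions psi; the function name fit_kernel and the use of numpy's lstsq for the normal equations are illustrative choices, not the authors' code.

```python
import numpy as np

def fit_kernel(X, V, psi):
    """Least-squares estimate of phi over H_n = span{psi_1, ..., psi_n}.

    X, V: arrays of shape (M, L, N, d) with observed positions and velocities.
    psi: list of n basis functions R_+ -> R.
    Returns coefficients a solving the normal equations A_M a = b_M.
    The per-trajectory contributions can be accumulated in parallel."""
    M, L, N, d = X.shape
    n = len(psi)
    A, b = np.zeros((n, n)), np.zeros(n)
    for m in range(M):
        for l in range(L):
            x, v = X[m, l], V[m, l]
            diff = x[None, :, :] - x[:, None, :]          # diff[i, j] = x_j - x_i
            r = np.linalg.norm(diff, axis=-1)
            # G[p, i, :] = (1/N) sum_j psi_p(r_ij) (x_j - x_i): regression features
            G = np.stack([(np.where(r > 0, f(np.maximum(r, 1e-12)), 0.0)[:, :, None]
                           * diff).sum(axis=1) / N for f in psi])
            Gf = G.reshape(n, -1)                          # flatten particle/space dims
            A += Gf @ Gf.T / (L * M * N)
            b += Gf @ v.ravel() / (L * M * N)
    # lstsq is used instead of solve to stay robust when A_M is ill-conditioned
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

The coefficients can then be plugged back into phi_hat(r) = sum_p a_p psi_p(r) and compared against the true kernel on support(rho_T).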

  18. Assume the coercivity condition: $\langle\!\langle\varphi,\varphi\rangle\!\rangle \ge c_T\,\|\varphi(\cdot)\,\cdot\|^2_{L^2(\rho_T)}$.
  Proposition (lower bound on the smallest singular value of $A_M$). Let $\{\psi_1,\dots,\psi_n\}$ be a basis of $\mathcal{H}_n$ such that $\langle\psi_p(\cdot)\,\cdot,\ \psi_{p'}(\cdot)\,\cdot\rangle_{L^2(\rho^L_T)} = \delta_{p,p'}$ and $\|\psi_p\|_\infty \le S_0$. Let $A_\infty = \big(\langle\!\langle\psi_p,\psi_{p'}\rangle\!\rangle\big)_{p,p'}\in\mathbb{R}^{n\times n}$. Then $\sigma_{\min}(A_\infty)\ge c_L$. Moreover, $A_\infty$ is the a.s. limit of $A_M$. Therefore, for large $M$, the smallest singular value of $A_M$ satisfies, with high probability, $\sigma_{\min}(A_M)\ge(1-\epsilon)\,c_L$.
  Choose $\{\psi_p(\cdot)\,\cdot\}$ linearly independent in $L^2(\rho_T)$
  Piecewise polynomials: on a partition of support($\rho_T$)
  Finite differences ≈ derivatives ⇒ an $O(\Delta t)$ error in the estimator
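One hedged way to realize the proposition's normalization numerically: build piecewise-constant basis functions on a partition of support(rho_T) and rescale each so that psi_p(·)· has (empirical) unit L²(rho^M_T) norm; the helper below is hypothetical.

```python
import numpy as np

def indicator_basis(edges, dists):
    """Piecewise-constant basis on a partition of support(rho_T), rescaled so that
    each psi_p(.)· has empirical unit L^2(rho^M_T) norm.  `edges` is the bin
    partition, `dists` the pooled pairwise distances used to estimate rho_T."""
    basis = []
    for p in range(len(edges) - 1):
        lo, hi = edges[p], edges[p + 1]
        mask = (dists >= lo) & (dists < hi)
        # empirical norm of psi_p(r) * r over the sample of pairwise distances
        norm = np.sqrt(np.mean((dists * mask) ** 2)) + 1e-12
        basis.append(lambda r, lo=lo, hi=hi, c=norm: ((r >= lo) & (r < hi)) / c)
    return basis

# With this normalization the proposition suggests sigma_min(A_M) stays bounded below
# for large M; it can be checked via np.linalg.svd(A, compute_uv=False)[-1].
```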

  19. Implementation
  1 Approximate the regression measure:
  ◮ Estimate $\rho_T$ with large datasets
  ◮ Partition support($\rho_T$)
  2 Construct the hypothesis space $\mathcal{H}$:
  ◮ degree of the piecewise polynomials
  ◮ set the dimension of $\mathcal{H}$ according to the sample size
  3 Regression:
  ◮ Assemble the arrays (in parallel)
  ◮ Solve the normal equations

  20. Examples: Lennard-Jones dynamics
  The Lennard-Jones potential
  $V_{LJ}(r) = 4\epsilon\Big[\big(\tfrac{\sigma}{r}\big)^{12} - \big(\tfrac{\sigma}{r}\big)^{6}\Big] \;\Rightarrow\; \phi(r)\,r = V_{LJ}'(r)$
  $\dot x_i(t) = \frac{1}{N}\sum_{j=1,\ j\neq i}^{N} \phi(|x_i - x_j|)\,(x_j - x_i)$
  [Figure: snapshot of the particle positions at time t = 0.010]
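For this example the kernel entering the first-order equation has a closed form; a small sketch with illustrative epsilon and sigma values, following the slide's convention phi(r) r = V_LJ'(r).

```python
import numpy as np

# Lennard-Jones interaction kernel in the first-order form of the slides:
# phi(r) * r = V_LJ'(r), so phi(r) = V_LJ'(r) / r.  Parameter values are illustrative.
def phi_lj(r, eps=1.0, sigma=1.0):
    dV = 4 * eps * (-12 * sigma**12 / r**13 + 6 * sigma**6 / r**7)   # V_LJ'(r)
    return dV / r

# Trajectories can be generated with the simulator sketched earlier and the
# (position, velocity) data fed to fit_kernel; the kernel is singular at r = 0,
# which is why it is only estimated on support(rho_T).
```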
