Jonathan Weare Courant Institute, New York University with Aaron - PowerPoint PPT Presentation

Characterizing long timescale phenomena with trajectory data Jonathan Weare Courant Institute, New York University with Aaron Dinner, Douglas Dow, Dimitrios Giannakis, John Strahan, Erik Thiede, and Rob Webber April 23, 2020

The most interesting events often occur very infrequently

The most intense (and damaging) tropical cy- clones occur about once a decade, but processes (e.g. gravity waves) on the timescale of seconds to minutes must be resolved in numerical integration of weather models. Major blackouts not (counting those due to major storms) are extremely rare in the US. The last one occurred in 2003. Within minutes millions (even- tually 55 million in the US and Canada) were af- fected. Disassociation of the insulin dimer has important therapeutic implications. It occurs roughly on the microsecond to millescond timescale. That’s up to 12 orders of magnitude longer than the time scale of bond vibrations (10 − 15 s )

Three common approaches to interrogating long timescale processes: 1. Direct simulation : Find (or build) a really fast computer (Anton2 shown here) and integrate for as long as you can. 2. Coarse graining : Build a cheaper but less accurate model that you can run to very long timescales. (image from the Voth group) 3. Rare event simulation : Try to “trick” the model into undergoing the event of interest quickly while maintaining the ability to estimate unbiased statistics.

However you generate the data... It needs to be processed it for understanding of the long timescale phenomena

However you generate the data... It needs to be processed it for understanding of the long timescale phenomena In this talk I: 1. Report on our analysis of a well established dynamic spectral estimation approach that approximates the slowest decorrelating features of the system. 2. Describe a family of methods that we have developed that uses short trajectories to compute statistics describing a specified long timescale event.

The Variational Approach to Conformational Dynamics Rob Webber, Erik Thiede, Douglas Dow, Aaron Dinner VAC estimates eigenvalues and eigenspaces of the transition operator T τ with action T τ f ( x ) = E [ f ( X τ ) | X 0 = x ] VAC assumes that X t has unique ergodic probability measure µ and that T τ : L 2 ( µ ) → L 2 ( µ ) is self-adjoint. Our analysis assumes quasi-compactness: r T τ = � e − σ i τ proj [ η i ] + R τ �R τ � 2 ≤ e − σ r + 1 τ with i = 1 where 1 = e − σ 1 τ > e − σ 2 τ ≥ · · · ≥ e − σ r + 1 τ and where η 1 , η 2 , . . . , η r are eigenfunctions. η 1 ≡ 1.

VAC estimates span i ≤ k { η i } the most slowly decorrelating functions of the system. If η belongs to the linear span of η 2 , . . . , η k , then � η, T τ η � µ ≥ e − σ k τ . corr µ [ η ( X 0 ) , η ( X τ )] = � η, η � µ If u is orthogonal to η 1 , . . . , η k then, � u , T τ u � µ ≤ e − σ k + 1 τ . corr µ [ u ( X 0 ) , u ( X τ )] = � u , u � µ

VAC workflow... 1. Generate samples of X 0 from µ and then X τ given X 0 . 2. Choose a set of basis functions φ 1 , φ 2 , . . . , φ n . 3. Use samples to build estimates ˆ C ij ( 0 ) ≈ C ij ( 0 ) = � φ i , φ j � µ = E [ φ i ( X 0 ) φ j ( X 0 )] ˆ C ij ( τ ) ≈ C ij ( τ ) = � φ i , T τ φ j � µ = E [ φ i ( X 0 ) φ j ( X τ )] − 1 ˆ 4. Solve for the eigenpairs (ˆ v i ) of ˆ i , ˆ C ( 0 ) C ( τ ) . λ τ v i 5. Return approximate eigenfunctions ˆ i = � j ˆ j φ j . γ τ Most common variants: Markov State Models (MSM)s : choose a basis of indicator functions on a partition of space (usually found by clustering the data). Time-lagged Independent Component Analysis (TICA) : choose the coordinate axes as a basis.

Trp cage folding A well studied fast folding mini-protein. This study used a long trajectory with between 12 and 31 folding/unfolding events generated on Anton. We’ll see this ex- ample again later. Sidky et al. 2019

When should we trust VAC? We divide the VAC error into two contributions: 1. Approximation error: If ˆ C = C then VAC approximate eigenfunctions j v i j φ j where ( λ j , v j ) are eigenpairs of C ( 0 ) − 1 C ( τ ) . How big is are γ τ i = � � � dist F span i ≤ k { γ τ i } , span i ≤ k { η i } ? 2. Estimation error: In practice we use sampled data to build the estimate ˆ C of C . How big is � � dist F span i ≤ k { ˆ γ τ i } , span i ≤ k { γ τ i } ? � � W ⊥ � � We use the projection distance dist F ( U , W ) = � proj proj [ U ] F between � subspaces of L 2 ( µ ) .

Approximation error ( ˆ C = C ) Natural to apply existing bounds for Rayleigh-Ritz method. Let Φ = span i ≤ n { φ i } and assume 1 ∈ Φ . Then dist 2 � � span i ≤ k { γ τ i } , span i ≤ k { η i } ≤ 1 + � proj [Φ ⊥ ] T τ proj [Φ] � 2 F 2 1 ≤ | e − σ k τ − λ τ dist 2 � � k + 1 | 2 span i ≤ k { η i } , Φ F provided that e − σ k τ > λ τ k + 1 . As long as σ k < σ k + 1 span i ≤ k { γ τ i } → span i ≤ k { η i } as proj Φ [ η i ] → η i for i ≤ k RR bounds used to prove λ τ i converge in [Djurdjevac, Sarich, and Schütte (2012)] .

But what happens when we increase τ ? 1 d Ornstein-Uhlenbeck process with MSM (indicator) basis. Approximation error in span { η 1 , η 2 , η 3 } . We need a more detailed approximation error bound.

Approximation error dependence on τ Going beyond the Rayleigh-Ritz bounds we prove: Provided that σ i is isolated, λ τ i e − σ i τ → c i as τ → ∞ where c i is independent of τ . As long as σ k < σ k + 1 span i ≤ k { γ τ i } → proj [Φ] span i ≤ k { η i } as τ → ∞ and convergence is exponentially fast. Provided that e − σ k + 1 τ < λ τ k dist 2 2 � � span i ≤ k { γ τ i } , span i ≤ k { η i } � e − σ k + 1 τ � ≤ 1 + 1 F � � . � � k − e − σ k + 1 τ dist 2 � span i ≤ k { η i } , Φ � 4 λ τ � � F

The new bound is sharp for large τ 1 d Ornstein-Uhlenbeck process with MSM (indicator) basis. Approximation error in span { η 1 , η 2 , η 3 } . Approximation error gets better (not worse) as τ increases.

A precise asymptotic formula for estimation error The estimation error can be expressed as � 2 � dist F span i ≤ k ˆ γ τ i , span i ≤ k γ τ i 2 v i ( τ ) T � � v j ( τ ) � ˆ j ˆ � n k C ( τ ) − λ τ C ( 0 ) � � � � � � = ( 1 + o ( 1 )) � � λ τ i − λ τ � � j i = k + 1 j = 1 � � in the limit as ˆ C ( τ ) → C ( τ ) and ˆ C ( 0 ) → C ( 0 ) . Estimation error is small when ˆ C is close to C . But expect estimation error to be big when λ τ k − λ τ k + 1 is small.

1 d Ornstein-Uhlenbeck process with MSM (indicator) basis. Approximation error in span { η 1 , η 2 , η 3 } . Trial 1: n = 20, trajectory length = 10000. Trial 2: n = 50, trajectory length = 500 As τ increases approximation error decreases, but estimation error increases.

VAC summary ◮ New convergence bounds for VAC eigenfunctions. ◮ New understanding of the role of lag time. ◮ New diagnostic tools to help choose lag time.

Dynamic Galerkin Approximation (DGA) Erik Thiede, John Strahan, Dimitrios Giannakis, Aaron Dinner The truely longest timescale pocesses are often physically irrelevant If we have a specific event in mind we should compute quantities specific to that event. E.g. to predict the event that we reach X t ∈ B before X t ∈ A starting from X 0 = x we should compute q + ( x ) = P ( T B < T A | X 0 = x ) the “committor function.” ( T A is the first time X t ∈ A )

DGA computes conditional expectations We’ll want to incorporate a domain D : T τ f ( x ) = E [ f ( X τ ∧ T ) | X 0 = x ] where T is the first time X t exits D . DGA estimates functions of the form: � T � � � � u ( x ) = E g ( X T ) + h ( X s ) ds � X 0 = x � 0 � 1 , x ∈ ∂ B E.g. for u = q + plug in D = A ∪ B , h = 0, and g ( x ) = 0 , x ∈ ∂ A Elaborations of the basic setup including e.g. a potential term and time dependence are straightforward (in principle)

DGA relies on the Feynman-Kac relation � τ T s [ h 1 D ] ds on D u − T τ u = and u = g on ∂ D 0 to avoid generating long trajectories. If u = ψ + w for a “guess” ψ satisfying the BCs and � τ r τ = T τ ψ − ψ + T s [ h 1 D ] ds 0 then we can solve w − T τ w = r τ on D and w = 0 on ∂ D for w .

Jonathan Weare Courant Institute, New York University with Aaron - PowerPoint PPT Presentation

Characterizing long timescale phenomena with trajectory data Jonathan Weare Courant Institute, New York University with Aaron Dinner, Douglas Dow, Dimitrios Giannakis, John Strahan, Erik Thiede, and Rob Webber April 23, 2020 The most

Long time scales and unlikely events: sampling and coarse graining strategies Jonathan Weare

1. 2. 3. 4. ~ v 5. ~ v 6. ~ v 7. ~ v 8. ~ v 9. 10. ave us 11. _;-~~ L~' ~ u ~- . ,

2020-2021 Weare School District Deliberative Session February 5, 2020 Budget Foundations The

Meeting Tina Lipuek | Ljubljana | 7 July 2016 WHERE WEARE Joint Action in Month 29

Student Financial Services Who weare in Student Financial Services , we are here to help

Security on Rails Jonathan Weiss, 03.09.2008 Peritor GmbH Who are we? Jonathan W Jonathan

Project 3 Q&A Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 7 - 1 13-Apr-15 Outline

Lecture 2: Introduction to Segmentation Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 2 -

Lecture 6: Introduction to Detection Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 6 - 1

Spatial Competition with Heterogeneous Firms Jonathan Vogel November 2007 Jonathan Vogel ()

Fast Randomized Iteration: Diffusion Monte Carlo through the Lens of Numerical Linear Algebra

Large Deviation Theory for the Analysis of Power Tansmission Systems Subject to Stochastic

Dispute Resolution in Islamic Finance By Jonathan Lawrence (jonathan.lawrence@klgates.com), Peter

Deploying and Monitoring Ruby on Rails A practical guide Mathias Meyer and Jonathan Weiss,

Jonathan Pollet conference Part 2: Stuxnet APT-Style Attacks on SCADA Systems Stuxnet Night

Jonathan P. Dowling Jonathan P. Dowling Quantum Science & Technologies Group Quantum Science

The Geometry of Imprecise Inference Mik elis Bickis University of Saskatchewan ISIPTA15

ET-805 Bayesian Knowledge Tracing Ramkumar.Rajendran@iitb.ac.in Activity - TPS Think

Advanced Mathematical Methods Part II Statistics Statistical Inference Mel Slater

7 Modelling Uncertainty Bayes theorem 7 Modelling Uncertainty Bayes theorem

HEALTHCARE SERVICE DELIVERY Rebecca Womack, MS, BCBA, LBA Executive Director Behavior Analysis

Why river basin Prof. R. Nagarajan, CSRE , IIT Bombay GNR 624 : Water Resources and River basin

Tarrant County Family Court Services Orientation Who We Are & What We Do Family Court

Needs During the 2020- 2021 School Year Bringing Into Focus Summer Webinar Series September 15,

Jonathan Weare Courant Institute, New York University with Aaron - PowerPoint PPT Presentation

Characterizing long timescale phenomena with trajectory data Jonathan Weare Courant Institute, New York University with Aaron Dinner, Douglas Dow, Dimitrios Giannakis, John Strahan, Erik Thiede, and Rob Webber April 23, 2020 The most

Long time scales and unlikely events: sampling and coarse graining strategies Jonathan Weare

1. 2. 3. 4. ~ v 5. ~ v 6. ~ v 7. ~ v 8. ~ v 9. 10. ave us 11. _;-~~ L~' ~ u ~- . ,

2020-2021 Weare School District Deliberative Session February 5, 2020 Budget Foundations The

Meeting Tina Lipuek | Ljubljana | 7 July 2016 WHERE WEARE Joint Action in Month 29

Student Financial Services Who weare in Student Financial Services , we are here to help

Security on Rails Jonathan Weiss, 03.09.2008 Peritor GmbH Who are we? Jonathan W Jonathan

Project 3 Q&amp;A Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 7 - 1 13-Apr-15 Outline

Lecture 2: Introduction to Segmentation Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 2 -

Lecture 6: Introduction to Detection Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 6 - 1

Spatial Competition with Heterogeneous Firms Jonathan Vogel November 2007 Jonathan Vogel ()

Fast Randomized Iteration: Diffusion Monte Carlo through the Lens of Numerical Linear Algebra

Large Deviation Theory for the Analysis of Power Tansmission Systems Subject to Stochastic

Dispute Resolution in Islamic Finance By Jonathan Lawrence (jonathan.lawrence@klgates.com), Peter

Deploying and Monitoring Ruby on Rails A practical guide Mathias Meyer and Jonathan Weiss,

Jonathan Pollet conference Part 2: Stuxnet APT-Style Attacks on SCADA Systems Stuxnet Night

Jonathan P. Dowling Jonathan P. Dowling Quantum Science &amp; Technologies Group Quantum Science

The Geometry of Imprecise Inference Mik elis Bickis University of Saskatchewan ISIPTA15

ET-805 Bayesian Knowledge Tracing Ramkumar.Rajendran@iitb.ac.in Activity - TPS Think

Advanced Mathematical Methods Part II Statistics Statistical Inference Mel Slater

7 Modelling Uncertainty Bayes theorem 7 Modelling Uncertainty Bayes theorem

HEALTHCARE SERVICE DELIVERY Rebecca Womack, MS, BCBA, LBA Executive Director Behavior Analysis

Why river basin Prof. R. Nagarajan, CSRE , IIT Bombay GNR 624 : Water Resources and River basin

Tarrant County Family Court Services Orientation Who We Are &amp; What We Do Family Court

Needs During the 2020- 2021 School Year Bringing Into Focus Summer Webinar Series September 15,

Project 3 Q&A Jonathan Krause Fei-Fei Li, Jonathan Krause Lecture 7 - 1 13-Apr-15 Outline

Jonathan P. Dowling Jonathan P. Dowling Quantum Science & Technologies Group Quantum Science

Tarrant County Family Court Services Orientation Who We Are & What We Do Family Court