Methods of computational statistics and machine learning for the life sciences, with an application to COVID-19 (Métodos de estadística computacional y machine learning para ciencias de la vida, con una aplicación a COVID-19)
Gonzalo E. Mena
May 20th, 2020
Data Science Initiative and Statistics Department, Harvard University

Opening remarks (caveats)
• Academic talk: a summary of applied work in the life sciences (neuroscience).
• At the end: some methods for COVID-19. More questions than answers. I will show some data and very preliminary, uncalibrated analyses.
• Goal: to encourage discussion and to motivate work in the area and the use of Bayesian models.
• Spanglish

World's current situation
• Several large-scale imaging and stimulation technologies, to read and write neural activity.
• Consensus on relevance.
Goal: to develop new experimental tools that will revolutionize our understanding of the brain.
Major bottleneck: data analysis capabilities are far below high-throughput data collection rates (TBs/hour), so the potential of these technologies cannot be fully exploited.

This talk: (Neural) Data Science + COVID-19 (at the end)
Claim: the dialog between the life sciences (neuroscience) and Statistics/Mathematics/Computation is of mutual benefit. Here: Bayesian statistics.

Large-scale Spike Sorting with Stimulation Artifacts

Introduction
Overarching goal: stimulation and recording in large multi-electrode arrays (MEAs) to read and write neural activity, in order to achieve control.
[Figure: recording setup. Labels: lens, saline, retina, electrical stimulation, physiological recording. Dimensions shown: 0.9 mm, 1.85 mm, 60 µm, 8-15 µm.]
• For control, we need to know the stimulus → response map, fast.
• Large-scale, online data analysis: 512 electrodes at 20 kHz, ∼50 GB/hour.
• Scientific and clinical significance: development of a high-resolution retinal prosthesis.

Tailored activation
Goal: to generate artificial vision, elicit arbitrary patterns of neural activity with tailored stimuli.
• Question: is it possible to activate only the colored neurons?
• Easier question: is it possible to activate only neuron A?
• Stimulating with a pulse of 0.5 µA on the electrode around the soma does not activate neuron A.
• However, stimulating with 1.0 µA does activate the neuron.
• Further, stimulating with 1.5 µA also activates the nearby neuron B, through its axon.
• Activation curves summarize the responsiveness of neurons. They are inferred from many stimuli of increasing strength.
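A minimal sketch of how an activation curve could be estimated, assuming that for each stimulus amplitude we already know the fraction of trials in which the neuron spiked. The logistic (sigmoid) shape, the function name `fit_activation_curve`, and the damped Newton fit are illustrative assumptions, not the procedure used in the talk.

```python
import numpy as np

def fit_activation_curve(amplitudes, spike_fraction, n_trials, n_iter=50):
    """Fit P(spike | amplitude) = sigmoid(a * amplitude + b).

    Maximizes the binomial log-likelihood with Newton steps plus
    step-halving. Returns the pair (a, b).
    """
    x = np.asarray(amplitudes, dtype=float)
    y = np.asarray(spike_fraction, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept

    def log_lik(w):
        z = X @ w
        # n * [y * z - log(1 + exp(z))], written in a numerically stable form
        return np.sum(n_trials * (y * z - np.logaddexp(0.0, z)))

    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (n_trials * (y - p))
        hess = (X * (n_trials * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess + 1e-9 * np.eye(2), grad)
        t = 1.0
        # Halve the step until the log-likelihood does not decrease.
        while log_lik(w + t * step) < log_lik(w) and t > 1e-8:
            t *= 0.5
        w = w + t * step
    return w[0], w[1]
```

Under this assumed sigmoid form, the amplitude at which the neuron fires on half of the trials is `-b / a`, which gives a single-number summary of each neuron's responsiveness.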

Stimulation artifacts
Major hurdle: electrical stimuli are sensed at the electrodes as artifacts, stymying identification of neural activity.
• Artifacts are much larger than spikes and overlap temporally with them.
Current solutions break down: manual analysis can take a human weeks, and it cannot be done online.

Stimulation artifacts
Problem: the data contain a nuisance parameter $A$,
$$Y = A + s + \epsilon,$$
with recorded traces $Y$, artifact $A$, neural activity $s$ and noise $\epsilon$. To infer $s$ we need to know $A$.
Solution: impose structure and prior knowledge on $A$, $s$, and $\epsilon$ so that $\hat{A}$, $\hat{s}$ can be resolved.
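To make the additive model concrete, here is a toy single-electrode simulation. The exponential artifact shape, the spike waveform, the noise level and all variable names are made-up assumptions for illustration only; real artifacts are not known in advance, which is the whole difficulty.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 60                                   # samples in one trial
t = np.arange(T)

# Hypothetical artifact: large and decaying, much bigger than a spike.
artifact = -300.0 * np.exp(-t / 5.0)

# Hypothetical spike waveform placed so that it overlaps the artifact.
spikes = np.zeros(T)
spikes[10:15] = [-5.0, -20.0, -12.0, 6.0, 3.0]

noise = rng.normal(0.0, 1.0, size=T)

# The recorded trace follows Y = A + s + eps.
Y = artifact + spikes + noise

# In the raw trace the most negative sample belongs to the artifact ...
raw_trough = int(np.argmin(Y))
# ... but once A is known and subtracted, the spike trough is recovered.
residual = Y - artifact
spike_trough = int(np.argmin(residual))
```

In this toy setting the spike only becomes visible after the artifact is subtracted, which is why an estimate of $A$ is needed before $s$ can be inferred.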

Neural activity structure
• Spike sorting of spontaneous activity to identify neurons.
• This provides us with templates (spike, or action potential, waveforms).

The structure of stimulation artifacts
• Properties are revealed by silencing neural activity.
• The artifact decays smoothly with distance from the stimulating electrode and has a peak in time. It increases with the strength of the stimulus. It doesn't change if the stimulus is the same.
Non-linear and non-stationary, but smooth and structured.

Crafting a principled solution
Consider the model $Y = A + s + \epsilon$.
• Data $Y = Y_{t,e,j,i}$ over time ($1 \le t \le T$), space (electrode, $1 \le e \le E$), strength ($1 \le j \le J$) and trial ($1 \le i \le I$) dimensions.
Imposing structure
• Represent neural activity $s$ with Toeplitz matrices (shapes) and binary vectors (timing).
• Gaussian process (GP) to encode prior knowledge of the artifact, $A \sim GP(0, K_\theta)$, and to borrow strength.
• Problem: $n \approx 10^6$ artifact variables; $O(n^3)$ does not scale.
• Solution: Kronecker decomposition $K_{(\theta,\phi^2)} = \rho K_t \otimes K_e \otimes K_j + \phi^2 I$.
• Each kernel must represent smoothness and non-stationarity.
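A sketch of why the Kronecker structure helps, under simplifying assumptions: the posterior mean of $A$ given $Y = A + \epsilon$ can be computed from the eigendecompositions of the three small kernels, without ever forming the $n \times n$ matrix. The stationary RBF kernel below is only a stand-in (the kernels in the talk must also capture non-stationarity), and `rbf_kernel` and `kron_gp_posterior_mean` are hypothetical names.

```python
import numpy as np

def rbf_kernel(n, length_scale):
    """Stand-in smooth kernel on the grid 0..n-1 (stationary, for illustration)."""
    x = np.arange(n, dtype=float)
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def kron_gp_posterior_mean(Y, K_t, K_e, K_j, rho, phi2):
    """Posterior mean of A for Y = A + noise, with Y of shape (T, E, J).

    Prior covariance of A: rho * kron(K_t, K_e, K_j); noise variance: phi2.
    Only the small kernels are eigendecomposed.
    """
    lam_t, U_t = np.linalg.eigh(K_t)
    lam_e, U_e = np.linalg.eigh(K_e)
    lam_j, U_j = np.linalg.eigh(K_j)

    # Rotate the data into the joint eigenbasis, one mode at a time.
    Z = np.einsum("at,be,cj,abc->tej", U_t, U_e, U_j, Y, optimize=True)

    # Eigenvalues of the Kronecker product are products of eigenvalues.
    lam = rho * lam_t[:, None, None] * lam_e[None, :, None] * lam_j[None, None, :]

    # Shrink each coefficient by lam / (lam + phi2), then rotate back.
    Z = Z * (lam / (lam + phi2))
    return np.einsum("at,be,cj,tej->abc", U_t, U_e, U_j, Z, optimize=True)
```

The cost is dominated by eigendecompositions of size $T$, $E$ and $J$ and by mode-wise products, instead of a single $O(n^3)$ solve with $n = T E J$.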

Algorithm
Goal: obtain $\hat{A}$, $\hat{s}$ from the model $Y = A + s + \epsilon$, $A \sim GP(0, K_{\hat{\theta}})$.
• Produce estimates increasingly in $j$ (strength).
• Rationale: at the lowest strengths $A$ is better behaved and easier to estimate.
• The initial guess $\hat{A}^0_{j+1}$ is the extrapolation from $\hat{A}_{[1,j]}$.
• Given $j$, alternate between maximizing $p(s_j \mid Y_j, \hat{A}_j, \hat{\theta})$ for $\hat{s}_j$ and maximizing $p(A_j \mid Y_j, \hat{s}_j, \hat{\theta})$ for $\hat{A}_j$.
• $\hat{s}_{j,i}$ given $\hat{A}_j$: $s^n_{j,i} = T^n b^n_{j,i}$, where the $b^n_{j,i}$ are binary vectors; do greedy template matching,
$$\min_{b^n_{j,i}} \Big\| (Y_{j,i} - \hat{A}_j) - \sum_n T^n b^n_{j,i} \Big\|^2.$$
• $\hat{A}_j$ given $\hat{s}_j$ via filtering (posterior mean) of spike-subtracted traces.
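The template-matching step can be sketched as follows for a single electrode and a single trial. This is a simplified illustration, not the authors' implementation: the function name `greedy_template_matching`, the `max_spikes` cap and the stopping rule (stop when no placement reduces the squared error) are assumptions. Subtracting a shifted template is the same as multiplying the Toeplitz matrix $T^n$ by a binary vector with a single one.

```python
import numpy as np

def greedy_template_matching(residual, templates, max_spikes=10):
    """Greedily explain `residual` (= Y - A_hat) with shifted templates.

    Returns one binary vector b[n] per template (True where a spike of
    neuron n starts) and the remaining unexplained residual.
    """
    r = np.asarray(residual, dtype=float).copy()
    templates = [np.asarray(tpl, dtype=float) for tpl in templates]
    b = [np.zeros(len(r) - len(tpl) + 1, dtype=bool) for tpl in templates]

    for _ in range(max_spikes):
        best_gain, best = 0.0, None
        for n, tpl in enumerate(templates):
            # Drop in squared error from adding the template at each shift:
            # ||r||^2 - ||r - c||^2 = 2 <r, c> - ||c||^2.
            gains = 2.0 * np.correlate(r, tpl, mode="valid") - tpl @ tpl
            gains[b[n]] = -np.inf          # keep b binary: no repeated picks
            k = int(np.argmax(gains))
            if gains[k] > best_gain:
                best_gain, best = gains[k], (n, k)
        if best is None:                   # nothing reduces the error further
            break
        n, k = best
        b[n][k] = True
        r[k:k + len(templates[n])] -= templates[n]
    return b, r
```

In the full algorithm this step would alternate with re-estimating $\hat{A}_j$ from the spike-subtracted traces; only the spike step is sketched here.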

Example of sorting

Large-scale automatic analysis
Gray dots indicate human judgement.

Population results
1,713,233 trials.
• Accuracy greater than 99.5%, with agreement in latencies as well (Mena et al., PLOS Computational Biology, 2017).
• Past: weeks → now: ≈ 15 minutes. Compatible with online control experiments.
• Enhanced capabilities of the technology.

Probabilistic neural identity inference in C. elegans

The relevance of C. elegans

A data processing pipeline
• Raw data: 5D point processes (space × time × color).
• First step: finding neurons.
• Second step: identifying neurons.

Find neurons with the help of color
• Brainbow (Lichtman and Sanes, 2008): stochastic coloring of neurons.
[Figure credit: Tamily Weissman, 2008 Photomicrography competition]
