SLIDE 1

Big Data Algorithms with Medical Applications

Yixin Chen

SLIDE 2

Outline

  • Challenges to big data algorithms
  • Clinical Big Data
  • Our new algorithms

SLIDE 3

Small data vs. Big data

SLIDE 4

Small data vs. Big data

General laws VS specific laws

SLIDE 5

Small data vs. Big data

  • Small data: causality, domain knowledge
  • Big data: association, data knowledge

SLIDE 6

Small data vs. Big data: models

[Figure: model quality vs. data size for small-data and big-data models]

SLIDE 7

Modeling techniques

Parametric VS Non-parametric

  • Parametric: efficiency, interpretability
  • Non-parametric: accuracy

SLIDE 8

Efficiency of big data models

High efficiency

  • Parallelization (constant speedup)
  • Algorithmic improvements (e.g., O(N^3) vs. O(N^2))

Example: large-scale manifold learning via Maximum Variance Correction (Chen et al., ICML'13)

SLIDE 9

Outline

  • Challenges to big data algorithms
  • Clinical Big Data
  • Our new algorithms

SLIDE 10

The need for clinical prediction

  • The direct ICU cost per day for survivors is six to seven times that of non-ICU care.
  • Unlike ICU patients, general hospital ward (GHW) patients are not under extensive electronic monitoring and nurse care.
  • Clinical studies have found that 4–17% of patients undergo cardiopulmonary or respiratory arrest while in the GHW of a hospital.

SLIDE 11

Goal: Let Data Speak!

Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life-threatening.

Goal: provide early detection and intervention based on data mining to prevent these serious, often life-threatening events.

  • Using both clinical data and wireless body-sensor data
  • An NSF/NIH-funded clinical trial at Washington University/Barnes-Jewish Hospital

SLIDE 12

Clinical data: high-dimensional, real-time time-series data

34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, blood pressure, …

[Figure: example vital-sign time series; x-axis: time (seconds)]

SLIDE 13

Previous Work

Main problem: most previous general work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.

Medical data mining:

  • Based on medical knowledge: SCAP and PSI; Acute Physiology Score, Chronic Health Score, and APACHE score (used to predict renal failure); Modified Early Warning Score (MEWS)
  • Based on machine learning methods: decision trees, neural networks, SVM

SLIDE 14

Machine learning task

[Figure: vital-sign time series over time for non-ICU vs. ICU patients]

Challenges:

  • Classification of high-dimensional time-series data
  • Irregular data gaps
  • Measurement errors
  • Class imbalance
SLIDE 15

Solution based on existing techniques

  • Temporal feature extraction
  • Bootstrap aggregating (bagging)
  • Exploratory under-sampling
  • Feature selection
  • Exponential moving average smoothing (a sketch follows below)
  • Basic classifier

(Mao et al., KDD'12)
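As a hedged illustration of the smoothing step, here is a minimal Python sketch of exponential moving average smoothing; the function name and the α value are illustrative choices, not the pipeline's actual settings.

```python
import numpy as np

def ema_smooth(series, alpha=0.3):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = np.empty_like(series, dtype=float)
    smoothed[0] = series[0]
    for t in range(1, len(series)):
        smoothed[t] = alpha * series[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Example: damp a spurious spike in a pulse-rate series before feature extraction.
pulse = np.array([80, 82, 150, 81, 79, 83, 80], dtype=float)  # 150 is a sensor spike
print(ema_smooth(pulse))
```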


SLIDE 17
Desired Classifier Properties

  • Nonlinear classification ability
  • Interpretability
  • Support for mixed data types
  • Efficiency
  • Multi-class classification

Linear SVM and logistic regression: interpretable and efficient, but linear.
SVM with RBF kernels: nonlinear, but not interpretable and inefficient.

SLIDE 18

Desired Classifier Properties

Property                             kNN  NB  NN  LR  Linear SVM  Kernel SVM
Nonlinear classification ability      Y   N   Y   N       N           Y
Interpretability                      N   Y   N   Y       Y           N
Direct support for mixed data types   Y   Y   N   N       N           N
Efficiency                            Y   Y   Y   Y       Y           N
Multi-class classification            Y   Y   Y   Y       N           N

SLIDE 19

Random kitchen sinks (RKS)

Random nonlinear feature transformation + parametric, linear classifier

1. Transform each input x into exp(−i w_k·x), k = 1, …, K, with w_k drawn from a Gaussian distribution p(w).
2. Learn a linear model ∑_k α_k exp(−i w_k·x).

Theory: based on the Fourier transform, RKS converges to the RBF-SVM for large K.
Efficient, but not interpretable.
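A minimal sketch of RKS, assuming the standard real-valued random Fourier features cos(w_k·x + b_k) in place of the complex exponential above; the toy dataset, K, and the bandwidth σ are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
K, sigma = 500, 1.0                    # number of random features, RBF bandwidth

def rks_features(X, W, b):
    """Map X to sqrt(2/K) * cos(X @ W + b), the real-valued form of exp(-i w.x)."""
    return np.sqrt(2.0 / K) * np.cos(X @ W + b)

# Toy data: a nonlinear concept (points outside a sphere are positive).
X = rng.normal(size=(200, 5))
y = (np.sum(X**2, axis=1) > 5).astype(int)

W = rng.normal(scale=1.0 / sigma, size=(5, K))   # w_k ~ Gaussian p(w)
b = rng.uniform(0, 2 * np.pi, size=K)

# Linear model over the random nonlinear features.
clf = LogisticRegression(max_iter=1000).fit(rks_features(X, W, b), y)
print(clf.score(rks_features(X, W, b), y))
```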

SLIDE 20

Outline

  • Challenges to big data algorithms
  • Clinical Big Data
  • Our new algorithms

SLIDE 21

Key Idea: Hybrid Model

Non-parametric, nonlinear feature transformation + parametric, linear classifier

→ Efficiency + interpretability + nonlinearity

SLIDE 22

Desired Classifier Properties

Property                             kNN  NB  NN  LR  Linear SVM  Kernel SVM  DLR
Nonlinear classification ability      Y   N   Y   N       N           Y        Y
Interpretability                      N   Y   N   Y       Y           N        Y
Direct support for mixed data types   Y   Y   N   N       N           N        Y
Efficiency                            Y   Y   Y   Y       Y           N        Y
Multi-class classification            Y   Y   Y   Y       N           N        Y

DLR: Density-based Logistic Regression (Chen et al., KDD'13)

SLIDE 23

Logistic Regression

Each instance has D features: x = (x_1, …, x_D).

Training dataset: {(x_n, y_n)}, n = 1, …, N, with labels y_n ∈ {0, 1}.

Assume: τ(x) = P(y = 1 | x) = σ(∑_d w_d φ_d(x) + b), where σ(t) = 1/(1 + e^(−t)).

Optimization: maximize the overall log-likelihood ∑_n [ y_n ln τ(x_n) + (1 − y_n) ln(1 − τ(x_n)) ].
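For concreteness, a minimal NumPy sketch of this maximum-likelihood fit by gradient ascent, assuming the plain-LR case φ(x) = x; the step size and iteration count are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr(X, y, lr=0.1, iters=2000):
    """Maximize sum_n [y_n ln tau(x_n) + (1-y_n) ln(1-tau(x_n))] by gradient ascent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        tau = sigmoid(X @ w + b)         # tau(x) = P(y=1|x)
        grad_w = X.T @ (y - tau) / n     # gradient of the average log-likelihood
        grad_b = np.mean(y - tau)
        w += lr * grad_w
        b += lr * grad_b
    return w, b
```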

SLIDE 24

Problem with linear models

If we set τ(x) = σ(∑_d w_d φ_d(x) + b), what should φ_d(x) be?

SLIDE 25

Insights on τ(x)

LR assumes τ(x) = σ(wᵀx + b) (logistic regression), so ln [τ(x)/(1 − τ(x))] = wᵀx + b.
On the other hand, τ(x) = P(y = 1 | x), so ln [τ(x)/(1 − τ(x))] = ln [P(y = 1 | x)/P(y = 0 | x)].
Hence: LR assumes the log-odds ln [P(y = 1 | x)/P(y = 0 | x)] is linear in x.

SLIDE 26

Factorization in DLR

Assumption: P(y | x) factorizes over the individual features, P(y | x) ∝ P(y) ∏_d [ P(y | x_d) / P(y) ].

SLIDE 27

DLR Feature Transformation

Under the factorization assumption, τ(x) = σ(∑_d w_d φ_d(x) + b), where φ_d(x) is an increasing function of P(y = 1 | x_d).
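The exact transform on the slide is not recoverable from this transcript; one natural candidate consistent with the "increasing function of P(y = 1 | x_d)" remark is the per-feature log-odds, sketched here with hypothetical names.

```python
import numpy as np

def phi_d(p1, eps=1e-9):
    """Candidate per-feature transform: log-odds of P(y=1|x_d), an increasing
    function of p1 = estimated P(y=1|x_d). Illustrative, not the slide's exact form."""
    p1 = np.clip(p1, eps, 1 - eps)
    return np.log(p1 / (1 - p1))
```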

SLIDE 28

Conditional Probability Estimation

  • Numerical x_d: kernel density estimation
  • Categorical x_d: smoothed histogram

SLIDE 29

Kernel density estimation

Given the training dataset {(x_n, y_n)}, estimate

P(y = 1 | x_d) ≈ ∑_{n: y_n = 1} K_h(x_d − x_{n,d}) / ∑_n K_h(x_d − x_{n,d}),

where K_h is a kernel (e.g., Gaussian) with kernel bandwidth h.
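A minimal sketch of this estimate (a Nadaraya–Watson-style ratio of Gaussian kernel sums); the data and bandwidth are toy values.

```python
import numpy as np

def p_y1_given_xd(x, xd_train, y_train, h=1.0):
    """Estimate P(y=1 | x_d = x) with Gaussian kernels of bandwidth h."""
    k = np.exp(-0.5 * ((x - xd_train) / h) ** 2)  # K_h(x - x_nd); constants cancel
    return np.sum(k * (y_train == 1)) / np.sum(k)

xd = np.array([90., 95., 100., 140., 150., 160.])  # e.g., one vital sign
y  = np.array([0,   0,   0,    1,    1,    1])
print(p_y1_given_xd(145.0, xd, y, h=5.0))
```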

SLIDE 30

DLR Learning

Objective: maximize the overall log-likelihood, a function of both the weights w and the kernel bandwidths h.

SLIDE 31

Overview of DLR

1. Initialize h and w.
2. Calculate the new feature vector φ(x).
3. Update w.
4. Update h.
5. If not converged, return to step 2.

SLIDE 32

Optimization

  • Fix h and optimize w (using an LR solver).
  • Fix w and optimize h (steepest gradient descent).
  • Repeat until convergence.

[Figure: decision boundary over the iterations: initial h, iter 1, iter 2, iter 3]
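A toy, self-contained sketch of this alternation, assuming the log-odds features above as the transform; the slide's steepest gradient descent on h is replaced here by a crude hill-climbing step, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def phi(X, Xtr, ytr, h, eps=1e-6):
    """Per-feature log-odds features from KDE estimates of P(y=1|x_d)."""
    F = np.empty_like(X, dtype=float)
    for d in range(X.shape[1]):
        k = np.exp(-0.5 * ((X[:, [d]] - Xtr[:, d]) / h[d]) ** 2)
        p1 = np.clip((k @ (ytr == 1)) / k.sum(axis=1), eps, 1 - eps)
        F[:, d] = np.log(p1 / (1 - p1))
    return F

def loglik(X, y, h, clf):
    p = np.clip(clf.predict_proba(phi(X, X, y, h))[:, 1], 1e-9, 1 - 1e-9)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_dlr(X, y, h0, outer=5):
    h = np.array(h0, dtype=float)
    for _ in range(outer):
        clf = LogisticRegression(max_iter=1000).fit(phi(X, X, y, h), y)  # fix h, fit w
        for d in range(len(h)):               # fix w, adjust each bandwidth h_d
            up = h.copy()
            up[d] *= 1.1
            if loglik(X, y, up, clf) > loglik(X, y, h, clf):
                h = up                        # keep the change if likelihood improves
            else:
                h[d] *= 0.9
    return clf, h
```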

SLIDE 33

Interpretability

DLR: τ(x) = σ(∑_d w_d φ_d(x) + b)

For example, suppose y represents a particular disease and x_d represents the blood pressure (BP) of a patient.

  • On the disease level: ranking the weights w_d can identify the risk factors of this disease.
  • On the patient level: φ_d(x) indicates the abnormality of the patient's BP, and w_d φ_d(x) indicates the extent to which BP contributes to his disease.

SLIDE 34

Kernel

Ideal kernel: k(x, x') = 1 if x and x' share the same label, 0 otherwise.

RBF kernel: k(x, x') = exp(−‖x − x'‖² / (2σ²)), which doesn't consider the label information.

SLIDE 35

DLR Kernel

DLR kernel: k(x, x') = φ(x)ᵀφ(x'); a large value indicates the same label, a small value indicates different labels.

SLIDE 36

DLR on example data

[Figure: decision boundaries on test data: original LR vs. density-based LR]

SLIDE 37

Accuracy on UCI Datasets

[Figure: accuracy on UCI datasets (numerical and categorical); higher is better]

SLIDE 38

Training Time

[Figure: training time on UCI datasets (numerical and categorical); lower is better]

SLIDE 39

Results on clinical data

Accuracy: LR 0.9141, SVM 0.9194, DLR 0.9204

Early alert even when the patient appears normal to the best doctors in the world.

SLIDE 40

DLR for real large data

  • Kernel density estimation: still too slow for big data; testing time grows as the training set gets larger.
  • Histogram estimation: ultra-fast training and testing; no curse of dimensionality for the one-dimensional estimation.

SLIDE 41

DLR with Bins

SLIDE 42

DLR with Bins

Problems: not smooth; not enough data in some bins.

SLIDE 43

Histogram KDE Smoothing

Smooth the per-bin estimates with a kernel, where c_i is the number of label-1 instances in bin i and n_i is the number of instances in bin i.
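One plausible reading, sketched under the assumption of a fixed three-tap smoothing kernel (the slide's exact kernel is not recoverable): smooth the counts c_i and n_i across neighboring bins before taking the ratio.

```python
import numpy as np

def smoothed_bin_estimate(c, n, weights=(0.25, 0.5, 0.25)):
    """Smooth per-bin estimates c_i/n_i by kernel-averaging counts over
    neighboring bins (c_i = label-1 count, n_i = total count in bin i)."""
    kernel = np.array(weights)
    c_s = np.convolve(c, kernel, mode="same")
    n_s = np.convolve(n, kernel, mode="same")
    return c_s / np.maximum(n_s, 1e-12)

c = np.array([0, 1, 5, 9, 10.])    # label-1 counts per bin
n = np.array([10, 10, 10, 10, 10.])  # total counts per bin
print(smoothed_bin_estimate(c, n))
```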

SLIDE 44

Different Number of Bins

[Figure: smoothed estimates with 5, 20, and 100 bins]

SLIDE 45

Results on accuracy

Accuracy (%):

Method      Splice 1K  Mush 8K  w5a 10K  w8a 50K  Adult 30K  kddcup 1.26M
linear SVM     75        100      98.15    98.57    60.03       99.99
LR             77        99.87    97.67    98.24    84.80       99.99
RBF SVM        80        99.23    97.14    97.20    75.29        N/A
DLR-b          88        99.95    98.26    98.55    85.54       99.99

SLIDE 46

Results on efficiency

Training time:

Method      Splice 1K  Mush 8K  w5a 10K  w8a 50K  Adult 30K  kddcup 1.26M
linear SVM    0.12       0.56     1.16     15       2847        81.70
LR            0.15       0.21     0.18     0.7      2.89        55.66
RBF SVM       0.09       1.63     1.60     29       217          N/A
DLR-b         0.22       0.32     2.65     7.6      0.6         17.93

SLIDE 47

Feature Selection Ability

With DLR:

  • ℓ1-regularization, loss(w) + c ∑_d |w_d|, requires non-smooth optimization.
  • However, in DLR we can simply use c ∑_d w_d along with the constraints w_d ≥ 0, which gives smooth optimization (a sketch follows below).
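A hedged SciPy sketch of this smooth formulation; the solver choice, regularization constant, and the exact loss form are illustrative assumptions, not the slide's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def smooth_l1_lr(Phi, y, c=0.1):
    """Minimize logistic loss(w) + c * sum(w_d) subject to w_d >= 0.
    With w >= 0, sum(w_d) equals the l1 norm, so the objective is smooth."""
    d = Phi.shape[1]

    def obj(w):
        margin = (Phi @ w) * (2 * y - 1)          # signed margins for y in {0,1}
        loss = np.mean(np.log1p(np.exp(-margin)))  # logistic loss
        return loss + c * np.sum(w)

    res = minimize(obj, np.zeros(d), bounds=[(0, None)] * d, method="L-BFGS-B")
    return res.x  # sparse, nonnegative weights
```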

SLIDE 48

Top features selected by DLR:

  • Standard deviation of heart rate
  • ApEn of heart rate
  • Energy of oxygen saturation
  • LF of oxygen saturation
  • LF of heart rate
  • DFA of oxygen saturation
  • Mean of heart rate
  • HF of heart rate
  • Inertia of heart rate
  • Homogeneity of heart rate
  • Energy of heart rate
  • Linear correlation of heart rate and oxygen saturation

SLIDE 49
Conclusions on DLR

DLR satisfies all of the following:

  • Nonlinear classification ability
  • Support for mixed data types
  • Interpretability
  • Efficiency
  • Multi-class classification

Try it out!

http://www.cse.wustl.edu/~wenlinchen/project/DLR/

SLIDE 50
Big Data Algorithms

  • Hybrid!
    • Non-parametric + parametric
    • Association + causality
    • Generative + discriminative
  • Balance accuracy and speed
  • For real big data, get rid of heavy machinery
  • Let accuracy grow with data size
  • A linear model suffices with enough nonlinearity/randomness

SLIDE 51

Thank you

SLIDE 52

Challenges of the big data era: talent

McKinsey Global Institute report: big data talent is scarce.

SLIDE 53

Desired Classifier Properties

Property                             kNN  NB  NN  LR  Linear SVM  Kernel SVM  RKS
Nonlinear classification ability      Y   N   Y   N       N           Y        Y
Interpretability                      N   Y   N   Y       Y           N        N
Direct support for mixed data types   Y   Y   N   N       N           N        N
Efficiency                            Y   Y   Y   Y       Y           N        Y
Multi-class classification            Y   Y   Y   Y       N           N        N

RKS (Random Kitchen Sinks): linear model over nonlinear features

RBF SVM: k(x, x') = exp(−‖x − x'‖² / (2σ²))

SLIDE 54

Gaussian Naive Bayes

Assumption (conditional independence): P(x | y) = ∏_d P(x_d | y).

Gaussian: P(x_d | y = k) = N(x_d; μ_{dk}, σ²_{dk}).
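A minimal sketch of these GNB estimates (per-class, per-feature Gaussian parameters); the function names and the small variance floor are illustrative.

```python
import numpy as np

def fit_gnb(X, y):
    """Per-class, per-feature Gaussian parameters under P(x|y) = prod_d P(x_d|y)."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # (means mu_dk, variances sigma^2_dk with a small floor, class prior P(y=k))
        params[k] = (Xk.mean(axis=0), Xk.var(axis=0) + 1e-9, len(Xk) / len(X))
    return params

def log_posterior_scores(x, params):
    """log P(y=k) + sum_d log N(x_d; mu_dk, var_dk), up to a shared constant."""
    scores = {}
    for k, (mu, var, prior) in params.items():
        scores[k] = np.log(prior) - 0.5 * np.sum(
            np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return scores
```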

SLIDE 55

LR and GNB

Both GNB and LR express P(y | x) as a linear model in the log-odds.

  • GNB learns its parameters under the GNB assumption.
  • LR learns its parameters by maximum likelihood of the data.

SLIDE 56

Motivation

  • NB assumption: P(x | y) = ∏_d P(x_d | y)
  • LR assumption: ln [P(y = 1 | x)/P(y = 0 | x)] is linear in x

SLIDE 57

Motivation

GNB assumption: P(x | y) = ∏_d P(x_d | y).

  • Naïve Bayes: factorizing P(x | y) over the individual features
  • DLR: factorizing P(y | x) over the individual features