Daizong Ding 1 Mi Zhang 1 Xudong Pan 1 Min - PowerPoint PPT Presentation

2019
Daizong Ding¹, Mi Zhang¹, Xudong Pan¹, Min Yang¹, Xiangnan He²
1. School of Computer Science, Fudan University
2. School of Data Science, University of Science and Technology of China

Sections: Background, Problem Analysis, Proposed Model, Extreme Value Loss, Experiments, Conclusion

Time Series Prediction

Training (length $T$)
• Inputs: $X_{1:T} = [x_1, \cdots, x_T]$
• Labels: $Y_{1:T} = [y_1, \cdots, y_T]$
• Outputs: $O_{1:T} = [o_1, \cdots, o_T]$
• Goal: $\min \sum_{t=1}^{T} \|o_t - y_t\|^2$

Testing (horizon $K$)
• Inputs: $X_{1:T+K} = [x_1, \cdots, x_T, x_{T+1}, \cdots, x_{T+K}]$
• Outputs: $O_{1:T+K} = [o_1, \cdots, o_T, o_{T+1}, \cdots, o_{T+K}]$
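The training goal above can be checked numerically with a minimal sketch. The toy series and the function name `squared_error` are hypothetical and only illustrate how a single unusually large label dominates the sum of squared errors.

```python
def squared_error(outputs, labels):
    """Sum over t of (o_t - y_t)^2 for scalar outputs and labels."""
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    return sum((o - y) ** 2 for o, y in zip(outputs, labels))


# Hypothetical toy data with T = 4; the last label is unusually large.
labels = [1.0, 2.0, 3.0, 10.0]   # y_1, ..., y_T
outputs = [1.0, 2.5, 2.5, 4.0]   # o_1, ..., o_T from some model
loss = squared_error(outputs, labels)
print(loss)
```

In this toy case the single large label contributes 36 of the total, which is why the fit on rare extreme points matters so much for this objective.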

Recurrent Neural Network

Training: for $t = 1, \cdots, T$:
$h_t = \mathrm{GRU}([x_1, \cdots, x_t])$
$o_t = W_o^T h_t + b_o$
$\min \sum_{t=1}^{T} \|o_t - y_t\|^2$

Testing: for $t = 1, \cdots, T+K$:
$h_t = \mathrm{GRU}([x_1, \cdots, x_t])$
$o_t = W_o^T h_t + b_o$

[Figure: a chain of GRU cells reads $x_1, \cdots, x_{T+K}$; each hidden state $h_t$ passes through a fully connected (FC) layer to give $o_t$; labels $y_1, \cdots, y_T$ cover the training segment. Panels: Train, Test, Results.]
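The recurrence on this slide can be sketched with a scalar GRU followed by a linear output layer. This is a minimal illustration, not the authors' implementation: the hidden size of 1, the parameter dictionary `p`, the gate convention (new state is `(1 - z) * h + z * h_cand`) and all numeric values are assumptions.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h, p):
    """One GRU update for a scalar input x and a scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])   # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand


def predict(xs, p, w_o, b_o):
    """Return o_t = w_o * h_t + b_o for every step, with h_0 = 0."""
    h = 0.0
    outputs = []
    for x in xs:
        h = gru_step(x, h, p)
        outputs.append(w_o * h + b_o)
    return outputs


# Hypothetical parameters and inputs, chosen only for illustration.
params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.3, "ur": 0.2, "br": 0.0,
          "wh": 0.8, "uh": 0.4, "bh": 0.0}
xs = [0.1, 0.4, -0.2, 3.0, 0.0]
outputs = predict(xs, params, w_o=2.0, b_o=0.5)
print(outputs)
```

Because each $h_t$ depends only on $x_1, \cdots, x_t$, the same loop serves both the training range and the longer testing range.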

Underfitting Phenomenon

Overfitting Phenomenon

Extreme Events in Time Series Data

Characteristics
• Extremely small or large values
• Irregular
• Rare occurrences
• Light-tailed distributions (Gaussian, Poisson, etc.) cannot model them well

Problems
• Why do deep neural networks suffer from the extreme event problem in time series prediction?
• How can we improve prediction performance on extreme events?
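The remark about light-tailed distributions can be made concrete by comparing tail probabilities. The sketch below is illustrative only: the Pareto distribution and its parameters `x_min` and `alpha` are assumptions chosen as a simple heavy-tailed example, not a distribution named on the slide.

```python
import math


def gaussian_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def pareto_tail(x, x_min=1.0, alpha=3.0):
    """P(X > x) for a Pareto distribution with scale x_min and shape alpha."""
    return 1.0 if x <= x_min else (x_min / x) ** alpha


for x in (2.0, 5.0, 10.0):
    print(x, gaussian_tail(x), pareto_tail(x))
```

The Gaussian tail decays faster than any polynomial, so a value ten standard deviations out is treated as practically impossible, while the polynomial tail still assigns it noticeable probability.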

Estimated Distribution of Labels $y_t$

• The optimization of a deep neural network from a probabilistic perspective:

$$\underbrace{\min \sum_{t=1}^{T} \|o_t - y_t\|^2}_{\text{Bregman divergence}} \iff \max \prod_{t=1}^{T} \mathcal{N}(y_t \mid o_t, \hat{\tau}^2) \iff \max \prod_{t=1}^{T} P(y_t \mid x_t, \theta)$$

• With Bayes' theorem,

$$\underbrace{P(Y \mid X, \theta)}_{\text{posterior}} = \frac{\overbrace{P(X \mid Y, \theta)}^{\text{likelihood}}\, P(Y)}{P(X \mid \theta)}$$

where the estimated distribution of labels is

$$\hat{P}(Y) = \frac{1}{T} \sum_{t=1}^{T} \mathcal{N}(y_t, \hat{\tau}^2)$$

• The DNN internally estimates the distribution of $y_t$ from the sampled data.
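The estimated label distribution on this slide is a Gaussian kernel density over the observed labels, which a few lines can evaluate. The label values and the bandwidth `tau` below are hypothetical; the sketch only shows that a label seen once receives far less estimated mass than labels seen often.

```python
import math


def gaussian_pdf(y, mean, tau):
    """Density of N(mean, tau^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))


def estimated_label_density(y, labels, tau):
    """P_hat(y) = (1/T) * sum over t of N(y; y_t, tau^2)."""
    return sum(gaussian_pdf(y, y_t, tau) for y_t in labels) / len(labels)


# Hypothetical labels: nine ordinary values near 0 and one extreme value at 6.
labels = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3, 0.1, 6.0]
tau = 0.5
normal_density = estimated_label_density(0.0, labels, tau)
extreme_density = estimated_label_density(6.0, labels, tau)
print(normal_density, extreme_density)
```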

Extreme Event Problem in DNN: Underfitting Phenomenon

• For normal points, e.g., $y_1$:

$$P(y_1 \mid X, \theta) = \frac{P(X \mid y_1, \theta)\, \hat{P}(y_1)}{P(X, \theta)} \ge \frac{P(X \mid y_1, \theta)\, P_{\text{true}}(y_1)}{P(X, \theta)} = P_{\text{true}}(y_1 \mid X, \theta)$$

• For rarely occurring extreme events, e.g., $y_2$:

$$P(y_2 \mid X, \theta) = \frac{P(X \mid y_2, \theta)\, \hat{P}(y_2)}{P(X, \theta)} \le \frac{P(X \mid y_2, \theta)\, P_{\text{true}}(y_2)}{P(X, \theta)} = P_{\text{true}}(y_2 \mid X, \theta)$$

• Therefore the model commonly lacks the ability to predict extreme events.

[Figure: label distribution with points $y_1$, $y_2$, $y_3$ marked.]

Extreme Event Problem in DNN: Overfitting Phenomenon

• Suppose we increase the weights of extreme events during training.

• For normal points, e.g., $y_1$:

$$P(y_1 \mid X, \theta) = \frac{P(X \mid y_1, \theta)\, \hat{P}(y_1)}{P(X, \theta)} \le \frac{P(X \mid y_1, \theta)\, P_{\text{true}}(y_1)}{P(X, \theta)} = P_{\text{true}}(y_1 \mid X, \theta)$$

• For rarely occurring extreme events, e.g., $y_3$:

$$P(y_3 \mid X, \theta) = \frac{P(X \mid y_3, \theta)\, \hat{P}(y_3)}{P(X, \theta)} \ge \frac{P(X \mid y_3, \theta)\, P_{\text{true}}(y_3)}{P(X, \theta)} = P_{\text{true}}(y_3 \mid X, \theta)$$

• The estimated distribution is not accurate.
• The performance on test data is poor.

[Figure: label distribution with points $y_1$, $y_2$, $y_3$ marked.]
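Adding weights to extreme events, as discussed on this slide, amounts to a weighted squared error. The sketch below is a generic illustration; the function name, the boolean `is_extreme` flags and the weight of 5 are assumptions, since the slide does not give a specific weighting scheme.

```python
def weighted_squared_error(outputs, labels, is_extreme, extreme_weight):
    """Squared error in which time steps flagged as extreme count extreme_weight times."""
    total = 0.0
    for o, y, flagged in zip(outputs, labels, is_extreme):
        weight = extreme_weight if flagged else 1.0
        total += weight * (o - y) ** 2
    return total


# Hypothetical toy data: only the last time step is flagged as extreme.
labels = [1.0, 2.0, 3.0, 10.0]
outputs = [1.0, 2.5, 2.5, 4.0]
flags = [False, False, False, True]
plain = weighted_squared_error(outputs, labels, flags, extreme_weight=1.0)
weighted = weighted_squared_error(outputs, labels, flags, extreme_weight=5.0)
print(plain, weighted)
```

With a large weight the optimizer is pushed to fit the few flagged points at the expense of the many normal ones, which is the overfitting behaviour the slide describes.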

Problem Analysis

The extreme event problem in DNNs arises mainly because:
• Extreme events are extremely large or small values that occur rarely, so it is hard to estimate their true distribution from limited samples.
• A DNN usually learns time series data under a light-tailed likelihood, which further increases the difficulty of estimating the distribution of extreme events.

[Figure: label distribution with points $y_1$, $y_2$, $y_3$ marked.]

Motivation: Find the regularity inside irregular extreme events

According to previous research:
• Extreme events in time-series data often show some form of temporal regularity.
• The randomness of extreme events has limited degrees of freedom (DOF).

The pattern of extreme events after a window could be memorized!

[Figure: S&P 500 series.]

Recalling Extreme Events in History

We propose to use a memory network to recall extreme events in history:
• For each time step $t$, we sample $M$ windows.
• For window $j$, we use a GRU to calculate the feature $s_j$ of the window.
• Meanwhile, we record the occurrence of an extreme event, $q_j \in \{-1, 0, 1\}$, at the time step following window $j$, using previously set thresholds.

[Figure: Memory Module.]
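The window sampling and labelling steps above can be sketched as follows. This is an illustration under assumptions: the thresholds `low` and `high`, uniform sampling of window positions, and the helper names are hypothetical, and `feature_fn` is a stand-in for the GRU encoder that the slide uses to compute $s_j$.

```python
import random


def label_extreme(value, low, high):
    """Return 1 above the high threshold, -1 below the low threshold, else 0."""
    if value > high:
        return 1
    if value < low:
        return -1
    return 0


def build_memory(series, t, window_size, num_windows, low, high, feature_fn, rng):
    """Sample windows that end before step t; return a list of (s_j, q_j) pairs."""
    max_start = t - window_size - 1   # the step after the window must precede t
    if max_start < 0:
        raise ValueError("not enough history before t for one window")
    memory = []
    for _ in range(num_windows):
        start = rng.randint(0, max_start)            # inclusive on both ends
        window = series[start:start + window_size]
        next_value = series[start + window_size]     # step right after the window
        memory.append((feature_fn(window), label_extreme(next_value, low, high)))
    return memory


# Hypothetical usage: the window mean stands in for a GRU feature.
rng = random.Random(0)
series = [0.1, 0.3, -0.2, 4.0, 0.0, -3.5, 0.2, 0.1, 5.0, 0.3, 0.0, 0.2]
memory = build_memory(series, t=len(series), window_size=3, num_windows=4,
                      low=-2.0, high=2.0,
                      feature_fn=lambda w: sum(w) / len(w), rng=rng)
print(memory)
```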

Attention Mechanism

We propose to use attention to incorporate the memory module into the prediction:
• At time $t$, we first calculate the output from the GRU.
• Then we construct the memory module and calculate the similarity between the current state and the history.
• The final output of our model combines the GRU output with the attended memory.
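The equations for this slide are not preserved in the transcript, so the following is only a sketch of one common way to realize the three steps. Dot-product similarity, softmax normalization, and an additive correction with a scalar coefficient `b` are all assumptions, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(h_t, memory):
    """memory is a list of (s_j, q_j); return attention weights and the attended indicator."""
    scores = [dot(h_t, s_j) for s_j, _ in memory]
    alphas = softmax(scores)
    u_t = sum(alpha * q_j for alpha, (_, q_j) in zip(alphas, memory))
    return alphas, u_t


def final_output(o_gru, u_t, b):
    """Combine the GRU output with the attended indicator (assumed additive form)."""
    return o_gru + b * u_t


# Hypothetical usage: the current state closely matches the first memory entry.
h_t = [1.0, 0.0]
memory = [([10.0, 0.0], 1), ([-10.0, 0.0], 0)]
alphas, u_t = attend(h_t, memory)
print(alphas, u_t, final_output(0.2, u_t, b=3.0))
```

When the current state resembles a stored window that was followed by an extreme event, the attended indicator moves toward that event's label and shifts the prediction accordingly.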

Extreme Value Theory

If we still use a Gaussian likelihood, the improved model still suffers from the extreme event problem:
• We should use a heavy-tailed likelihood to fit the distribution of extreme events given limited samples.
• It is hard to predict the values of extreme events; however, the DOF of extreme events are easier to model.
• We could propose a heavy-tailed likelihood for predicting the occurrence of extreme events.

Extreme Value Loss

• Through Extreme Value Theory (EVT), the distribution of $y_t$ can be approximated in a form that involves a scale function.
• $v_t \in \{0, 1\}$ is the indicator of whether a large value will happen or not.
• If we focus on predicting whether there is an extremely large value at $t$ by outputting $u_t \in [0, 1]$, we can add the weights of extreme events to the binary cross entropy loss.
• It is easy to extend the binary classification to $u_t, v_t \in \{-1, 0, 1\}$.
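The slide's formula is not preserved in the transcript, so the sketch below assumes one weighted binary cross entropy of the kind described: each term is scaled by a class proportion and by a heavy-tail style factor $[1 - u/\gamma]^{\gamma}$. The exact form, the names `beta0`, `beta1` and `gamma`, and the restriction `gamma >= 1` are assumptions made for this illustration.

```python
import math


def evl(u, v, beta0, beta1, gamma, eps=1e-12):
    """Weighted binary cross entropy for one time step (assumed form).

    u: predicted probability of an extreme event, v: true indicator in {0, 1},
    beta0: proportion of normal events, beta1: proportion of extreme events,
    gamma: shape hyper-parameter, required here to be at least 1.
    """
    if gamma < 1.0:
        raise ValueError("this sketch requires gamma >= 1")
    u = min(max(u, eps), 1.0 - eps)   # keep both logarithms finite
    pos_weight = beta0 * (1.0 - u / gamma) ** gamma
    neg_weight = beta1 * (1.0 - (1.0 - u) / gamma) ** gamma
    return -pos_weight * v * math.log(u) - neg_weight * (1 - v) * math.log(1.0 - u)


# Hypothetical setting: 95% normal events, 5% extreme events.
missed_extreme = evl(0.1, 1, beta0=0.95, beta1=0.05, gamma=2.0)
false_alarm = evl(0.9, 0, beta0=0.95, beta1=0.05, gamma=2.0)
print(missed_extreme, false_alarm)
```

Under these assumed weights, missing a rare extreme event costs far more than an equally confident false alarm, which is the behaviour the slide asks of the loss.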

Optimization

The final loss function combines the square loss on the labels $y_t$ with the EVL on the indicators $v_t \in \{-1, 0, 1\}$.

For the two challenges in DNNs:
• We predict the labels from both the GRU and the memory module, which memorizes the regularity inside extreme events given limited samples.
• We propose to minimize a heavy-tailed classification loss (EVL) for detecting the occurrence of extreme events.

Experimental Settings

• Datasets:
  • Stock dataset: 564 corporations in the Nasdaq Stock Market, with one sample per week
  • Climate dataset: Greenhouse Gas Observing Network dataset and Atmospheric CO2 dataset
  • Pseudo Periodic Synthetic dataset
• Baselines:
  • LSTM
  • GRU
  • Time-LSTM
• Research questions:
  • RQ1: Is our proposed framework effective in time series prediction?
  • RQ2: Does our proposed loss function, EVL, work for detecting extreme events?
  • RQ3: What is the influence of hyper-parameters in the framework?

Time Series Prediction (RMSE)
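The results table for this slide is not preserved, but the metric itself is standard. A minimal sketch of root mean squared error for scalar predictions, with a hypothetical function name and toy values:

```python
import math


def rmse(outputs, labels):
    """Root mean squared error between predictions and labels."""
    n = len(labels)
    if n == 0 or len(outputs) != n:
        raise ValueError("outputs and labels must be non-empty and equally long")
    return math.sqrt(sum((o - y) ** 2 for o, y in zip(outputs, labels)) / n)


# Hypothetical values: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([3.0, 0.0], [0.0, 4.0]))
```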

Time Series Prediction (Visualization)

Extreme Events Prediction (F1 Value)
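The F1 value for extreme event detection can be computed as below. This sketch treats one class as positive; how the three indicator classes were aggregated in the reported results is not stated in the transcript, so the single-class treatment and the function name are assumptions.

```python
def f1_score(predicted, actual, positive=1):
    """F1 for one positive class, given predicted and true indicator sequences."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    if tp == 0:
        return 0.0   # precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical indicators: one hit, one false alarm, one miss.
predicted = [1, 0, 1, 0, 0]
actual = [1, 1, 0, 0, 0]
print(f1_score(predicted, actual))
```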

Influence of hyper-parameters
