

SLIDE 1

Urban Computing

  • Dr. Mitra Baratchi

Leiden Institute of Advanced Computer Science - Leiden University

21 February, 2020

SLIDE 2

Second Session: Urban Computing - Processing Time-series Data

SLIDE 3

Agenda for this session

◮ Part 1: Preliminaries on time-series data
  ◮ What does time-series data look like?
  ◮ How do we represent time-series data to algorithms?
◮ Part 2: Techniques for processing time-series data
  ◮ Forecasting
  ◮ Classification
◮ Part 3: Assignment
  ◮ Put some of the techniques learned today into practice
  ◮ Apply them to the Geo-life data

SLIDE 4

Part 1: Preliminaries on time-series data

SLIDE 5

Why do we care about time-series data?

◮ Time-series data are ubiquitous...
◮ What types of data do we have in the form of time-series for Urban Computing research?
  ◮ Temperature
  ◮ Humidity
  ◮ Number of people or cars passing a road
  ◮ Price of houses
  ◮ Sensor measurements

SLIDE 6

◮ What can you do with this data?
◮ How do you achieve that using an available machine learning algorithm?
◮ How do we represent time-series data to available algorithms?

SLIDE 7

Peculiarities of time-series

Why is the analysis of time-series data challenging? What qualities should algorithms for analyzing time-series data have?

SLIDE 8

Dimensionality?

Figure: Temperature in Leiden during the month of February so far 1 (x-axis: dates, 2019-02-04 to 2019-02-11; y-axis: Temperature (°C))

How many dimensions does the data have? Dimension is the number of attributes required to describe every instance of data. Here, the length over time defines the dimensionality → many (even infinitely many) dimensions. How would you use this data to predict the temperature of the following days?

1data source: https://www.meteoblue.com

SLIDE 9

Peculiarities of time-series data

◮ High-dimensionality: We hope to reduce dimensionality by finding a model Tempt = f (Temp(0...t−1))

SLIDE 10

Non-stationarity

◮ Non-stationarity: Data points have means, variances and covariances that change over time

Figure: A non-stationary process 2

2image source:http://berkeleyearth.org/2019-temperatures/

SLIDE 11

Peculiarities of time-series

◮ High-dimensionality: One instance has a lot of attributes, Tempt = f (Temp(0...t−1))
◮ Non-stationarity: Data points have means, variances and covariances that change over time (related to concept drift)
◮ Single versus multi-variate time-series: Multiple sensors at the same time, multiple high-dimensional series
◮ Distortions in time-series data: Missing values, noise, etc.
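As a quick illustration of the non-stationarity point (a minimal numpy sketch of my own, not part of the lecture material): if the rolling mean or variance drifts over time, the series is unlikely to be stationary.

```python
import numpy as np

def rolling_stats(x, window):
    """Rolling mean and variance; a drifting mean suggests non-stationarity."""
    x = np.asarray(x, dtype=float)
    means = np.array([x[i:i + window].mean() for i in range(len(x) - window + 1)])
    variances = np.array([x[i:i + window].var() for i in range(len(x) - window + 1)])
    return means, variances

# A series with a linear trend: the rolling mean drifts upward over time.
t = np.arange(200)
x = 0.05 * t + np.random.default_rng(0).normal(size=200)
means, variances = rolling_stats(x, window=50)
print(means[0] < means[-1])  # → True: later windows have a higher mean
```

For a stationary series (e.g. pure white noise) the rolling mean would instead hover around a constant value.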

SLIDE 12

Who has so far developed methods and algorithms for working with such data?

◮ Signal processing experts
◮ Statisticians

SLIDE 13

What can we do with such data?

◮ Predict values? (better said: forecast)
◮ Classify
◮ Find patterns, clusters, outliers
◮ Query

There are already algorithms designed for these tasks when dealing with non-time-series data. The problem is finding a way to represent time-series data to these algorithms.

SLIDE 14

Two approaches to deal with or represent time-series data

How do we represent time-series data in order to process it?

◮ Approach 1: Take it as it is.
  ◮ Represent it in the time domain.
  ◮ Main issue: time-series data is high-dimensional → very difficult to work with
◮ Approach 2: Represent it in a format that is more understandable or easier to work with. Representation techniques are designed to reduce the dimensionality of data as much as possible.
  ◮ Frequency domain
  ◮ Time-frequency domain
  ◮ ...

SLIDE 15

Approach 2-example 1

Fourier transform

◮ What is the Fourier transform?
◮ What does it do?
◮ Why is it useful (in math, in engineering, etc.)?
◮ How can it be useful in Urban Computing?

SLIDE 16

What is the Fourier transform?

The basic elements: Fourier theory shows that all signals (periodic and non-periodic) can be decomposed into a linear combination of sine waves, each defined by its amplitude (A), period (2π/ω), and phase (φ):

A sin(ωt + φ)

Figure: A sine wave, the basic element of the Fourier transform

SLIDE 17

Fourier transform in one image

Figure: View of a signal in time and frequency domain3

3source: http://www.nti-audio.com/portals/0/pic/news/FFT-Time-Frequency-View-540.png

SLIDE 18

Why is it useful?

The main intuition: If the frequency domain view is sparse, we can leverage the sparsity in different ways. (e.g. create new features for classification, compress the signal, ...)

Figure: Different views of a signal and levels of sparsity. 4

Question we should seek to answer before using a frequency domain transformation: Does a transformation give us a sparser, thus, more understandable representation?

4Source: https://groups.csail.mit.edu/netmit/sFFT/slidesEric.pdf
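To make the sparsity intuition concrete, here is a minimal numpy sketch (the toy signal and variable names are mine): a signal that looks dense in the time domain can have only a couple of significant Fourier coefficients.

```python
import numpy as np

# A signal built from two sinusoids: dense in the time domain...
t = np.arange(256) / 256.0
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 21 * t)

# ...but sparse in the frequency domain: only two dominant coefficients.
spectrum = np.abs(np.fft.rfft(signal))
dominant = np.flatnonzero(spectrum > 0.1 * spectrum.max())
print(dominant)  # frequency bins 8 and 21 stand out; all others are ~0
```

Those few dominant coefficients can then serve as compact features for classification or as a compressed version of the signal, as the slide suggests.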

SLIDE 19

Why is it useful?

Intuition behind frequency

◮ Change, speed of change: if change has a repetitive pattern we see it better in the frequency domain
◮ How can we use frequency analysis in urban computing?
  ◮ Typically any phenomenon with a periodic pattern can be captured in the frequency domain
  ◮ Periodicity in trajectory data (daily, weekly, seasonal, yearly patterns)
  ◮ Activities with periodic patterns from accelerometer data (walking, running, biking)
  ◮ Forecasting
  ◮ Compressing data

SLIDE 20

Approach 2-example 2

Wavelet transform

◮ Fourier analysis tells you which frequency components are strong in a signal, but not where in the signal they occur (frequency view)
◮ Wavelets tell you which frequency components are present and also where they occur in the signal (time + frequency view)
◮ Useful for multi-resolution analysis
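To illustrate the time + frequency idea, here is a hand-rolled single level of the Haar wavelet transform (a simplified sketch of my own, not the transform used in the lecture's case study): it splits a series into local averages and local differences, and the differences localize change in time.

```python
import numpy as np

def haar_step(x):
    """One level of the (unnormalized) Haar wavelet transform.

    Returns (approximation, detail): pairwise averages and pairwise
    half-differences. Large detail coefficients localize abrupt change.
    """
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / 2.0
    detail = (x[0::2] - x[1::2]) / 2.0
    return approx, detail

# A flat signal with one jump: the detail coefficients are zero
# everywhere except at the location of the jump.
x = np.array([4.0, 4.0, 4.0, 9.0, 9.0, 9.0, 9.0, 9.0])
approx, detail = haar_step(x)
print(approx)  # averages: 4, 6.5, 9, 9
print(detail)  # non-zero only at the jump: 0, -2.5, 0, 0
```

Applying `haar_step` recursively to the approximation yields coarser and coarser views of the same series, which is the essence of multi-resolution analysis.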

SLIDE 21

Time, Frequency, Frequency-time domains

◮ Lower frequency components take more time 5
◮ Higher frequency components take less time

5http://www.cerm.unifi.it/EUcourse2001/Guntherlecturenotes.pdf

SLIDE 22

Example case

Figure: Assen sensor setup

We collected WiFi data from a city during the TT festival.

◮ What would you do to see what happened in the city during the festival?
◮ How would you automate the process of detecting things that changed during the festival?

SLIDE 23

Multi-resolution analysis using Wavelets

Multiresolution analysis of people's visits to the TT festival. When and how strongly did the number of visitors change?

(Wavelet coefficients by period, 1–128 hours, over Jun 21 – Jul 1; plus counts at the train station and stage area on normal days versus during the festival.)

Figure: [PCB+17]

SLIDE 24

Example: Two approaches for dealing with the same problem

How do you find important periods from one person’s trajectory data?

◮ Method 1: Time domain analysis
◮ Method 2: Frequency domain analysis

SLIDE 25

Method 1: Autocorrelation function

◮ Auto-correlation function (correlation of data with itself)
◮ The value of the autocorrelation function at lag τ can be interpreted as the self-similarity score of a time series when shifted by τ timestamps:

ACFτ = (1/T) Σt=1..T−τ (or T) 6 (xt − x̄)(xt+τ − x̄),   τ = 0, 1, 2, ..., T 7

6 T is used in circular autocorrelation. 7 The maximum value of τ can be smaller.
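The ACF above can be computed directly in numpy (a minimal sketch; unlike the slide's 1/T version it normalizes by the lag-0 value so the scores lie in [−1, 1], a common variant):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation ACF(tau) for tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    T = len(x)
    denom = np.sum(d * d)  # lag-0 value, so acf(x, ...)[0] == 1
    return np.array([np.sum(d[: T - tau] * d[tau:]) / denom
                     for tau in range(max_lag + 1)])

# An hourly series with a daily cycle: the ACF peaks again at lag 24.
t = np.arange(240)
x = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(1).normal(size=240)
r = acf(x, 30)
print(int(np.argmax(r[12:])) + 12)  # the peak away from lag 0 sits at 24
```

The lag of that peak is exactly the "period of repetitive behavior" that the next slide reads off the ACF graph.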

SLIDE 26

Circular autocorrelation function

For implementing circular autocorrelation we use a shift operation that wraps values from the end of the time-series back to its beginning. For a series x1, ..., x6:

Lag 0 → (x1 − x̄)² + (x2 − x̄)² + ...
Lag 1 → (x1 − x̄)(x6 − x̄) + (x2 − x̄)(x1 − x̄) + ...

Figure: Calculating autocorrelation at different lags

SLIDE 27

Finding periodicity using autocorrelation function

Once the ACF is visualized in a graph, the peaks of the autocorrelation graph can show the periods of repetitive behavior.

(Framework overview: from an input stream (x1, y1, t1) ... (xn, yn, tn), self-similarity (UACF) is measured over different lags; the periods of repetition are discovered from the peaks of the UACF graph (e.g. at lags 24 and 168), and periodic patterns (e.g. with period 24) are extracted per segment.)

Figure: Finding periodic patterns using the autocorrelation function [BMH14]

SLIDE 28

Method 2: Periodogram

◮ A periodogram is used to identify the dominant periods (or frequencies) of a time series.
◮ After performing the Fourier transform, the sum of squared coefficients in each period is used to create the periodogram.
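A minimal numpy sketch of this recipe (the toy signal and names are mine): square the magnitudes of the Fourier coefficients and read off the dominant period.

```python
import numpy as np

def periodogram(x):
    """Squared magnitudes of the Fourier coefficients (mean removed)."""
    x = np.asarray(x, dtype=float)
    coeffs = np.fft.rfft(x - x.mean())
    return np.abs(coeffs) ** 2

# Hourly series with a daily (period-24) cycle over 10 days.
n = 240
t = np.arange(n)
x = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(2).normal(size=n)

power = periodogram(x)
k = int(np.argmax(power[1:])) + 1   # dominant frequency bin (skip bin 0)
print(n / k)  # → 24.0, the dominant period in hours
```

Frequency bin k corresponds to period n/k, which is why, as the next slides note, the periodogram's resolution is coarse for large periods (small k).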

SLIDE 29

Periodogram

(Periodogram with two dominant peaks, P1 and P2.)

Figure: Periodogram [LDH+10]

SLIDE 30

Why you need to know different methods

Each method has its pros and cons (typically, they complement each other in some way)

◮ In practice, on real data both of them fail in some way
◮ The Fourier transform often suffers from low resolution in the low-frequency region, and hence provides poor estimates of large periods (this is referred to as the spectral leakage problem)
◮ False positives caused by noise can appear in the periodogram
◮ Autocorrelation offers accurate estimation for both short and large periods; however, it is more difficult to set the significance threshold for finding important periods

SLIDE 31

Many more different methods for representing time-series data in alternative domains

[WMD+13]

◮ Discrete Cosine transform
◮ Discrete Fourier transform
◮ Discrete Wavelet transform
◮ Piecewise aggregate approximation
◮ Piecewise cloud approximation
◮ ...

SLIDE 32

What effects of time exist?

Some effects we would like to capture in a representation, depending on the task we have in mind:

◮ When do things happen?
◮ How long do they last?
◮ How do they repeat?
◮ How do they follow each other?
◮ When do things start to appear/disappear?
◮ When and how do things change?

SLIDE 33

Part 2: Techniques for processing time-series data

SLIDE 34

Classical forecasting using time-series

Problem: Given x1, x2, x3, ..., xt, forecast the values of xt+1, xt+2, ..., xt+n.

The forecast horizon depends on the value of n:

◮ Short-term
◮ Medium-term
◮ Long-term

SLIDE 35

Autoregressive models

◮ Classical models widely used by statisticians
◮ The auto-regressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term
◮ Assumption: the process is stationary
  ◮ A time series is said to be strictly stationary if its properties are not affected by a change of the time origin, i.e., the joint probability distribution of xt, xt+1, ..., xt+n equals that of xt+k, xt+k+1, ..., xt+k+n
  ◮ In a stricter sense, a stationary time series exhibits similar statistical behavior over time, often characterized as a constant probability distribution in time

SLIDE 36

Regression, Auto-regressive, Moving average

→ c is a constant, φ and θ are model parameters, ǫ is white noise

◮ Regression
  ◮ Yi = c + φXi + ǫi
◮ Autoregressive (AR(p))
  ◮ Xt = c + Σi=1..p φi Xt−i + ǫt
◮ Moving average (MA(q))
  ◮ Xt = c + Σi=1..q θi ǫt−i
  ◮ Literally a moving (weighted) average of the previous noise terms of the time-series
◮ Auto-Regressive Moving Average (ARMA(p, q))
  ◮ Xt = c + Σi=1..p φi Xt−i + Σi=1..q θi ǫt−i
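As an illustration of the AR model, here is a minimal least-squares fit in numpy (a sketch of my own, not necessarily the estimator a statistics package would use): regress each value on its p lagged predecessors.

```python
import numpy as np

def fit_ar(x, p):
    """Fit X_t = c + sum_i phi_i * X_{t-i} + eps_t by least squares.

    Returns (c, phi) where phi[i-1] multiplies X_{t-i}.
    """
    x = np.asarray(x, dtype=float)
    # Design matrix: a column of ones for c, then the p lagged values.
    rows = [np.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))]
    A = np.array(rows)
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_ar(x, c, phi, steps):
    """Iteratively forecast `steps` values ahead."""
    hist = list(x)
    for _ in range(steps):
        hist.append(c + sum(p_i * hist[-i - 1] for i, p_i in enumerate(phi)))
    return hist[len(x):]

# Simulate an AR(1) process with phi = 0.8; the estimate should be close.
rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.1)
c, phi = fit_ar(x, p=1)
print(phi[0])  # close to the true coefficient 0.8
```

Fitting on data simulated from the model itself is a quick sanity check; on real (possibly non-stationary) data the stationarity assumption above must be checked first.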

SLIDE 37

Typical patterns in time-series that should be considered

How far can you go ahead in time:

◮ Seasonality (periodicity)
◮ Trends

Figure: Time series with trend and periodicity [BJRL15]

SLIDE 38

Some other examples of time-series forecasting models [MJK15]

◮ Autoregressive integrated moving average (ARIMA)
◮ Seasonal ARIMA (SARIMA)
◮ Fractional ARIMA (FARIMA)

SLIDE 39

Forecasting using frequency domain representation

◮ Transform the signal to the frequency domain (e.g. using the Fourier transform)
◮ Remove insignificant high-frequency components
◮ Forecast each remaining component
◮ Transform the signal back to the time domain
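A simplified sketch of these steps (my own construction: it keeps the strongest components rather than explicitly thresholding high frequencies, and "forecasting" each kept component simply means continuing its sinusoid beyond the observed samples):

```python
import numpy as np

def fourier_forecast(x, keep, steps):
    """Forecast by extrapolating the `keep` strongest Fourier components."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    coeffs = np.fft.rfft(x)
    # Keep only the strongest components; the rest are discarded as noise.
    order = np.argsort(np.abs(coeffs))[::-1][:keep]
    t = np.arange(n, n + steps)
    forecast = np.zeros(steps)
    for k in order:
        amp = np.abs(coeffs[k]) / n
        phase = np.angle(coeffs[k])
        scale = 1.0 if k in (0, n // 2) else 2.0  # rfft halves the spectrum
        forecast += scale * amp * np.cos(2 * np.pi * k * t / n + phase)
    return forecast

# A purely periodic hourly signal: the forecast continues the daily cycle.
t = np.arange(240)
x = 3.0 + np.sin(2 * np.pi * t / 24)
f = fourier_forecast(x, keep=2, steps=24)
print(np.allclose(f, 3.0 + np.sin(2 * np.pi * np.arange(240, 264) / 24)))  # → True
```

This works well only for strongly periodic signals; trends and non-stationary behavior are exactly what this naive extrapolation misses.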

SLIDE 40

Time-series classification

Problem: Assign class labels to xi, ..., xi+n

Figure: Classification of time-series data [LBKLT16]

SLIDE 41

Time-series classification

◮ Represent the time-series in a suitable domain
◮ Select a similarity measure
◮ Choose a classification method (K-nearest neighbor is very popular)

Representation and similarity measure go hand-in-hand and should be matched!

SLIDE 42

Similarity measure

How to measure similarity of two time-series to each other?

(Figure: two time-series, x1...x6 and y1...y6, plotted for comparison)

SLIDE 43

Euclidean distance

(Figure: two time-series, x1...x6 and y1...y6, compared point by point)

SLIDE 44

Euclidean distance

Very similar time-series

(Figure: two very similar time-series, x1...x6 and y1...y6)

SLIDE 45

Euclidean distance

Very similar time-series (?)

(Figure: the same two time-series, slightly shifted relative to each other)

SLIDE 46

What do we miss?

Euclidean distance:

◮ Sensitive to shifting, time or amplitude scaling

SLIDE 47

Dynamic time warping (DTW)

◮ The DTW algorithm is able to compare two curves in a way that makes sense to a human: it maintains the importance of the spots in the curves that humans consider important when comparing them
◮ An elastic similarity measure
◮ The most used measure of similarity between time-series
◮ Works by finding the optimal alignment between two time-series
◮ Based on the pair-wise distance matrix of the time-series

SLIDE 48

DTW [CB17]

(Figure: two time-series of different lengths to be aligned by DTW)

SLIDE 49

DTW

Intuition: finding the best matching pair of points on two time-series

(Figure: matching pairs of points between the two time-series)

SLIDE 50

DTW

(Figure: a binary alignment matrix between x1...x6 and y1...y8, with 1s marking the matched pairs)

The goal of DTW is finding the best alignment path.

SLIDE 51

Pair-wise distance matrix

◮ The matrix can be initialized from the data; through recursion we find the optimal alignment
◮ ∆(i,j) = |xi − yj|

∆(1,1) ∆(1,2) ∆(1,3) ∆(1,4) ∆(1,5) ∆(1,6) ∆(1,7) ∆(1,8)
∆(2,1) ∆(2,2) ∆(2,3) ∆(2,4) ∆(2,5) ∆(2,6) ∆(2,7) ∆(2,8)
∆(3,1) ∆(3,2) ∆(3,3) ∆(3,4) ∆(3,5) ∆(3,6) ∆(3,7) ∆(3,8)
∆(4,1) ∆(4,2) ∆(4,3) ∆(4,4) ∆(4,5) ∆(4,6) ∆(4,7) ∆(4,8)
∆(5,1) ∆(5,2) ∆(5,3) ∆(5,4) ∆(5,5) ∆(5,6) ∆(5,7) ∆(5,8)
∆(6,1) ∆(6,2) ∆(6,3) ∆(6,4) ∆(6,5) ∆(6,6) ∆(6,7) ∆(6,8)

dtw(i, j) = ∆(i, j) + min(dtw(i − 1, j − 1), dtw(i − 1, j), dtw(i, j − 1))

SLIDE 52

A recursive process

Finding the best alignment path is achieved through recursion using the pairwise distance matrix:

dtw(i, j) = ∆(i, j) + min(dtw(i − 1, j − 1), dtw(i − 1, j), dtw(i, j − 1))
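The recursion translates directly into a small dynamic program (a minimal sketch using ∆(i, j) = |xi − yj| as the pairwise cost):

```python
import numpy as np

def dtw(x, y):
    """DTW distance via dtw(i,j) = delta(i,j) + min of the three neighbors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # padded table; inf blocks the border
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = abs(x[i - 1] - y[j - 1])  # pairwise cost |x_i - y_j|
            D[i, j] = delta + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# A shifted copy of a series: Euclidean distance is large, DTW is zero.
a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
print(dtw(a, b))                                  # → 0.0 (perfect alignment)
print(np.linalg.norm(np.array(a) - np.array(b)))  # → 2.0 (Euclidean)
```

This is exactly the shift sensitivity of the Euclidean distance discussed on the earlier slides: DTW absorbs the one-step shift through its elastic alignment.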

SLIDE 53

Other similarity measures

◮ Longest Common Subsequence (LCSS)
◮ Edit Distance on Real sequence (EDR)
◮ ...

SLIDE 54

Lessons learned

◮ The peculiarities of time-series data (high dimensionality, non-stationary nature, noise, missing data) create extra challenges in designing algorithms for data analysis
◮ Extra effort is needed to use available algorithms on time-series data
◮ Representing time-series data: time, frequency, time-frequency, ...
◮ A single problem (extraction of periodic patterns) can be addressed by two approaches, and both might have difficulties on real data
◮ Forecasting tasks: creating auto-regressive and moving-average models
◮ Classification tasks: defining robust similarity measures combined with a representation

SLIDE 55

End of theory!

SLIDE 56

Part 3: Assignment

SLIDE 57

References I

[BJRL15] George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung, Time series analysis: forecasting and control, John Wiley & Sons, 2015.

[BMH14] Mitra Baratchi, Nirvana Meratnia, and Paul J. M. Havinga, Recognition of periodic behavioral patterns from streaming mobility data, Mobile and Ubiquitous Systems: Computing, Networking, and Services (Cham) (Ivan Stojmenovic, Zixue Cheng, and Song Guo, eds.), Springer International Publishing, 2014, pp. 102–115.

[CB17] Marco Cuturi and Mathieu Blondel, Soft-DTW: a differentiable loss function for time-series, arXiv preprint arXiv:1703.01541 (2017).

SLIDE 58

References II

[LBKLT16] Daoyuan Li, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon, DSCo-NG: a practical language modeling approach for time series classification, International Symposium on Intelligent Data Analysis, Springer, 2016, pp. 1–13.

[LDH+10] Zhenhui Li, Bolin Ding, Jiawei Han, Roland Kays, and Peter Nye, Mining periodic behaviors for moving objects, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 1099–1108.

[MJK15] Douglas C. Montgomery, Cheryl L. Jennings, and Murat Kulahci, Introduction to time series analysis and forecasting, John Wiley & Sons, 2015.

SLIDE 59

References III

[PCB+17] Andreea-Cristina Petre, Cristian Chilipirea, Mitra Baratchi, Ciprian Dobre, and Maarten van Steen, Chapter 14 - WiFi tracking of pedestrian behavior, Smart Sensors Networks, Intelligent Data-Centric Systems, 2017, pp. 309–337.

[WMD+13] Xiaoyue Wang, Abdullah Mueen, Hui Ding, Goce Trajcevski, Peter Scheuermann, and Eamonn Keogh, Experimental comparison of representation methods and distance measures for time series data, Data Mining and Knowledge Discovery 26 (2013), no. 2, 275–309.