Urban Computing Dr. Mitra Baratchi Leiden Institute of Advanced - PowerPoint PPT Presentation


  1. Urban Computing Dr. Mitra Baratchi Leiden Institute of Advanced Computer Science - Leiden University 21 February, 2020

  2. Second Session: Urban Computing - Processing Time-series Data

  3. Agenda for this session ◮ Part 1: Preliminaries on time-series data ◮ What does time-series data look like? ◮ How do we represent time-series data to algorithms? ◮ Part 2: Techniques for processing time-series data ◮ Forecasting ◮ Classification ◮ Part 3: Assignment ◮ Put into practice some of the techniques learned today ◮ Apply them to Geo-life data

  4. Part 1: Preliminaries on time-series data

  5. Why do we care about time-series data? ◮ Time-series data are ubiquitous... ◮ What types of data do we have in the form of time series for Urban Computing research? ◮ Temperature ◮ Humidity ◮ Number of people or cars passing a road ◮ Price of houses ◮ Sensor measurements

  6. ◮ What can you do with this data? ◮ How do you achieve that using an available machine learning algorithm? ◮ How do we represent time-series data to available algorithms?

  7. Peculiarities of time-series Why is the analysis of time-series data challenging? What qualities should algorithms for analyzing time-series data have?

  8. Dimensionality? 2 4 11 0 2.15 0.9 31.43 200.1 Figure: Temperature in Leiden during the month of February so far (plot title: Temperature Leiden (Feb 2019); y-axis: Temperature (C); x-axis: dates 2019-2-4 to 2019-2-11; data source: https://www.meteoblue.com) How many dimensions does the data have? Dimension is the number of attributes required to explain every instance of data. Length over time defines the dimensions → many (even infinite). How would you use this data for predicting the temperature of the following days?

  9. Peculiarities of time-series data ◮ High-dimensionality: We hope to reduce dimensionality by finding a model $\mathrm{Temp}_t = f(\mathrm{Temp}_{0 \ldots t-1})$
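
One simple way to make the model $\mathrm{Temp}_t = f(\mathrm{Temp}_{0 \ldots t-1})$ concrete is to restrict $f$ to a linear function of the last few values (an autoregressive model). The sketch below is only an illustration under that assumption; the helper names `make_lagged`, `fit_ar` and `forecast_next`, the lag count and the synthetic temperature series are all hypothetical and not part of the slides.

```python
import numpy as np

def make_lagged(series, p):
    """Turn a 1-D series into (X, y): each row of X holds the p previous values of y."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[i:n - p + i] for i in range(p)])
    y = series[p:]
    return X, y

def fit_ar(series, p):
    """Least-squares fit of y_t = c + a_1*y_{t-p} + ... + a_p*y_{t-1}."""
    X, y = make_lagged(series, p)
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def forecast_next(series, coef):
    """One-step-ahead forecast from the last p observed values."""
    p = len(coef) - 1
    last = np.asarray(series[-p:], dtype=float)  # oldest to newest, same order as in X
    return coef[0] + last @ coef[1:]

# Synthetic "temperature" with a 24-step cycle (hypothetical data).
t = np.arange(200)
temp = 5.0 + 3.0 * np.sin(2 * np.pi * t / 24)

coef = fit_ar(temp[:-1], p=2)            # hold out the last value
pred = forecast_next(temp[:-1], coef)    # forecast the held-out value
```

Instead of treating the whole history as one very high-dimensional instance, the series is cut into many short instances of `p` attributes each, which is the kind of input a standard regression algorithm expects.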

  10. Non-stationarity ◮ Non-stationarity: Data points have means, variances and covariances that change over time Figure: A non-stationary process (image source: http://berkeleyearth.org/2019-temperatures/)
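
A quick, informal way to see non-stationarity is to compare the mean and variance over consecutive windows of the series. The sketch below assumes synthetic data and a hypothetical helper `rolling_stats`; it is a visual check, not a formal statistical test.

```python
import numpy as np

def rolling_stats(x, window):
    """Mean and variance over consecutive, non-overlapping windows."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // window) * window          # drop the incomplete last window
    blocks = x[:n].reshape(-1, window)
    return blocks.mean(axis=1), blocks.var(axis=1)

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 1000)

stationary = noise                            # constant mean and variance
trending = noise + 0.01 * np.arange(1000)     # mean drifts upward over time

m_stat, v_stat = rolling_stats(stationary, 250)
m_trend, v_trend = rolling_stats(trending, 250)
```

For the stationary series the window means stay close to each other, while for the trending series they increase from window to window.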

  11. Peculiarities of time-series ◮ High-dimensionality : One instance has a lot of attributes $\mathrm{Temp}_t = f(\mathrm{Temp}_{0 \ldots t-1})$ ◮ Non-stationarity: Data points have means, variances and covariances that change over time (related to concept drift) ◮ Single versus multi-variate time-series : Multiple sensors at the same time, multiple high-dimensional data ◮ Distortions in time-series data : Missing values, noise, etc.

  12. Who has so far developed methods, algorithms for working with such data? ◮ Signal processing experts ◮ Statisticians

  13. What can we do with such data? ◮ Predict values? (Better say forecast) ◮ Classify ◮ Find patterns, clusters, outliers ◮ Query There are already algorithms designed for these tasks when dealing with non-time-series data. The problem is finding a way to represent time-series data to these algorithms.

  14. Two approaches to deal with or represent time-series data How do we represent time-series data in order to process it? ◮ Approach 1 : Take it as it is. ◮ Represent it in time domain. ◮ Main issue: (Time-series data is high dimensional → very difficult to work with) ◮ Approach 2 : Represent it in a format that is more understandable or easier to work with. Representation techniques are designed to reduce the dimensionality of data as much as possible. ◮ Frequency domain ◮ Time-frequency domain ◮ ...

  15. Approach 2-example 1 Fourier transform ◮ What is Fourier transform? ◮ What does it do? ◮ Why is it useful (in math, in engineering, etc)? ◮ How can it be useful in Urban Computing?

  16. What is Fourier transform? The basic elements: Fourier theory shows that all signals (periodic and non-periodic) can be decomposed into a linear combination of sine waves, each defined by its amplitude ($A$), period ($2\pi/\omega$), and phase ($\varphi$) Figure: A sine wave, basic element of Fourier transform: $A\sin(\omega t + \varphi)$
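
As a small illustration of the three parameters of a sine wave, the sketch below builds $A\sin(\omega t + \varphi)$ and reads the amplitude, frequency and phase back from its discrete Fourier transform. The sampling rate, duration and parameter values are assumptions chosen so that the wave completes a whole number of cycles; with real data the estimates would only be approximate.

```python
import numpy as np

fs = 100                      # samples per second (assumed)
n = 1000                      # 10 seconds of data
t = np.arange(n) / fs

A, f0, phi = 2.0, 5.0, 0.3    # amplitude, frequency in Hz (omega = 2*pi*f0), phase
x = A * np.sin(2 * np.pi * f0 * t + phi)

X = np.fft.rfft(x)                       # Fourier coefficients for non-negative frequencies
freqs = np.fft.rfftfreq(n, d=1 / fs)     # frequency (Hz) of each coefficient

k = np.argmax(np.abs(X))                 # index of the strongest component
amp_est = 2 * np.abs(X[k]) / n           # rescale the coefficient magnitude to an amplitude
freq_est = freqs[k]
phase_est = np.angle(X[k]) + np.pi / 2   # the DFT phase is relative to a cosine; sin(a) = cos(a - pi/2)
```

The period of the recovered wave is `1 / freq_est`, which corresponds to $2\pi/\omega$ in the notation of the slide.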

  17. Fourier transform in one image Figure: View of a signal in time and frequency domain (source: http://www.nti-audio.com/portals/0/pic/news/FFT-Time-Frequency-View-540.png)

  18. Why is it useful? The main intuition: If the frequency domain view is sparse, we can leverage the sparsity in different ways (e.g. create new features for classification, compress the signal, ...). Figure: Different views of a signal and levels of sparsity (source: https://groups.csail.mit.edu/netmit/sFFT/slidesEric.pdf). Question we should seek to answer before using a frequency domain transformation: Does a transformation give us a sparser, and thus more understandable, representation?
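
The sparsity intuition can be sketched as a toy compression scheme: keep only the `k` largest Fourier coefficients and reconstruct the signal from them. The function name `compress_topk` and the test signal are hypothetical; the signal is deliberately built from two sine waves so that its frequency view is exactly sparse.

```python
import numpy as np

def compress_topk(x, k):
    """Reconstruct x from its k largest-magnitude Fourier coefficients only."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]      # indices of the k strongest components
    X_sparse = np.zeros_like(X)
    X_sparse[keep] = X[keep]
    return np.fft.irfft(X_sparse, n=len(x))

n = 512
t = np.arange(n)
# 512 time-domain samples, but only two active frequencies.
x = np.sin(2 * np.pi * 8 * t / n) + 0.5 * np.sin(2 * np.pi * 32 * t / n)

x_hat = compress_topk(x, k=2)
err_k2 = np.max(np.abs(x - x_hat))                   # essentially zero
err_k1 = np.max(np.abs(x - compress_topk(x, k=1)))   # the weaker sine wave is lost
```

Two complex coefficients are enough to describe all 512 samples here. The same coefficients could also serve as a short feature vector for a classifier.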

  19. Why is it useful? Intuition behind frequency ◮ Change, speed of change : If change has a repetitive pattern we see it better in the frequency domain ◮ How can we use frequency analysis in urban computing? ◮ Typically any phenomenon with a periodic pattern can be captured in the frequency domain ◮ Periodicity in trajectory data (daily, weekly, seasonal, yearly patterns) ◮ Activities with periodic patterns from accelerometer data (walking, running, biking) ◮ Forecasting ◮ Compressing data

  20. Approach 2-example 2 Wavelet transform ◮ Fourier analysis tells you what frequency components are strong in a signal, but not where in the signal (frequency view) ◮ Wavelet tells you what frequency components and also where they happen in a signal (time + frequency view) ◮ Useful for multi-resolution analysis
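
The difference between the two views can be sketched with the simplest wavelet, the Haar wavelet, implemented here by hand for a single decomposition level. The helper `haar_step` and the toy signal (a single short event at sample 40) are assumptions for illustration; a real analysis would normally use a wavelet library and several levels.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform: (approximation, detail).

    Assumes len(x) is even. Each coefficient covers two neighbouring samples.
    """
    x = np.asarray(x, dtype=float)
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # local average (low frequency)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # local change (high frequency)
    return approx, detail

x = np.zeros(64)
x[40] = 1.0                                    # a short event at sample 40

approx, detail = haar_step(x)
where = 2 * int(np.argmax(np.abs(detail)))     # map coefficient index back to sample position

spectrum = np.abs(np.fft.rfft(x))              # Fourier magnitude of the same signal
```

The only non-zero detail coefficient sits at the position of the event, whereas the Fourier magnitude spectrum is flat: it shows that all frequencies are present but not where the event happened.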

  21. Time, Frequency, Frequency-time domains (source: http://www.cerm.unifi.it/EUcourse2001/Guntherlecturenotes.pdf) ◮ Lower frequency components take more time ◮ Higher frequency components take less time

  22. Example case Figure: Assen sensor setup We collected WiFi data from a city during the TT festival. ◮ What would you do to see what happened in the city during the festival? ◮ How would you automate the process of detecting things that changed during the festival?

  23. Multi-resolution analysis using Wavelets Multiresolution analysis on visits of people to TT festival. When and how strongly did the number of visitors change? Figure: [PCB + 17] (two panels: wavelet coefficients (× 10^3) by Period (hours) over Time in days, starting Jun 21; Value over Time (minutes), with series for Train station normal days, Train station during festival, Stage area normal days, Stage area during festival)

  24. Example: Two approaches for dealing with the same problem How do you find important periods from one person’s trajectory data? ◮ Method 1: Time domain analysis ◮ Method 2: Frequency domain analysis

  25. Method 1: Autocorrelation function ◮ Autocorrelation function (correlation of data with itself) ◮ The value of the autocorrelation function at lag $\tau$ can be interpreted as the self-similarity score of a time series when shifted by $\tau$ timestamps: $\mathrm{ACF}_\tau = \frac{1}{T} \sum_{t=1}^{T-\tau} (x_t - \bar{x})(x_{t+\tau} - \bar{x}), \quad \tau = 0, 1, 2, \ldots, T$ (in circular autocorrelation the upper limit of the sum is $T$ instead of $T-\tau$; the maximum value of $\tau$ can be smaller than $T$)
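
A direct translation of the (non-circular) autocorrelation definition into code could look like the following sketch. The function name `acf` and its `max_lag` argument are illustrative choices; the normalization by the series length $T$ follows the formula on the slide.

```python
import numpy as np

def acf(x, max_lag):
    """Autocorrelation for lags 0..max_lag, summing over t = 1..T - tau."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()                     # deviations from the mean
    return np.array([
        np.sum(d[:T - tau] * d[tau:]) / T    # pairs (x_t, x_{t+tau}) that both exist
        for tau in range(max_lag + 1)
    ])

values = acf([1, 2, 3, 4], max_lag=2)
```

At lag 0 the result equals the variance of the series. For larger lags fewer pairs contribute to the sum, which is why the maximum lag is usually kept well below the series length.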

  26. Circular autocorrelation function For implementing circular autocorrelation we use a shift operation from the end of the time-series to its beginning. Figure: Calculating autocorrelation in different lags, shown for a series $x_1, \ldots, x_6$: $\mathrm{ACF}(0) \rightarrow (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots$ and, after shifting the series by one position, $\mathrm{ACF}(1) \rightarrow (x_1 - \bar{x})(x_6 - \bar{x}) + (x_2 - \bar{x})(x_1 - \bar{x}) + \ldots$
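
The shift from the end of the series to its beginning corresponds to `np.roll` in NumPy. The sketch below is one possible implementation under that assumption; the function name `circular_acf` is hypothetical.

```python
import numpy as np

def circular_acf(x, max_lag):
    """Circular autocorrelation for lags 0..max_lag (all T products are used at every lag)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # np.roll(d, tau) moves the last tau elements to the front, so position t
    # is paired with position t - tau, wrapping around at the start.
    return np.array([np.mean(d * np.roll(d, tau)) for tau in range(max_lag + 1)])

values = circular_acf([1, 2, 3, 4], max_lag=2)

periodic = np.tile([0.0, 1.0, 2.0, 3.0], 5)     # repeats every 4 samples
periodic_acf = circular_acf(periodic, max_lag=4)
```

Because the series wraps around, a signal whose length is a multiple of its period has exactly the same autocorrelation at lag 0 and at the lag equal to the period.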

  27. Finding periodicity using autocorrelation function Once ACF is visualized in a graph, the peaks on the autocorrelation graph can show the periods of repetitive behavior (described in section 4.3). Figure: Finding periodic patterns using autocorrelation function [BMH14] (original caption: "Fig. 1. Our framework for finding periodic patterns from streaming mobility data."; panels: input stream of points (x1,y1,t1) ... (xn,yn,tn); measuring the self-similarity over different lags (UACF graph, UACF vs. Time); discovery of the periods of repetition from the self-similarity graph (UACF graph with peaks marked at 24 and 168); extracting periodic patterns (periodic pattern with period=24, Probability (Presence) vs. Time Segment))
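
Reading periods off the peaks of the autocorrelation graph can be sketched as follows. This is a simplified illustration, not the framework of [BMH14]: the presence signal is synthetic (a person at one place from 9:00 to 17:00 every day, sampled hourly), and the helper names, the peak rule (strict local maximum) and the threshold of 0.5 are assumptions.

```python
import numpy as np

def circular_acf(x, max_lag):
    """Circular autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.array([np.mean(d * np.roll(d, tau)) for tau in range(max_lag + 1)])

def acf_peaks(acf_values, threshold=0.5):
    """Lags that are strict local maxima of the ACF and exceed `threshold`."""
    a = np.asarray(acf_values, dtype=float)
    lags = np.arange(1, len(a) - 1)
    is_peak = (a[1:-1] > a[:-2]) & (a[1:-1] > a[2:]) & (a[1:-1] > threshold)
    return lags[is_peak]

hours = np.arange(24 * 28)                        # four weeks of hourly samples
hour_of_day = hours % 24
presence = ((hour_of_day >= 9) & (hour_of_day < 17)).astype(float)

acf_vals = circular_acf(presence, max_lag=100)
acf_vals = acf_vals / acf_vals[0]                 # normalize so that lag 0 equals 1
peaks = acf_peaks(acf_vals)
```

The detected peaks fall on multiples of 24, so the smallest one gives the daily period. With real trajectory data the peaks are noisier, and a weekly period would appear as a further peak at lag 168, as in the figure.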

  28. Method 2: Periodogram ◮ A periodogram is used to identify the dominant periods (or frequencies) of a time series. ◮ After performing the Fourier transform, the sum of squared coefficients at each period is used to create the periodogram
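
A minimal periodogram can be computed from the FFT, reading "sum of squared coefficients" as the squared real part plus the squared imaginary part of each Fourier coefficient. The function name `periodogram`, the scaling by the series length and the synthetic signal with a daily and a weekly component are assumptions made for this sketch.

```python
import numpy as np

def periodogram(x):
    """Return (frequencies, power) for the non-zero frequencies of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.rfft(x - x.mean())         # remove the mean so frequency 0 does not dominate
    power = (X.real ** 2 + X.imag ** 2) / n
    freqs = np.fft.rfftfreq(n)            # in cycles per sample
    return freqs[1:], power[1:]           # drop the zero frequency (infinite period)

hours = np.arange(24 * 28)                # four weeks of hourly samples
x = np.sin(2 * np.pi * hours / 24) + 0.5 * np.sin(2 * np.pi * hours / 168)

freqs, power = periodogram(x)
top = np.argsort(power)[-2:][::-1]        # the two strongest components, strongest first
periods = 1 / freqs[top]                  # convert frequency to period (in hours)
```

The two dominant periods come out as 24 and 168 hours, matching the daily and weekly components that were put into the signal.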

  29. Periodogram Figure: Periodogram with two marked peaks, P1 and P2 [LDH + 10]
