Dr. Damien Fay Smart Technology Research Centre, School of Design - - PowerPoint PPT Presentation



slide-1
SLIDE 1
Glasdasha: predicting arrival times from crowd sourced smartphone data without localisation or route maps

  • Dr. Damien Fay, Smart Technology Research Centre, School of Design, Engineering and Computing, University of Bournemouth.
  • Han Wang, Ph.D., Co-inventor, NUIG, Galway, Ireland.
  • Dr. Ken Brown, ITOBO PI, UCC, Cork, Ireland.

Nominated for UCC best invention award 2012. Finals May 14th 2013.

slide-2
SLIDE 2

Background to Building Energy systems. ITOBO:

Information and Communication Technology for Sustainable and Optimised Building Operation.

Energy usage in buildings accounts for between 20% and 40% of total energy consumption [1], with Heating, Ventilation and Air Conditioning (HVAC) systems accounting for 50% of this figure, or 20% of total energy consumption (in the USA [1]).

[1] L. Pérez-Lombard, J. Ortiz, and C. Pout. A review on buildings energy consumption information. Energy and Buildings, 40(3):394–398, 2008.

slide-3
SLIDE 3

Overview of the wider field.

Building systems research focuses on:

  • Retro-fitting.
  • HVAC* systems: storage heating systems; boiler systems; 50% of energy expended “moving air”.
  • Sensor networks: detecting presence, indoor localisation, temperature, black body radiation, humidity, CO2.
  • Preference modelling.
  • Weather forecasting and BMS*: the BMS is not connected to the network.
  • MPC* control: predictive control; requires forecasts of occupancy.

* HVAC: Heating, Ventilation and Air Conditioning. MPC: Model Predictive Control. BMS: Building Management System (the computer that operates the HVAC).

slide-4
SLIDE 4

Occupancy modelling prior research.

References

[1] B. Dong and B. Andrews. Sensor-based occupancy behavioral pattern recognition for energy and comfort management in intelligent buildings. IBPSA Conf., Glasgow, Scotland, July 27-30, 2009.

[2] M. Hoeynck and B. W. Andrews. Sensor-based occupancy and behavior prediction method for intelligently controlling energy consumption within a building. Patent 20100025483, 04 02, 2010. [Online]. Available: http://www.faqs.org/patents/app/20100025483.

[3] D. Bourgeois, I. Macdonald, J. Hand, and C. Reinhart. Adding sub-hourly occupancy prediction, occupancy-sensing control and manual environmental control to ESP-r. Proceedings of the ESIM 2004 Conference, Vancouver, B.C., June 10-11, 2004, pp. 1–8.

[41] R. Sallehuddin and S. M. Hj. Shamsuddin. Hybrid grey relational artificial neural network and auto regressive integrated moving average model for forecasting time-series data. Appl. Artif. Intell., vol. 23, pp. 443–486, May 2009.

[42] E. Manavoglu, D. Pavlov, and C. Giles. Probabilistic user behavior models. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, Nov. 2003, pp. 203–210.

[43] A. Mahdavi and C. Pröglhöf. User behavior and energy performance in buildings. In 6th Internationalen Energiewirtschaftstagung an der TU Wien, IEWT 2009, February 2009.

[44] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.

[45] C. Liao, Y. Lin, and P. Barooah. Agent-based and graphical modelling of building occupancy. Journal of Building Performance Simulation, 2011.

slide-5
SLIDE 5

Occupancy prediction via smartphone localisation – prior research.

References

[1] M. Gupta, S. S. Intille, and K. Larson. Adding GPS-control to traditional thermostats: An exploration of potential energy savings and design challenges. In Proceedings of the 7th International Conference on Pervasive Computing, Pervasive ’09, pages 95–114, Berlin, Heidelberg, 2009. Springer-Verlag.

[2] J. Scott, J. Krumm, B. Meyers, A. J. Brush, and A. Kapoor. Home heating using GPS-based arrival prediction, MSR research paper, 2010.

[3] J. Krumm and A. J. B. Brush. Learning time-based presence probabilities. In Pervasive Computing, pages 79–96, 2011.

  • J. Biagioni, T. Gerlich, T. Merrifield, and J. Eriksson. Easytracker: automatic transit tracking, mapping, and arrival time prediction using smartphones. In SenSys, pages 68–81, 2011.

  • A. Thiagarajan and J. Biagioni. Cooperative transit tracking using smart-phones. Challenges, pages 85–98, 2010.

  • G. Liu and G. Q. Maguire. A predictive mobility management algorithm for wireless mobile computing and communications. In Universal Personal Communications. 1995. Record., 1995 Fourth IEEE International Conference on, pages 268–272, Nov. 1995.

  • K. Laasonen. Clustering and prediction of mobile user routes from cellular data. In PKDD, pages 569–576, 2005.

slide-6
SLIDE 6

Occupancy prediction via smartphone localisation – prior research.

References

[1] M. Gupta, S. S. Intille, and K. Larson. Adding GPS-control to traditional thermostats: An exploration of potential energy savings and design challenges. In Proceedings of the 7th International Conference on Pervasive Computing, Pervasive ’09, pages 95–114, Berlin, Heidelberg, 2009. Springer-Verlag.

[Diagram labels: MapQuest; GPS; current GPS; home GPS; transit time; switch on heating.]

  • Current location and home address are given to a third party.
  • Transit time assumes a direct route and a known mode of transport.
  • Are the MapQuest estimates up to date?
  • The user may not be travelling home!

slide-7
SLIDE 7

Occupancy prediction via smartphone localisation – prior research.

References

[2] J. Scott, J. Krumm, B. Meyers, A. J. Brush, and A. Kapoor. Home heating using gps-based arrival prediction, MSR research paper, 2010.

[Diagram labels: MapQuest; GPS/WIFI/base station; current ∆; home GPS; transit time; switch on heating.]

  • Current location and home address are given to a third party.
  • Prior on the probability that you are going home.
  • Prediction time is based on previous transit times (how?).

slide-8
SLIDE 8

Occupancy prediction via smartphone localisation – prior research.

References

[2] J. Scott, J. Krumm, B. Meyers, A. J. Brush, and A. Kapoor. Home heating using gps-based arrival prediction, MSR research paper, 2010.

[Diagram labels: MapQuest; GPS/WIFI/base station; current ∆; home GPS; transit time; switch on heating.]

  • Building centric: the target is not the user but the building.
  • A third party is not necessary.
  • Privacy is central to the system:
      1. The employer ~ has the right to know when you arrive,
      2. but not your location outside of work.

slide-9
SLIDE 9

Glas - Dasha

(Gaelic for Green; Chinese for building.)

  • GlasDasha is a smartphone application.
  • Designed to automatically predict when the user will arrive at work.
  • Turn on heating (or server!) in advance.
  • Requires ~2 minutes of user setup and then no interaction from the user at all.
  • No similar product currently exists.
  • Current research in this area is sparse and not targeted at BMS control.

slide-10
SLIDE 10

GlasDasha setup screen.

  • When at home, the user selects 'home' and their wifi access point.
  • When in work, the user selects 'work' and their wifi access point.
  • Select 'launch application on startup'.
  • User interaction is then finished forever.

slide-11
SLIDE 11

GlasDasha system overview.

[Diagram labels: Wifi routers along the route to work; Western Gateway Building; BMS connected server; SOAP message via data connection.]

slide-12
SLIDE 12

GlasDasha system overview.

[Diagram labels: Wifi routers along the route to work; Western Gateway Building; BMS connected server.]

Prediction shown in the diagram:

  • Predicted arrival time: 20 mins
  • Confidence: 75%
  • Later arrival: 15%
  • Not going to work at all: 10%
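The prediction on this slide can be read as a small probability distribution over outcomes. A minimal sketch in Python of one way to represent it; the class and field names (`ArrivalPrediction`, `eta_minutes`, `p_on_time`, `p_later`, `p_not_coming`) are hypothetical and not taken from the GlasDasha implementation, and treating the three outcomes as exhaustive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrivalPrediction:
    """Hypothetical message layout; the field names are illustrative assumptions."""
    eta_minutes: float    # predicted arrival time, in minutes from now
    p_on_time: float      # confidence in the predicted arrival time
    p_later: float        # probability of a later arrival
    p_not_coming: float   # probability the user is not going to work at all

    def is_valid(self, tol: float = 1e-9) -> bool:
        # Assumption: the three outcomes are exhaustive, so they sum to 1.
        probs = (self.p_on_time, self.p_later, self.p_not_coming)
        return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) <= tol


# The example values shown on the slide.
msg = ArrivalPrediction(eta_minutes=20, p_on_time=0.75, p_later=0.15, p_not_coming=0.10)
```

A server receiving such a message could reject it when `is_valid()` is false before acting on the heating schedule.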

slide-13
SLIDE 13

Crowd sourcing for prediction.

Western Gateway Building

Predicting arrival time at a particular location is very difficult. Current approaches try to predict the sequence of points seen. GlasDasha is different:

  • Only interested in the work building => one end location.
  • Many users/workers are expected to approach the building => combine their information.
  • The building lies at the centre of a sea of access points (~1000).
  • The aim is to discover these points in relation to the building.

slide-14
SLIDE 14

Crowd sourcing for prediction.

Western Gateway Building

Solution: crowd source. Combine all journeys from all users to the building to form an estimate of the minimum time it takes to get from a point to the building.
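The crowd-sourcing step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the input format (a list of journeys, each a list of `(access_point_id, seconds_until_arrival)` pairs), the function name `build_field` and the `min_observations` threshold are all hypothetical.

```python
from collections import defaultdict


def build_field(journeys, min_observations=1):
    """Estimate, per access point, the minimum observed time to the building.

    `journeys` is a list of journeys; each journey is a list of
    (access_point_id, seconds_until_arrival) pairs. This input format is
    an assumption made for the sketch.
    """
    samples = defaultdict(list)
    for journey in journeys:
        for ap_id, seconds_to_building in journey:
            samples[ap_id].append(seconds_to_building)
    # Keep access points seen often enough; the minimum becomes the field entry.
    return {ap: min(ts) for ap, ts in samples.items() if len(ts) >= min_observations}


# Made-up journeys from three users approaching the same building.
journeys = [
    [("ap_a", 600), ("ap_b", 300), ("ap_c", 60)],
    [("ap_a", 900), ("ap_c", 90)],
    [("ap_b", 280), ("ap_c", 55)],
]
field = build_field(journeys)
```

A plain minimum is sensitive to a single erroneous reading, so a low percentile may be a safer estimator in practice; the slides only state that the estimate is of the minimum time.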

slide-15
SLIDE 15

Crowd sourcing for prediction.

Western Gateway Building

[Figure labels: access points around the building marked 1 min, 5 min and 10 min.]

slide-16
SLIDE 16

Crowd sourcing for prediction.

Western Gateway Building

[Figure labels: 5 mins, 10 mins, 15 mins.]

Build up a field with respect to the building surroundings. For a particular user, his movements relative to this field form a pattern. These patterns are used to determine his arrival time and its probability, and whether he is even on the way to work.

slide-17
SLIDE 17

[Block diagram, split into a Private/Smartphone side and a Public/Server logic side. Block labels:]

  • Localisation: WIFI/GPS/Base station
  • Collect field information and construct the field
  • Comparison to field
  • Compare to stored patterns, J
  • Transmit field information to phones (trx initiated by phone)
  • Create field update information
  • Create false field update information
  • Decision code for systems
  • Random waiting time
  • Transmit arrival prediction message to server
  • Transmit field update message to server anonymously
  • Statistical analysis block

slide-18
SLIDE 18

Real-world study; data set.

  • 15+2 Android phones: 10 belonging to the project, 7 to the users themselves.
  • Duration of study is diverse: 8 months, 4 months, several weeks. (We also have 8 additional users, each with 2 weeks of data that has not been processed.)
  • 2 “users” were in fact smartphones given out opportunistically to random occupants for one day to provide diversity of samples.
  • All users are based in one building at UCC.
  • The application switches on between 8am and 10pm.
  • Senses WIFI routers → hashes the BSSID and records time seen, duration and signal strength.
  • Hashing provides weak security in the sense that there is a barrier to us recovering their location.
  • A unique random user id is generated at setup.
  • The field consists of 867 WiFi points, each with at least 50 observations, and a total of 297,361 observations within 1 hour of the building.
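The hashing and record-keeping step might look like the sketch below. The slides do not say which hash function or id scheme was used, so SHA-256 and `uuid4` are assumptions, as are the function names and the record layout.

```python
import hashlib
import uuid


def hash_bssid(bssid: str) -> str:
    """Return a SHA-256 digest of a normalised BSSID (SHA-256 is an assumed choice)."""
    normalised = bssid.strip().lower().replace("-", ":")
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def new_user_id() -> str:
    """Random user id generated once at setup (uuid4 is an assumed choice)."""
    return uuid.uuid4().hex


def make_observation(bssid, time_seen, duration_s, signal_dbm):
    # Only the hash is stored, never the raw BSSID.
    return {
        "ap": hash_bssid(bssid),
        "time_seen": time_seen,
        "duration_s": duration_s,
        "signal_dbm": signal_dbm,
    }


# Made-up example reading.
obs = make_observation("00-1A-2B-3C-4D-5E", time_seen=1365000000, duration_s=12, signal_dbm=-67)
```

In line with the slide's "weak security" caveat, an unsalted hash of a BSSID only raises a barrier: anyone holding a list of known BSSIDs can hash them and compare.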

slide-19
SLIDE 19

The “Tom's journey” problem.

Normal routes to work are not normal! Tom's journey (one of the first journeys seen). Simulated journey (we deliberately didn't record positions).

  • Starts at home,
  • Drives to the building,
  • Passes the building heading north,
  • Stops at school, drops off kids,
  • Returns to work,
  • Drives past the building,
  • Crosses bridge and parks car (for free),
  • Walks back to work (2 minute walk).

slide-20
SLIDE 20

Constructing a pattern.

Normal routes to work are not normal! Tom's journey: journey time versus field time.

  • Starts at home,
  • Drives to the building,
  • Passes the building heading north,
  • Stops at school, drops off kids,
  • Returns to work,
  • Drives past the building,
  • Crosses bridge and parks car (for free),
  • Walks back to work (2 minute walk).

slide-21
SLIDE 21

Constructing a pattern.

Normal routes to work are not normal! Pattern construction.

$$J = \{\delta_n \mid \delta_n = \sum_{i=1}^{n} \left[ (t_i - t_{i-1}) - (f_{i-1,j} - f_{i,j}) \right] \}$$

J: the pattern. δ: deviation from the field. f_{i,j}: field entry at the ith waypoint in transport mode j. t_i: the journey time at the ith waypoint.

slide-22
SLIDE 22

Constructing a pattern.

Normal routes to work are not normal! Pattern construction.

A pattern is constructed as follows:

  • i. Calculate the cumulative difference between the transit time of two waypoints and the expected difference.
  • ii. Pattern falling → user is moving faster than expected.
  • iii. Pattern rising → user is moving slower than expected, or in the wrong direction!
  • iv. Pattern flat → user is proceeding towards the building at the speed expected.

$$J = \{\delta_n \mid \delta_n = \sum_{i=1}^{n} \left[ (t_i - t_{i-1}) - (f_{i-1,j} - f_{i,j}) \right] \}$$

J: the pattern. δ: deviation from the field. f_{i,j}: field entry at the ith waypoint in transport mode j. t_i: the journey time at the ith waypoint.
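The pattern construction can be sketched in a few lines of Python. The sketch assumes the field entry is the expected time remaining to the building and the journey time is the elapsed time since the journey started, so that an on-schedule journey gives a flat pattern. The function name `build_pattern` and the sample numbers are hypothetical.

```python
def build_pattern(field_times, journey_times):
    """Cumulative deviation of a journey from the field.

    field_times[i]   : field entry (expected seconds to the building) at waypoint i
    journey_times[i] : elapsed journey time (seconds) when waypoint i was seen
    Returns one pattern value per waypoint after the first.
    """
    if len(field_times) != len(journey_times):
        raise ValueError("field_times and journey_times must have equal length")
    pattern, total = [], 0.0
    for i in range(1, len(field_times)):
        actual = journey_times[i] - journey_times[i - 1]   # observed transit time
        expected = field_times[i - 1] - field_times[i]     # progress the field predicts
        total += actual - expected
        pattern.append(total)
    return pattern


field_times = [600, 480, 360, 240]
on_schedule = build_pattern(field_times, [0, 120, 240, 360])   # flat
slower = build_pattern(field_times, [0, 150, 300, 450])        # rising
faster = build_pattern(field_times, [0, 90, 180, 270])         # falling
away = build_pattern([600, 720, 840], [0, 120, 240])           # rising: wrong direction
```

Under this convention a missing waypoint simply removes one point from the list without changing the later cumulative values, which matches the robustness the slides claim for the representation.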

slide-23
SLIDE 23

Constructing a pattern.

Normal routes to work are not normal! Pattern construction.

  • Note the travel time is simply the original field time + the last pattern time:

$$t = f_{0,j} + \delta_N$$

  • The original field time is known => all (!) we need to do is identify which pattern we're currently following.

$$J = \{\delta_n \mid \delta_n = \sum_{i=1}^{n} \left[ (t_i - t_{i-1}) - (f_{i-1,j} - f_{i,j}) \right] \}$$
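The relation "travel time = original field time + last pattern time" can be illustrated numerically. The sketch below also shows how a probability over candidate patterns would give an expected travel time. The function names and all numbers are hypothetical.

```python
def predicted_travel_time(initial_field_time, final_pattern_value):
    """Total travel time = field time at the start + final value of the pattern."""
    return initial_field_time + final_pattern_value


def expected_travel_time(initial_field_time, pattern_probs, final_values):
    """Expectation over candidate patterns, weighted by their probabilities."""
    if abs(sum(pattern_probs) - 1.0) > 1e-9:
        raise ValueError("pattern probabilities must sum to 1")
    return sum(
        p * predicted_travel_time(initial_field_time, v)
        for p, v in zip(pattern_probs, final_values)
    )


# Made-up numbers: the field says 600 s from home; a detour pattern ends 900 s above flat.
direct = predicted_travel_time(600, 0)
detour = predicted_travel_time(600, 900)
mixed = expected_travel_time(600, [0.5, 0.5], [0, 900])
```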

slide-24
SLIDE 24

Pattern advantages.

The pattern representation has several advantages:

  • Missing a WIFI waypoint = one less point in the pattern => the pattern changes little. Compare this to sequence prediction, in which partial sequence data is a huge problem.
  • Diversity of route: a user that takes a new route (e.g. along a road parallel to the normal one) has pretty much the same pattern.
  • Patterns are easy to store.
  • Patterns can be smoothed (shown later).
  • 2 default patterns for each user:
      • Rising at 45°: heading away from work.
      • Flat pattern: a direct route to work at the average speed.
  • Defaults mean a new user has a prior prediction even though they have never made the journey before.

slide-25
SLIDE 25

Implementation issues.

The implementation involves several complex issues:

  • Constructing the field from recorded data: the field must be mostly sane.
  • Detecting mode of transport: difficult if the user is travelling away from work.
  • Smoothing patterns to remove noise: outliers in the field and natural motion noise.
  • Pattern storage and clustering.
  • Pattern classification (which pattern am I currently following?): a distribution over all patterns => a distribution over arrival times. This is what the BMS wants.

slide-26
SLIDE 26

Constructing the field.

Distribution of arrival times is actually well behaved.

Model as a Gaussian Mixture with 3* modes of transport:

$$\{\mu_{\text{walking}}, \sigma^2_{\text{walking}}\}, \quad \{\mu_{\text{cycling}}, \sigma^2_{\text{cycling}}\}, \quad \{\mu_{\text{driving}}, \sigma^2_{\text{driving}}\}$$

* We could add extra modes such as driving in congestion and taking public transport. At UCC/Cork the congestion is short-lived and most academics avoid it anyway. Public transport is awful and, in a small city, unnecessary; i.e. we did not observe these extra modes.
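A three-component mixture of this kind can be fitted with a few lines of expectation-maximisation. The following is a generic 1-D sketch, not the authors' fitting code: the function name `fit_gmm_1d`, the narrow initial variance, the variance floor `min_var` and the sample times (in minutes) are all assumptions. The initial means are passed in explicitly as one way of seeding the fit.

```python
import math


def fit_gmm_1d(samples, init_means, n_iter=100, min_var=0.25):
    """Fit a 1-D Gaussian mixture by EM, one component per initial mean.

    Returns a list of (weight, mean, variance) tuples.
    """
    k, n = len(init_means), len(samples)
    mean_all = sum(samples) / n
    var_all = sum((x - mean_all) ** 2 for x in samples) / n
    weights = [1.0 / k] * k
    means = list(init_means)
    # Start deliberately narrow so each component stays near its initial mean.
    variances = [max(var_all / k ** 2, min_var)] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate the weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(var_j, min_var)
    return list(zip(weights, means, variances))


# Made-up times to the building (minutes) seen at one waypoint.
samples = [4, 5, 5, 6, 11, 12, 12, 13, 28, 30, 30, 32]
params = fit_gmm_1d(samples, init_means=[5.0, 12.0, 30.0])
```

With well-separated data the three fitted means land on the three groups; with overlapping modes the result depends strongly on the initial means.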

slide-27
SLIDE 27

Constructing the field.

Distribution of arrival times is actually well behaved.

The GMM can get the estimate wrong especially as it is blind: we don't have an initial estimate of the mode locations.


slide-28
SLIDE 28

A direct journey and the pre-field.

Search the data for a direct journey to work (there are many journeys to choose from). A direct journey has a linear field-journey time scatter.

Each point (or group of 3 points) is an estimate from a single GMM fit.

The RLS is non-standard: I use a linear weight function on the upside and a sum-squared weighting on the downside, since outliers are expected to be mostly positive.
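One way to realise an asymmetric fit of this kind is iteratively reweighted least squares. The sketch below reads "RLS" as a robust least-squares line fit, which is an assumption. It uses a linear loss for points above the line and a squared loss for points below it. The function name and data are hypothetical, and this is not claimed to be the author's exact weighting.

```python
def fit_line_asymmetric(xs, ys, n_iter=50, eps=1e-6):
    """Fit y = a*x + b with an asymmetric loss via iteratively reweighted least squares.

    Loss per point, with residual r = y - (a*x + b):
      r > 0  (point above the line): r      (linear, tolerant of positive outliers)
      r <= 0 (point below the line): r**2   (squared)
    """
    weights = [1.0] * len(xs)
    a = b = 0.0
    for _ in range(n_iter):
        # Weighted least squares for a straight line (closed form).
        sw = sum(weights)
        swx = sum(w * x for w, x in zip(weights, xs))
        swy = sum(w * y for w, y in zip(weights, ys))
        swxx = sum(w * x * x for w, x in zip(weights, xs))
        swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            raise ValueError("x values are degenerate; cannot fit a line")
        a = (sw * swxy - swx * swy) / denom
        b = (swy - a * swx) / sw
        # Reweight: w * r**2 has the same gradient as the linear loss when w = 1/(2r).
        weights = []
        for x, y in zip(xs, ys):
            r = y - (a * x + b)
            weights.append(0.5 / max(r, eps) if r > 0 else 1.0)
    return a, b


# Made-up data on the line y = 2x + 1 with one large positive outlier.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 20
slope, intercept = fit_line_asymmetric(xs, ys)
```

An ordinary least-squares fit to the same data gives a slope near 2.12 and an intercept near 2.45, so the asymmetric loss keeps the line much closer to the uncontaminated points.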

slide-30
SLIDE 30

A direct journey and the pre-field.

  • Identify outliers and deviations from the linear fit.
  • Re-estimate the GMM for those waypoints, this time using the linear fit estimates as the initial mode estimates: the GMM is no longer blind.
  • This straightens much of the field.

The RLS is non-standard: a linear weight function on the upside and a sum-squared weighting on the downside, since outliers are expected to be mostly positive.
slide-31
SLIDE 31

Field estimation.

The method outlined above is ~ sufficient but not optimal or robust. Estimating the field requires:

  • Field points to find linear journeys,
  • Linear journeys to find field points,
  • Field data that forms a planar graph (within a mode).

=> an Expectation Maximisation approach. Future work and research required here.

slide-32
SLIDE 32

Mode of transport estimation.

This is a near-textbook problem for a Hidden Markov Model.

  • The mode of transport can be detected from the time between waypoints. A user that takes 2 minutes to pass between 2 waypoints, when the field difference for walking is 2 minutes, is probably walking.
  • A user does not change mode of transport rapidly (mostly 0, 1 or 2 changes in a journey), i.e. there is memory in the process.
  • Outliers in the field introduce noise into mode estimation.

=> a HMM is the ideal mode estimation algorithm (textbook).

slide-33
SLIDE 33

Mode of transport estimation.

This is a near-textbook problem for a Hidden Markov Model. States are the modes. Emissions are Gaussian with field means and variances.
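With modes as hidden states and Gaussian emissions, the most likely mode sequence can be decoded with the Viterbi algorithm. The sketch below assumes a "sticky" transition matrix with a single stay probability `p_stay`, a uniform initial distribution and one standard deviation per mode. These choices, the function name and the numbers are illustrative assumptions, not parameters from the study.

```python
import math


def viterbi_modes(transit_times, expected, sigmas, modes, p_stay=0.9):
    """Most likely mode sequence for a journey (needs at least two modes).

    transit_times[i] : observed seconds between waypoint i and waypoint i+1
    expected[i][m]   : field difference (seconds) for mode m on that hop
    sigmas[m]        : emission standard deviation for mode m
    p_stay           : probability of keeping the same mode between hops
    """
    k = len(modes)
    p_switch = (1.0 - p_stay) / (k - 1)

    def log_emit(i, m):
        z = (transit_times[i] - expected[i][m]) / sigmas[m]
        return -0.5 * z * z - math.log(sigmas[m] * math.sqrt(2 * math.pi))

    # Uniform prior over the initial mode.
    score = [math.log(1.0 / k) + log_emit(0, m) for m in range(k)]
    back = []
    for i in range(1, len(transit_times)):
        new_score, pointers = [], []
        for m in range(k):
            cands = [score[p] + math.log(p_stay if p == m else p_switch) for p in range(k)]
            best = max(range(k), key=lambda p: cands[p])
            pointers.append(best)
            new_score.append(cands[best] + log_emit(i, m))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda m: score[m])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [modes[m] for m in reversed(path)]


modes = ["walking", "cycling", "driving"]
expected = [[120, 40, 15]] * 5            # made-up field differences per hop
observed = [118, 125, 16, 14, 15]         # made-up transit times
decoded = viterbi_modes(observed, expected, sigmas=[20, 10, 5], modes=modes)
```

The high stay probability encodes the observation that users rarely change mode mid-journey, so a single noisy hop is less likely to flip the decoded mode.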

slide-34
SLIDE 34

Mode of transport estimation.

This is a near-textbook problem for a Hidden Markov Model. Outstanding problem: what do we do when the user is travelling parallel to the field? Field differences will be zero while transit time is positive. Future research.

slide-35
SLIDE 35

Smoothing patterns to remove noise.

5 days of patterns (working days)

slide-36
SLIDE 36

Smoothing patterns to remove noise.

Straightforward application of a loess smoother. Note that users can flag up insane field points to the central server for re-examination.

[Figure labels: measurement noise; insane field points.]
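A loess smoother is a local linear regression with distance-based weights. A minimal pure-Python sketch, assuming tricube weights and a nearest-neighbour bandwidth set by `frac`; the parameter choices and function name are assumptions, and this single-pass version omits the robustness iterations that some loess variants add.

```python
def loess(xs, ys, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x in xs."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))       # neighbours per local fit
    smoothed = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12                   # bandwidth: distance to k-th neighbour
        ws = [max(0.0, 1.0 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0   # fall back to a weighted mean
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Made-up pattern: flat apart from one noisy spike.
times = list(range(10))
pattern = [0.0] * 10
pattern[5] = 10.0
smooth = loess(times, pattern)
```

Applied to a pattern, `xs` would be the journey times and `ys` the pattern values; isolated spikes are damped while straight segments pass through unchanged.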

slide-37
SLIDE 37

User patterns.

User number 4 has 240 patterns (from 240 journeys to work).

slide-38
SLIDE 38

Identifying clusters of user patterns.

Patterns are expanded and interpolated to all have a length of 1.

Clustering is based on the shape of the expanded patterns, with the length of the pattern added to the feature vector. 4 pattern clusters identified:

  • 1. Not going to work.
  • 2. Direct & walking.
  • 3. Direct & cycling.
  • 4. Direct & driving.

Validated with user!
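The feature vector described here can be sketched as follows: each pattern is rescaled to unit length, linearly interpolated onto a fixed grid, and its original length is appended as an extra feature. The function name `pattern_features` and the grid size `n_points` are assumptions. The slides do not name the clustering algorithm, so only the feature construction is shown.

```python
def pattern_features(times, values, n_points=20):
    """Resample a pattern onto a unit-length axis and append its original length."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    t0, length = times[0], times[-1] - times[0]
    if length <= 0:
        raise ValueError("times must be increasing")
    unit = [(t - t0) / length for t in times]      # rescale the time axis to [0, 1]
    features, j = [], 0
    for k in range(n_points):
        u = k / (n_points - 1)
        # Advance to the segment [unit[j], unit[j+1]] that contains u.
        while j < len(unit) - 2 and unit[j + 1] < u:
            j += 1
        span = unit[j + 1] - unit[j]
        frac = (u - unit[j]) / span if span > 0 else 0.0
        features.append(values[j] + frac * (values[j + 1] - values[j]))
    features.append(length)                        # pattern length as an extra feature
    return features


# Made-up pattern that rises and falls over a 20 second journey.
vec = pattern_features([0, 10, 20], [0, 5, 0], n_points=5)
```

Because every vector has the same dimension, patterns from journeys of different durations can be compared directly by a standard clustering routine.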

slide-39
SLIDE 39

Identifying clusters of user patterns.

Patterns are expanded and interpolated to all have a length of 1.

Clustering is based on the shape of the expanded patterns, with the length of the pattern added to the feature vector.

  • Distributions show the excess time distribution of the patterns.
  • A distribution over these distributions is the message sent to the server (not the patterns; these are private).

slide-42
SLIDE 42

Summary

  • Privacy is prime to the system.
  • Does not require the (free?) information from a third party.
  • Fully automated.
  • Patent lodged.
  • Operations on the smartphone side are simple and not power consuming.
  • Operations on the server side are relatively simple and robust.


slide-43
SLIDE 43

Current tasks.

  • Validate forecast distributions.
  • Estimate the field sanely.
  • Commercialise the application.
  • Interface with the HVAC system.
  • Alternative applications.
  • Automated time-to-heating estimation preferable → GP estimation of system dynamics.
  • Working implementation demo of system required.