
Concept Drift: Learning on Data Streams Pádraig Cunningham Director - PowerPoint PPT Presentation



  1. Concept Drift: Learning on Data Streams Pádraig Cunningham Director Insight @ UCD PI @ CeADAR

  2. Online Learning & Concept Drift
     Predictive Analytics: The Old Days
     ❏ Static data, the concept doesn’t change
     ❏ Velocity & Variety rather than Volume
     Concept Drift / Online Learning / Learning on Data Streams
     ❏ Model Update
     Tools
     ❏ MOA: Massive Online Analysis (moa.cms.waikato.ac.nz), from the makers of Weka
     ❏ Apache Spark
     ❏ RapidMiner (rapidminer.com)

  3. A typical predictive analytics task
     Heart attack patient admitted
     ❏ 19 variables measured during first 24 hours
     ❏ Blood pressure, age, + 17 other ordered and binary variables
     ❏ Considered important indicators of patient’s condition
     Goal: Learn from historic data
     ❏ Build a model to identify high risk patients
     ❏ i.e. will not survive 30 days (based on evidence of initial 24-hour data)
     Assumed to be a static ‘concept’: This model is good for all time.
     Consider just 3 features:
     No   Age  BMI  BP   Res.
     1    60   20   140  Ok
     2    60   21   145  Ok
     3    85   23   130  Ok
     4    81   22   160  No
     5    70   24   170  No
     6    72   26   135  No
     7    81   26   145  No
     8    66   23   155  No
     Q    66   24   148  ?
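The slide does not say which model is built from these cases. As a minimal sketch, assuming a k-nearest-neighbour vote on unscaled Euclidean distance (a choice made here for illustration, not taken from the talk), the query patient Q could be classified from the eight historic cases like this. The names `HISTORY`, `QUERY` and `knn_predict` are hypothetical.

```python
from collections import Counter
from math import dist

# Historic cases copied from the slide: (age, BMI, BP) -> result.
HISTORY = [
    ((60, 20, 140), "Ok"),
    ((60, 21, 145), "Ok"),
    ((85, 23, 130), "Ok"),
    ((81, 22, 160), "No"),
    ((70, 24, 170), "No"),
    ((72, 26, 135), "No"),
    ((81, 26, 145), "No"),
    ((66, 23, 155), "No"),
]
QUERY = (66, 24, 148)  # patient Q from the slide


def knn_predict(query, history, k):
    """Majority vote among the k historic cases closest to the query.

    Assumption: plain Euclidean distance on raw feature values.
    """
    nearest = sorted(history, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(QUERY, HISTORY, k=1))
print(knn_predict(QUERY, HISTORY, k=3))
```

With these eight rows, k=1 returns No (case 8 is closest) while k=3 returns Ok (cases 8, 2 and 1 vote), which shows how sensitive such a small model is to its settings.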

  4. Predictive Analytics AKA: Supervised ML

  5. Volume: Big Data’s Little Secret ‘Big’ doesn’t really matter… Typically not a lot of data needed to build a good model ❏ Prostate Cancer Gleason Grade 3 Gleason Grade 4 Classification task on cancer microscopy images

  6. Velocity: Game Analytics
     Game Lifecycle (user numbers)
     Task: User segmentation: Predict return from new users
     Inputs: player profile
     Outputs: premium user, yes/no?
     Technology adoption (user type)
     Consider: Model trained on Early Adopters, used on Late Majority

  7. Google Trends Flappy Bird Other game lifecycles Angry Birds Farmville

  8. Energy Demand Prediction Demand profile has different ‘regimes’

  9. Concept Drift
     [Figure: Training Time vs. Later On]
     Over time, things that the model expects to be positive come up negative.
     ❏ Bad loans
     ❏ Antibiotic resistance
     ❏ Conversion / Churn prediction

  10. Concept Drift: Spam Detection
      Without retraining, error creeps up over time...
      [Figure: Error % over Time for a Static Model and an Updating Model]
      Delany, Cunningham, Tsymbal, FLAIRS 2006

  11. Model Update Strategies
      First: when does new data (outcomes) become available?
      ❏ Immediately: energy consumption, financial prediction
      ❏ Time lag: credit scoring, online games - financial return
      ❏ Sometimes never:
      [Figure: Inputs and Outputs at t-1 and t; Training window size]
      Key parameters
      ❏ window size
      ❏ update frequency
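When outcomes arrive with a time lag, inputs have to be held until their outcome is known before they can be used for retraining. The following is a minimal sketch of that bookkeeping. The class `DelayedLabelBuffer`, its methods, and the loan example values are all hypothetical and not part of the talk.

```python
class DelayedLabelBuffer:
    """Holds inputs until their outcome arrives, then releases a training pair."""

    def __init__(self):
        self.pending = {}   # case id -> input features, outcome still unknown
        self.training = []  # (features, outcome) pairs ready for retraining

    def record_prediction(self, case_id, features):
        """Remember the inputs of a case at prediction time."""
        self.pending[case_id] = features

    def record_outcome(self, case_id, outcome):
        """Join a late-arriving outcome with its stored inputs."""
        features = self.pending.pop(case_id, None)
        if features is not None:
            self.training.append((features, outcome))


# Hypothetical credit-scoring usage: outcomes arrive after a time lag.
buffer = DelayedLabelBuffer()
buffer.record_prediction("loan-1", {"amount": 5000})
buffer.record_prediction("loan-2", {"amount": 12000})
buffer.record_outcome("loan-1", "repaid")

print(len(buffer.training), "labelled,", len(buffer.pending), "still waiting")
```

Cases whose outcome never arrives simply stay in `pending`, so a real workflow would probably also need a rule for expiring them.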

  12. Model Update
      ❏ Retrain model on recent ‘window’ of data
      ❏ Window size
        Big enough: hundreds of examples
        Oldest data in window must be relevant
      ❏ Update frequency
        model should be up to date
        don’t need to retrain on every click
      [Figure: Window too big vs. Window about right: enough data?]
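The two parameters above can be made concrete with a small simulation. This is a sketch under stated assumptions: the stream is synthetic, the "model" is a single learned threshold, and the names `make_stream`, `fit_threshold` and `windowed_errors` are hypothetical. Each example is predicted first and then added to the window (test-then-train), and the model is refitted every `update_every` examples.

```python
from collections import deque


def make_stream(n=600, drift_at=300):
    """Synthetic stream: the true decision threshold moves from 3 to 7."""
    for i in range(n):
        x = i % 10
        true_threshold = 3 if i < drift_at else 7
        yield x, x >= true_threshold


def fit_threshold(window):
    """Toy model: the smallest x seen with a positive label in the window."""
    positives = [x for x, label in window if label]
    return min(positives) if positives else float("inf")


def windowed_errors(stream, window_size, update_every):
    """Count prediction errors when retraining on a sliding window."""
    window = deque(maxlen=window_size)  # oldest examples fall out automatically
    threshold = float("inf")            # predicts negative until the first fit
    errors = 0
    for i, (x, label) in enumerate(stream, start=1):
        if (x >= threshold) != label:   # test on the new example first
            errors += 1
        window.append((x, label))       # then add it to the training window
        if i % update_every == 0:       # retrain only every few examples
            threshold = fit_threshold(window)
    return errors


about_right = windowed_errors(make_stream(), window_size=50, update_every=10)
too_big = windowed_errors(make_stream(), window_size=250, update_every=10)
print(about_right, too_big)
```

On this synthetic stream the 50-example window makes fewer errors than the 250-example one, because the larger window keeps stale pre-drift examples for longer.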

  13. Software From the makers of Weka...

  14. Software From Apache...

  15. Summary
      The Old Days:
      ❏ Train static models on historic data
      Predictive analytics on data streams
      ❏ Underlying ‘concept’ changes over time
      ❏ How will we know?
      ❏ Model can be updated using new data
      Adaptive models
      ❏ Data workflows become a big issue
      ❏ New parameters to be considered
      ❏ Window size
      ❏ Update frequency
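One simple way to approach "How will we know?" is to watch the recent error rate and raise a flag when it climbs well above the rate seen at training time. The sketch below is a plain heuristic written for illustration, not a method from the talk or a named drift-detection algorithm. `ErrorRateMonitor`, its parameters, and the synthetic outcome sequence are all assumptions.

```python
from collections import deque


class ErrorRateMonitor:
    """Flags suspected drift when the recent error rate exceeds a baseline."""

    def __init__(self, baseline_error, window_size=100, tolerance=0.10):
        self.baseline_error = baseline_error  # error rate measured at training time
        self.tolerance = tolerance            # how much degradation is accepted
        self.recent = deque(maxlen=window_size)

    def add(self, was_error):
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        recent_error = sum(self.recent) / len(self.recent)
        return recent_error > self.baseline_error + self.tolerance


# Synthetic outcomes: 5% errors for 200 predictions, then 25% errors.
outcomes = [i % 20 == 0 for i in range(200)] + [i % 4 == 0 for i in range(200)]

monitor = ErrorRateMonitor(baseline_error=0.05)
alarm_at = next((i for i, err in enumerate(outcomes) if monitor.add(err)), None)
print("drift suspected at prediction", alarm_at)
```

This only works when outcomes arrive reasonably soon after predictions, since the monitor needs true labels to know whether a prediction was wrong.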
