
Data science and engineering for local weather forecasts



  1. 1

  2. Data science and engineering for local weather forecasts. Nikhil R Podduturi, Data {Scientist, Engineer}, November 2016

  3. Agenda ● About MeteoGroup ● Introduction to weather data ● Problem description ● Data science and weather forecasting ● Engineering ● Verification ● Results ● Questions

  4. How many of you check weather forecasts frequently?

  5. 5

  6. Weather data

  7. 1.5 TB of weather data per day

  8. Types of data ● Observations: WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft) and the MeteoGroup measurement network

  9. Types of data ● Observations: WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft) and the MeteoGroup measurement network ● Satellite data

  10. Types of data ● Observations: WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft) and the MeteoGroup measurement network ● Satellite data ● Radar data

  11. Types of data ● Observations: WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft) and the MeteoGroup measurement network ● Satellite data ● Radar data ● User data

  12. Types of data ● Observations: WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft) and the MeteoGroup measurement network ● Satellite data ● Radar data ● User data ● Numerical weather prediction (NWP) model data

  13. Numerical weather prediction models ● Complex and multidimensional data

  14. Numerical weather prediction models ● Complex and multidimensional data ● 5 NWP models from different providers

  15. Numerical weather prediction models ● Complex and multidimensional data ● 5 NWP models from different providers ● Data size per day: 0.5 TB

  16. Data science and weather forecasting

  17. 17

  18. Outcome ● Took 24 hours to compute a 24-hour forecast ● Grid interval: 736 km ● Poor results

  19. MeteoGroup forecasting system

  20. MeteoGroup forecasting system: a machine learning model is trained on 3 years of NWP data and 3 years of observation data; the trained model is then applied to daily NWP data to produce forecasts
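The train-then-apply flow on this slide can be sketched as follows. This is a minimal illustration, not MeteoGroup's system: it assumes synthetic data, four hypothetical NWP predictors, and a plain least-squares model standing in for whatever learner was actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: 3 years of hourly samples with 4 NWP predictors (synthetic values).
n_hist, n_features = 3 * 365 * 24, 4
X_hist = rng.normal(size=(n_hist, n_features))          # historical NWP predictors
true_coef = np.array([0.9, -0.3, 0.2, 0.0])
y_obs = X_hist @ true_coef + 1.5 + rng.normal(scale=0.5, size=n_hist)  # matching observations

def train(X, y):
    """Fit ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def forecast(coef, X_today):
    """Apply the trained model to one day of NWP predictors."""
    return coef[0] + X_today @ coef[1:]

model = train(X_hist, y_obs)                  # offline: trained once on history
X_daily = rng.normal(size=(24, n_features))   # daily: 24 hourly NWP rows
daily_forecast = forecast(model, X_daily)
```

The expensive step (training on years of history) is separated from the cheap daily step (one matrix product), which is what makes running the trained model every day practical.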

  21. MeteoGroup forecasting system ● Written in Pascal

  22. MeteoGroup forecasting system ● Written in Pascal ● Runs on an in-house high-performance computing cluster

  23. MeteoGroup forecasting system ● Written in Pascal ● Runs on an in-house high-performance computing cluster. Limitations: ● Hard to maintain ● Not very transparent ● Limited scalability

  24. Problem description

  25. Next-generation forecasting system ● Cloud-based solution

  26. Next-generation forecasting system ● Cloud-based solution ● Transparent

  27. Next-generation forecasting system ● Cloud-based solution ● Transparent ● Scalable

  28. Next-generation forecasting system ● Cloud-based solution ● Transparent ● Scalable ● Improve forecasting accuracy

  29. Baseline model: NWP data → Downscale to location → Interpolate missing values → Linear model
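The deck does not say how downscaling or gap filling is done. The sketch below assumes bilinear interpolation of a gridded NWP field to a station's coordinates, linear interpolation in time for missing values, and a one-predictor least-squares fit as the linear model; `bilinear` and `fill_gaps` are hypothetical helper names, and the grid and series are toy values.

```python
import numpy as np

def bilinear(grid, lats, lons, lat, lon):
    """Bilinearly interpolate a 2-D field grid[lat_index, lon_index] to one point.

    Assumes lats and lons are strictly increasing 1-D axes that contain the point.
    """
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])  # fractional position between rows
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])  # fractional position between columns
    return ((1 - ty) * (1 - tx) * grid[i, j] + (1 - ty) * tx * grid[i, j + 1]
            + ty * (1 - tx) * grid[i + 1, j] + ty * tx * grid[i + 1, j + 1])

def fill_gaps(series):
    """Replace NaNs by linear interpolation in time; edge gaps take the nearest valid value."""
    s = np.asarray(series, dtype=float).copy()
    missing = np.isnan(s)
    idx = np.arange(len(s))
    s[missing] = np.interp(idx[missing], idx[~missing], s[~missing])
    return s

# Downscale: a toy 3x3 temperature grid (degC) to a station at 52.25 N, 5.5 E.
lats = np.array([52.0, 52.5, 53.0])
lons = np.array([5.0, 6.0, 7.0])
grid = np.array([[10.0, 11.0, 12.0],
                 [11.0, 12.0, 13.0],
                 [12.0, 13.0, 14.0]])
t_station = bilinear(grid, lats, lons, 52.25, 5.5)  # 11.0 for this linear toy field

# Interpolate missing values in a downscaled NWP series, then fit the linear model.
nwp = fill_gaps([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])  # [1, 2, 3, 4, 5, 6]
obs = 0.8 * nwp + 1.0                                      # synthetic observations
slope, intercept = np.polyfit(nwp, obs, 1)
```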

  30. Baseline model: NWP data → Downscale to location → Interpolate missing values → Linear model. Outcome: ● Very fast ● Poor accuracy ● Multicollinearity
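Multicollinearity arises when NWP predictors carry nearly the same signal, for example the same variable from two providers. One common diagnostic, not necessarily the one used here, is the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the others. A minimal NumPy sketch on synthetic data (the column names are hypothetical):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X with shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus every other column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = resid @ resid
        ss_tot = (y - y.mean()) @ (y - y.mean())
        out[j] = ss_tot / ss_res  # equals 1 / (1 - R^2)
    return out

rng = np.random.default_rng(1)
t_model_a = rng.normal(size=500)
t_model_b = t_model_a + rng.normal(scale=0.05, size=500)  # near-duplicate of t_model_a
wind = rng.normal(size=500)                               # independent predictor
scores = vif(np.column_stack([t_model_a, t_model_b, wind]))
```

Values above roughly 5 to 10 are usually read as a warning sign; here the two near-duplicate temperature columns score in the hundreds while the independent wind column stays close to 1.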

  31. Iteration 1 ● Address multicollinearity using feature selection ● Scale the features. Pipeline: NWP data → Downscale to location → Interpolate missing values → Scale features → Feature selection → Linear model

  32. Iteration 1 ● Address multicollinearity using feature selection ● Scale the features. Pipeline: NWP data → Downscale to location → Interpolate missing values → Scale features → Feature selection → Linear model. Outcome: ● Improved accuracy
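The scale, select, and fit stages of Iteration 1 map naturally onto a scikit-learn `Pipeline`. The deck does not name the feature-selection method, so `SelectKBest` with `f_regression` is a stand-in chosen for brevity; note that it scores features one at a time, so on its own it will not separate two near-duplicate predictors. Data and the choice `k=3` are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))  # 10 candidate NWP predictors (synthetic)
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.3, size=1000)

pipe = Pipeline([
    ("scale", StandardScaler()),                            # zero mean, unit variance per feature
    ("select", SelectKBest(score_func=f_regression, k=3)),  # keep the 3 highest-scoring features
    ("model", LinearRegression()),
])
pipe.fit(X, y)

kept = np.flatnonzero(pipe.named_steps["select"].get_support())  # indices of kept features
r2 = pipe.score(X, y)                                             # coefficient of determination
```

Wrapping the steps in one pipeline ensures the scaler and selector are fitted only on the data passed to `fit`, which matters once cross-validation enters the picture.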

  33. Iteration 2 ● Model selection between linear and non-linear models ● Advanced feature selection. Pipeline: NWP data → Downscale to location → Interpolate missing values → Scale features → Advanced feature selection → Model selection (linear and non-linear models)

  34. Iteration 2 ● Model selection between linear and non-linear models ● Advanced feature selection. Pipeline: NWP data → Downscale to location → Interpolate missing values → Scale features → Advanced feature selection → Model selection (linear and non-linear models). Outcome: ● On par with the existing forecasting system ● Slow training
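Model selection between linear and non-linear learners can be expressed as a grid search in which the estimator itself is a hyperparameter. This is a sketch under assumptions: `Ridge` and `RandomForestRegressor` are illustrative choices (the deck does not list its candidates), the target is synthetic and deliberately non-linear, and `TimeSeriesSplit` is used so each validation fold lies after its training data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(600, 3))
# Non-linear synthetic target: a linear model cannot capture sin() or the square term.
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=600)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
param_grid = [
    {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
    {"model": [RandomForestRegressor(n_estimators=50, random_state=0)],
     "model__max_depth": [4, 8]},
]
search = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
best_model = search.best_params_["model"]
```

The cost grows multiplicatively: 5 candidates × 3 folds is 15 fits, plus one refit on all data. Repeated per location and attribute, this is one plausible reading of why training became slow.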

  35. Engineering to scale the product

  36. Baseline model engineering (scikit-learn, NumPy, Keras with TensorFlow)

  37. Model engineering (scikit-learn, NumPy, Keras with TensorFlow). Good: ● Python ML ecosystem ● Familiarity among the team ● Test-driven and agile development ● Fail fast

  38. Model engineering (scikit-learn, NumPy, Keras with TensorFlow). Good: ● Python ML ecosystem ● Familiarity among the team ● Test-driven and agile development ● Fail fast. Bad: ● Not scalable

  39. 47,000 × 15 × 360 model runs (locations × weather attributes, e.g. temperature and wind, × hours)
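The multiplication on this slide works out as below. Treating each (location, attribute, hour) triple as an independent job, an assumption the product suggests, is what makes the workload easy to parallelize: the job space can be enumerated lazily and cut into batches for workers.

```python
from itertools import islice, product

N_LOCATIONS = 47_000
N_ATTRIBUTES = 15   # e.g. temperature, wind
N_HOURS = 360

total_runs = N_LOCATIONS * N_ATTRIBUTES * N_HOURS  # 253,800,000

# Lazily enumerate (location, attribute, hour) jobs; nothing is materialized up front.
jobs = product(range(N_LOCATIONS), range(N_ATTRIBUTES), range(N_HOURS))
first_batch = list(islice(jobs, 1000))  # one hypothetical batch for a single worker
```

At a quarter of a billion runs, even one second per run would be roughly eight years of sequential compute, which is the motivation for distributing the work.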

  40. Scaling with Apache Airflow. Apache Airflow: • Created at Airbnb • Apache project since early 2016 • Directed Acyclic Graph (DAG). Components: • UI • Scheduler • Executor(s)

  41. Apache Airflow DAG ● Hooks (connections) ● Operators (tasks) ● Schedule ● Dependencies
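In Airflow a DAG is declared in Python, operators become tasks, and dependencies are set with the `>>` operator. The sketch below deliberately avoids Airflow's API and models the same idea with the standard-library `graphlib` (Python 3.9+): tasks with their upstream dependencies, a valid execution order, and the "waves" of tasks that could run in parallel. The task names are hypothetical, mirroring the pipeline stages on the earlier slides.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of upstream tasks it depends on.
dag = {
    "fetch_nwp": set(),
    "fetch_observations": set(),
    "downscale": {"fetch_nwp"},
    "interpolate_missing": {"downscale"},
    "train_model": {"interpolate_missing", "fetch_observations"},
    "verify_model": {"train_model"},
}

# One valid sequential order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())

# Group tasks into waves: everything in a wave can run concurrently,
# which is roughly what a scheduler hands to its executor(s).
ts = TopologicalSorter(dag)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())
    waves.append(ready)
    ts.done(*ready)
```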

  42. Airflow and Mesos (diagram): Airflow scheduler, Mesos cluster and AWS S3, linked by "deploy" and "persist" steps

  43. Airflow and Mesos (diagram): continuous integration (CI) added to the "deploy" step; Airflow scheduler, Mesos cluster and AWS S3 ("persist")

  44. Verification

  45. Model improvement cycle: Deploy DAG → Verify model → Improve DAG → Deploy DAG

  46. Forecast verification (diagram): forecast engine with models, AWS S3, JSON-LD
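The slide only says that verification output involves JSON-LD. As a hedged illustration of what such a record could look like, the sketch below computes a mean absolute error from toy values and wraps it in a JSON-LD document; the vocabulary URL, the `@type`, and every property name are hypothetical placeholders, not MeteoGroup's schema.

```python
import json

def verification_record(model, parameter, forecasts, observations):
    """Build a JSON-LD document holding the mean absolute error for one model and parameter."""
    errors = [abs(f - o) for f, o in zip(forecasts, observations)]
    return {
        # Hypothetical vocabulary; a real service would reference its own published context.
        "@context": {"@vocab": "https://example.org/verification#"},
        "@type": "VerificationResult",
        "model": model,
        "parameter": parameter,
        "sampleSize": len(errors),
        "meanAbsoluteError": sum(errors) / len(errors),
    }

doc = verification_record("model-a", "temperature", [10.0, 12.0, 9.0], [11.0, 12.0, 7.0])
payload = json.dumps(doc, indent=2)  # what a verification service might store or serve
```

Because JSON-LD is plain JSON plus an `@context`, consumers that ignore linked-data semantics can still read the record as ordinary JSON.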

  47. Verification metrics ● Mean absolute error ● Root mean squared error ● Mean error ● Heidke skill score ● Equitable threat score ● Probability density functions ● Error percentiles
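The scalar metrics on this slide follow standard definitions. The sketch below implements them in NumPy: MAE, RMSE and mean error (bias) for continuous variables, and the Heidke skill score and equitable threat score from a 2×2 contingency table for a yes/no event such as "temperature at or above a threshold". The threshold-based event definition and the toy inputs are assumptions.

```python
import numpy as np

def continuous_scores(forecast, observed):
    """MAE, RMSE and mean error (bias) for paired forecasts and observations."""
    err = np.asarray(forecast, float) - np.asarray(observed, float)
    return {
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "me": np.mean(err),  # > 0 means the forecast is too high on average
    }

def contingency(forecast, observed, threshold):
    """(hits, false alarms, misses, correct negatives) for the event value >= threshold."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    return (int(np.sum(f & o)), int(np.sum(f & ~o)),
            int(np.sum(~f & o)), int(np.sum(~f & ~o)))

def categorical_scores(a, b, c, d):
    """Heidke skill score and equitable threat score.

    a = hits, b = false alarms, c = misses, d = correct negatives.
    """
    n = a + b + c + d
    hss = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    a_random = (a + b) * (a + c) / n  # hits expected by chance
    ets = (a - a_random) / (a + b + c - a_random)
    return {"hss": hss, "ets": ets}

cont = continuous_scores([1, 2, 3, 4], [1, 1, 4, 2])
table = contingency([0, 2, 3, 0], [0, 0, 3, 1], threshold=1)
cat = categorical_scores(a=2, b=1, c=1, d=4)
```

Both HSS and ETS equal 1 for a perfect forecast and 0 for one no better than chance, which makes them comparable across models with different event frequencies.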

  48. Mean absolute error for different models (temperature)

  49. Probability density function for multiple models (temperature)

  50. Percentile graphs for each model (Temperature)
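The figures on slides 48 to 50 are not recoverable here, but comparisons of that kind can be computed as below. This is a sketch with synthetic temperatures and two hypothetical models: an empirical probability density of the forecast errors (a normalized histogram) and percentiles of the absolute error for each model.

```python
import numpy as np

rng = np.random.default_rng(4)
obs = rng.normal(15.0, 5.0, size=2000)  # synthetic observed temperatures (degC)
forecasts = {
    "model_a": obs + rng.normal(0.0, 1.0, size=obs.size),  # unbiased, small spread
    "model_b": obs + rng.normal(0.5, 2.0, size=obs.size),  # biased, larger spread
}

bins = np.linspace(-8.0, 8.0, 33)  # error bins 0.5 degC wide
summary = {}
for name, fc in forecasts.items():
    err = fc - obs
    density, _ = np.histogram(err, bins=bins, density=True)  # empirical PDF of errors
    summary[name] = {
        "mae": np.mean(np.abs(err)),
        "pdf": density,
        "p50_p90_p99": np.percentile(np.abs(err), [50, 90, 99]),  # absolute-error percentiles
    }
```

The percentiles show what a mean hides: two models with similar MAE can differ sharply in how often they make large errors.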

  51. For a demo, please stop by the MeteoGroup (MG) booth

  52. Results ● Cloud-based solution: AWS S3, EC2, ElastiCache ● Transparent ● Scalable ● Improve forecasting accuracy

  53. Results ● Cloud-based solution: AWS S3, EC2, ElastiCache ● Transparent: verification microservice ● Scalable ● Improve forecasting accuracy

  54. Results ● Cloud-based solution: AWS S3, EC2, ElastiCache ● Transparent: verification microservice ● Scalable: Mesos cluster; training time reduced from about a month to about 5 hours ● Improve forecasting accuracy

  55. Results ● Cloud-based solution: AWS S3, EC2, ElastiCache ● Transparent: verification microservice ● Scalable: Mesos cluster; training time reduced from about a month to about 5 hours ● Improve forecasting accuracy: on par with or better than the existing system

  56. Improvements ● Hyperlocal forecasts ● AWS Lambda integration ● Iterate for more accuracy

  57. Questions?

  58. We are hiring!

  59. 59
