Storage and Data Challenges for Production Machine Learning
Nisha Talagala CEO, Pyxeda AI
Growth of AI/ML
Data: Sources and Storage
Compute: Cloud, Hardware Innovation
Algorithms and Open Source
Each logo is a (separate) service offered by GCP, AWS or Azure for part of an AI workflow
https://www.oreilly.com/library/view/the-new-artificial/9781492048978/
https://emerj.com/ai-sector-overviews/valuing-the-artificial-intelligence-market-graphs-and-predictions/
Despite the advanced services available, AI usage is still minimal
Recognition, Anomaly Detection, etc.
Unsupervised, Reinforcement, Transfer, etc.
Prediction) AI Machine Learning Deep Learning
Lifecycle diagram. Stages: Business Need, Data, Develop Model(s), Train Model(s), Test Model(s), Deploy Model(s), Connect to Business App, Monitor and Optimize. Roles: Data Scientists, ML Engineers, App Developers, Operations.
Pipeline diagram (Training and Inference paths): Data, Data Cleaning, Feature Eng, Model Training, Model Validation, Model Prediction; Live Data, Feature Eng, Model Prediction, Business Application
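The training and inference paths in the diagram above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the talk: the function names (`clean`, `fit_features`, `featurize`, `train`, `validate`, `predict`), the record layout, and the toy linear model are all assumptions. The point it tries to show is that the feature engineering step and its learned parameters are shared by both paths, so they have to be stored and versioned alongside the model.

```python
from statistics import mean

def clean(rows):
    """Data cleaning: drop records with a missing feature or label."""
    return [r for r in rows if r["x"] is not None and r.get("y") is not None]

def fit_features(rows):
    """Learn feature-engineering parameters (here: centering) from training data."""
    return {"x_mean": mean(r["x"] for r in rows)}

def featurize(x, params):
    """Feature engineering step shared by the training and inference paths."""
    return x - params["x_mean"]

def train(rows, params):
    """Model training: least-squares line fitted on the centered feature."""
    xs = [featurize(r["x"], params) for r in rows]
    ys = [r["y"] for r in rows]
    y_mean = mean(ys)
    slope = sum(x * (y - y_mean) for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"slope": slope, "intercept": y_mean}

def predict(model, params, x):
    """Model prediction: used for validation and for live data."""
    return model["intercept"] + model["slope"] * featurize(x, params)

def validate(model, params, rows):
    """Model validation: mean absolute error on held-out rows."""
    return mean(abs(predict(model, params, r["x"]) - r["y"]) for r in rows)

# Training path (hypothetical toy data).
raw = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": None, "y": 1.0},
       {"x": 3.0, "y": 6.0}, {"x": 4.0, "y": 8.0}]
rows = clean(raw)
train_rows, holdout = rows[:3], rows[3:]
params = fit_features(train_rows)
model = train(train_rows, params)
error = validate(model, params, holdout)

# Inference path: live data reuses the stored feature parameters and model.
live_prediction = predict(model, params, 5.0)
```

Note that `params` and `model` are both artifacts produced by training and consumed by inference; losing or mismatching either one changes the predictions.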
networks (DL) and range from 10s MB to GBs
damage corporate brands and generate business risk
Tay bot and Bias in Amazon HR hiring tool
human social values
from intrusion
predictions and made
Diagram labels: DATA, NEW DATA
Access control, lineage, and tracking of all data artifacts are critical for AI trust
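One way to picture lineage and tracking of data artifacts is a log that identifies each artifact by a content hash and records which artifacts it was derived from. The sketch below is a minimal illustration under that assumption; the `LineageLog` class, its fields, and the artifact names are hypothetical and do not come from the talk, and access control is not covered.

```python
import hashlib
import json

def fingerprint(payload: bytes) -> str:
    """Content-based identifier: the SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(payload).hexdigest()

class LineageLog:
    """Hypothetical in-memory record of artifacts and their parents."""

    def __init__(self):
        self.records = {}

    def register(self, name, payload, parents=(), actor="unknown"):
        """Record an artifact, who produced it, and what it was derived from."""
        artifact_id = fingerprint(payload)
        self.records[artifact_id] = {
            "name": name,
            "parents": list(parents),
            "actor": actor,
        }
        return artifact_id

    def ancestry(self, artifact_id):
        """Return the names of all upstream artifacts, nearest first."""
        seen, names = set(), []
        queue = list(self.records[artifact_id]["parents"])
        while queue:
            parent_id = queue.pop(0)
            if parent_id in seen:
                continue
            seen.add(parent_id)
            names.append(self.records[parent_id]["name"])
            queue.extend(self.records[parent_id]["parents"])
        return names

log = LineageLog()
raw_id = log.register("raw_data", b"x,y\n1,2\n2,4\n", actor="ingest")
clean_id = log.register("cleaned_data", b"x,y\n1,2\n", parents=[raw_id], actor="etl")
model_bytes = json.dumps({"slope": 2.0}).encode()
model_id = log.register("model_v1", model_bytes, parents=[clean_id], actor="trainer")

upstream = log.ancestry(model_id)
```

With a log like this, a question such as "which data produced this model?" becomes a lookup; here `upstream` lists `cleaned_data` and then `raw_data`.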
Decisions
the-gdpr/
algorithm-monitoring-task-force/
(maybe later) training
resource constraints
data center architecture
storage and management strategies
IoT Reference Model
pipeline time
Operational ML.
consistency and governance
and Access Management
tiering for analytics)
these use cases
Data from Repositories or Live Streams: Data Repositories (SQL, NoSQL), Data Streams
Flink / Apex, Spark Streaming, Storm / Samza / NiFi, Caffe, TensorFlow, PyTorch, Hadoop, Spark, SparkML
Processing Engines Algorithms and Libraries
Containerized Models (Python etc.)
Edited version of slide from Balint Fleischer’s talk: Flash Memory Summit 2016, Santa Clara, CA
Teaching Assistants, Elderly Companions, Service Robots, Personal Social Robots, Smart Cities, Robot Drones, Smart Homes, Intelligent Vehicles, Personal Assistants (bots), Smart Enterprise
Edge Cloud