Architecting to Support Machine Learning
Humberto Cervantes, UAM; Iurii Milovanov, SoftServe; Rick Kazman, University of Hawaii
PARTICULARITIES OF ML SYSTEMS
In ML systems, the behaviour is not specified directly in code but is learned from …
… perform predictions for particular tasks
Traditional Programming: Data + Program → Computer → Output
Machine learning: Data + Expected output → Computer → Model
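The contrast between the two paradigms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature task, the function names, and the midpoint-of-class-means "training" rule are all hypothetical and are not taken from the presentation.

```python
from typing import List


# Traditional programming: a developer writes the rule (the "program") by hand.
def is_hot_rule(temperature_c: float) -> bool:
    return temperature_c > 30.0  # threshold chosen by the developer


# Machine learning: data plus expected output go in, a model comes out.
def train_threshold(samples: List[float], labels: List[bool]) -> float:
    """Learn a decision threshold as the midpoint between the two class means."""
    positives = [x for x, y in zip(samples, labels) if y]
    negatives = [x for x, y in zip(samples, labels) if not y]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def predict(threshold: float, temperature_c: float) -> bool:
    """Apply the learned model (here just a threshold) to new data."""
    return temperature_c > threshold


# Hypothetical training data: readings and the expected output for each.
temperatures = [12.0, 18.0, 22.0, 31.0, 35.0, 40.0]
expected = [False, False, False, True, True, True]
learned_threshold = train_threshold(temperatures, expected)
```

In the first function the behaviour is fixed in code; in the second, changing the training data changes the behaviour without touching the code.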
TWO MAIN WORKFLOWS
model development (Development environment): Raw historical data → Transformation into features → Model selection and training → Trained ML Model
model serving (Serving environment): New raw data → Transformation into features → Trained ML Model → Results derived from prediction
From model development to model serving: data transformation rules + model
From model serving back to model development: data to refine model & data rules, automatic retraining
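A minimal sketch of how the two workflows relate, assuming a toy one-dimensional data set: the development workflow produces both the data transformation rules and the model, and the serving workflow reuses exactly those rules on new raw data. The names `develop`, `serve`, and `to_feature`, and the min-max scaling rule, are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, float]  # hypothetical "data transformation rules": scaling parameters
Model = Dict[str, float]  # hypothetical trained model: a single decision threshold


def to_feature(raw_value: float, rules: Rules) -> float:
    """Transformation into features; used unchanged in both workflows."""
    return (raw_value - rules["min"]) / (rules["max"] - rules["min"])


def develop(raw_history: List[float], labels: List[int]) -> Tuple[Rules, Model]:
    """Model development: derive the rules from historical data, then train."""
    rules = {"min": min(raw_history), "max": max(raw_history)}
    features = [to_feature(v, rules) for v in raw_history]
    positives = [f for f, y in zip(features, labels) if y == 1]
    negatives = [f for f, y in zip(features, labels) if y == 0]
    model = {"threshold": (min(positives) + max(negatives)) / 2}
    return rules, model


def serve(new_raw_value: float, rules: Rules, model: Model) -> int:
    """Model serving: same rules, trained model, new raw data."""
    return int(to_feature(new_raw_value, rules) > model["threshold"])


rules, model = develop([10.0, 20.0, 30.0, 40.0, 50.0], [0, 0, 0, 1, 1])
prediction = serve(45.0, rules, model)
```

Passing `rules` and `model` together is the point of the sketch: if serving recomputed its own scaling parameters, the features it feeds the model would no longer match the ones the model was trained on.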
ML SYSTEM DEVELOPMENT
The development of ML systems frequently follows a sequential approach:
Model development → Model serving
But something closer to this is needed:
Initial model development → Model serving → Model refinement → (Refined) model serving → Model refinement → (Refined) model serving
ARCHITECTING THE SYSTEM
Supporting these aspects introduces many architectural concerns: “Architectural concerns encompass additional aspects that need to be considered as part …”
We will look in more detail at the steps of the workflows to discuss the concerns and the decisions that can be made to satisfy them.
MODEL DEVELOPMENT workflow: TRAINING DATA INGESTION → DATA CLEANSING AND NORMALIZATION → FEATURE ENGINEERING → MODEL SELECTION AND TRAINING → MODEL PERSISTENCE
MODEL SERVING workflow: NEW DATA INGESTION → DATA VALIDATION → FEATURES EXTRACTION → MODEL TRANSFER AND PREDICTION → SERVING RESULTS
Legend: step, workflow, activity and data flow
TRAINING DATA INGESTION
Responsibility
Architectural concerns
○ Ingestion: Manual, Message broker, ETL Jobs
○ Storage: Object Storage, SQL or NoSQL, HDFS
○ Data labelling toolkit: Intel’s CVAT, Amazon SageMaker Ground Truth
DATA CLEANSING AND NORMALIZATION
Responsibility
selected data and perform data conversions (such as normalization) to create a reliable data set.
Architectural concerns
○ Data warehouse to support data analysis, such as Hive
○ Data processing framework, such as Spark
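As a small-scale illustration of this step, the sketch below drops unusable readings and then applies z-score normalization using only the Python standard library. It assumes the data fits in memory; a real pipeline would more likely run on a framework such as Spark, and the function names here are hypothetical.

```python
import math
import statistics
from typing import List, Optional


def cleanse(values: List[Optional[float]]) -> List[float]:
    """Drop missing (None) and non-finite (NaN, inf) readings."""
    return [v for v in values if v is not None and math.isfinite(v)]


def normalize(values: List[float]) -> List[float]:
    """Z-score normalization: zero mean, unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: nothing to scale, return centred zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical raw readings containing a missing value and a NaN.
raw = [2.0, None, 4.0, float("nan"), 6.0]
clean = cleanse(raw)
normalized = normalize(clean)
```

The mean and standard deviation computed here are themselves transformation rules, so they would need to be kept alongside the model for use at serving time.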
FEATURE ENGINEERING
Responsibility
incorporate additional knowledge to the training data
Architectural concerns
○ Logging mechanism, such as Stackdriver Logging
○ Data versioning mechanism, such as Data Science Version Control System (DVC)
MODEL TRAINING AND SELECTION
Responsibility
evaluate a model.
Architectural concerns
○ TensorFlow, PyTorch, Spark MLlib, scikit-learn, etc.
tune and evaluate a model
○ Single vs distributed training, hardware acceleration (GPU/TPU)
○ Resource management (e.g. YARN, Kubernetes)
MODEL PERSISTENCE
Responsibility
pipeline) to support transfer to the serving environment
Architectural concerns
○ Examples: Spark MLlib Pipelines, PMML, MLeap, ONNX
○ Examples: Database, document storage, object storage, NFS, DVC
○ Example: TensorFlow Model Optimization Toolkit
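To make the persistence concern concrete, here is a minimal sketch that stores a model together with its transformation rules in one versioned, checksummed bundle. The JSON envelope is a hypothetical stand-in used only for illustration; it is not PMML, MLeap, or ONNX, and the function names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict


def serialize_bundle(model_params: Dict[str, Any],
                     transform_rules: Dict[str, Any],
                     version: str) -> str:
    """Serialize the model and its transformation rules as one artifact."""
    payload = {"version": version, "model": model_params, "transform": transform_rules}
    body = json.dumps(payload, sort_keys=True)
    checksum = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"checksum": checksum, "payload": payload}, sort_keys=True)


def load_bundle(text: str) -> Dict[str, Any]:
    """Load a bundle in the serving environment, rejecting corrupted artifacts."""
    envelope = json.loads(text)
    body = json.dumps(envelope["payload"], sort_keys=True)
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != envelope["checksum"]:
        raise ValueError("model bundle is corrupted")
    return envelope["payload"]


# Hypothetical model parameters and rules, written once and read back.
bundle_text = serialize_bundle({"threshold": 0.625}, {"min": 10.0, "max": 50.0}, "v1")
restored = load_bundle(bundle_text)
```

The resulting string could be written to any of the storage options listed above; keeping the rules inside the same artifact avoids deploying a model with mismatched preprocessing.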
NEW DATA INGESTION
Responsibility
Architectural concerns
data observations.
DATA VALIDATION AND FEATURE EXTRACTION
Responsibility
the transformation rules defined during model development
Architectural concerns
○ Usage of a data schema defined during model development
○ Realtime data storage (e.g. Cassandra)
○ Data processing framework (e.g. Spark)
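Validation against a schema defined during model development could look roughly like the sketch below. The schema layout, the field names, and the `validate` function are hypothetical; they only illustrate checking presence, type, and range of incoming observations before features are extracted.

```python
from typing import Any, Dict, List

# Hypothetical schema captured during model development:
# expected type and, where applicable, valid range for each field.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "temperature_c": {"type": float, "min": -50.0, "max": 150.0},
    "sensor_id": {"type": str},
}


def validate(record: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors: List[str] = []
    for field, rule in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type for {field}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field} below minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field} above maximum")
    return errors


valid_errors = validate({"temperature_c": 21.5, "sensor_id": "a1"}, SCHEMA)
invalid_errors = validate({"temperature_c": 999.0}, SCHEMA)
```

Returning all errors rather than failing on the first one makes it easier to log rejected observations for later model refinement.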
MODEL TRANSFER AND PREDICTION
Responsibility
Architectural concerns
○ Transfer: re-writing, Docker, PMML…
○ Support for multiple model versions, update and rollback mechanisms, for example using TensorFlow Serving
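The idea of supporting multiple model versions with update and rollback can be sketched as a small in-process registry. This is an illustrative assumption only: `ModelRegistry` is a hypothetical class and does not use or imitate the TensorFlow Serving API.

```python
from typing import Callable, Dict, List, Optional

Predictor = Callable[[float], float]


class ModelRegistry:
    """Keeps every deployed version so the active one can be rolled back."""

    def __init__(self) -> None:
        self._models: Dict[str, Predictor] = {}
        self._history: List[str] = []  # deployment order, newest last

    def deploy(self, version: str, predictor: Predictor) -> None:
        """Update: register a new version and make it the active one."""
        self._models[version] = predictor
        self._history.append(version)

    def rollback(self) -> str:
        """Make the previously deployed version active again and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    def predict(self, x: float, version: Optional[str] = None) -> float:
        """Predict with an explicit version, or with the active one by default."""
        active = version if version is not None else self._history[-1]
        return self._models[active](x)


registry = ModelRegistry()
registry.deploy("v1", lambda x: x * 2)  # hypothetical initial model
registry.deploy("v2", lambda x: x * 3)  # hypothetical refined model
```

Allowing callers to pin a version also makes it possible to compare a refined model against its predecessor on the same inputs before switching traffic.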
PREDICTION LOCATION
Local model: the model predicts/re-trains on the client side
Remote model: the model predicts/re-trains on the server side
Hybrid model: the model predicts on the client and re-trains on both (federated learning)
[Figure. Local model: the ML model runs on the client machine. Remote model: the client machine sends data for prediction to the ML model on the server machine and receives results. Hybrid model: the local ML model on the client machine and the global ML model on the server machine exchange model deltas and model updates.]
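The exchange of model deltas and model updates in the hybrid case can be illustrated with a simplified averaging step. This sketch assumes models are plain lists of weights and uses an unweighted average of client deltas; real federated learning schemes typically weight clients (for example by sample count), and all names below are hypothetical.

```python
from typing import List


def local_delta(global_weights: List[float], local_weights: List[float]) -> List[float]:
    """Client side: the difference between the re-trained local model and the global one."""
    return [local - glob for glob, local in zip(global_weights, local_weights)]


def apply_deltas(global_weights: List[float], deltas: List[List[float]]) -> List[float]:
    """Server side: add the element-wise average of the client deltas."""
    n = len(deltas)
    averaged = [sum(column) / n for column in zip(*deltas)]
    return [g + d for g, d in zip(global_weights, averaged)]


# Hypothetical weights: one global model and two locally re-trained copies.
global_model = [0.0, 1.0]
client_a = [0.2, 1.4]
client_b = [0.4, 0.8]

deltas = [local_delta(global_model, client_a), local_delta(global_model, client_b)]
updated_global = apply_deltas(global_model, deltas)
```

Only the deltas leave the client machines in this scheme, which is the property that makes the hybrid option attractive when raw data should stay local.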
SERVING RESULTS
Responsibility
to a destination
Architectural concerns
CASE STUDIES
NEW DOMAIN UNDERSTANDING
networking provider, and an energy exploration and production company – to research the oil extraction process
client need for a distributed fiber-optic sensing (IoT) program.
DOMAIN-SPECIFIC TECHNOLOGY CHALLENGES / LIMITATIONS
suggested 3rd-party sensing hardware (Silixa) and data protocol (National Instruments) to address industry-specific challenges
processing model
streams
SOLUTION DESIGN
help the end client identify observations that do not conform to the expected behavioral patterns
CASE STUDY
DISTRIBUTED IOT NETWORK ACROSS OIL & GAS PRODUCTION
ARCHITECTURAL DRIVERS
(100-200GB per day).
different historical windows in near real-time (up to 5 mins)
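Identifying observations that do not conform to expected behaviour over a historical window can be sketched as follows. The presentation does not say which detection technique was used; the rolling z-score below, the `detect_anomalies` function, and the sample readings are assumptions chosen only to illustrate the role of the window size.

```python
import statistics
from collections import deque
from typing import Deque, List


def detect_anomalies(stream: List[float], window: int, z_limit: float = 3.0) -> List[int]:
    """Return indices whose value deviates more than z_limit standard
    deviations from the mean of the preceding `window` observations."""
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[int] = []
    for index, value in enumerate(stream):
        if len(history) == window:  # only score once the window is full
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and abs(value - mean) / std > z_limit:
                anomalies.append(index)
        history.append(value)  # the oldest reading is evicted automatically
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 5.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8]
anomalies_long_window = detect_anomalies(readings, window=5)
anomalies_short_window = detect_anomalies(readings, window=3)
```

Running the same detector with several window sizes corresponds to evaluating each observation against different historical windows.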
ARCHITECTURAL DECISION [MODEL DEV]
Training Data Ingestion
proprietary data protocol
Data cleansing and normalization
processing
Feature engineering
and exposed via SQL
Model training and selection
Model persistence
ARCHITECTURAL DECISION [MODEL SERVING]
New Data Ingestion
ingest the data from the sensors
Data validation and Feature extraction
Spark Streaming
Model prediction
mins
Serving results
exposed via Impala
and predictions
CASE STUDY
DISTRIBUTED IOT NETWORK ACROSS OIL & GAS PRODUCTION
OUTCOMES
SOFTSERVE SOLUTION TEAM
scalable distributed IoT platform leveraging state-of-the-art Big Data and Cloud technologies
monitoring and user-centric BI analytics
detection solution
SMART PARKING SOLUTION
A SoftServe innovative solution provides automatic parking space detection based …
A CCTV camera installed on a rooftop captures images, and the current parking state is visualized in real time via a web application and an LCD at the parking entrance. The solution can be used for both open and authorized parking areas.
ARCHITECTURAL DRIVERS
ARCHITECTURAL DECISION [MODEL DEV]
Training Data Ingestion
data
data augmentation
Data cleansing and normalization
written in Python (split image, lens correction, color correction, contrast and brightness correction etc.)
Feature engineering
Model training and selection
scheduled by Ansible
Model persistence
repository (MS TFS)
dockerized microservice
ARCHITECTURAL DECISION [MODEL SERVING]
New Data Ingestion
the edge device
Data validation and Feature extraction
in a Docker-based microservice
Model prediction
Serving results
multiple components
SMART PARKING TECHNICAL DETAILS
CONCLUSIONS
production, design decisions must be made to support the initial development and transfer to the serving environment of a model and its continuous refinement
the model development and model serving workflows
scientists must work together with data engineers, software architects and devops engineers