Ensemble Learning with Sagemaker and Step-Functions | Dr. Benjamin Weigel | PowerPoint PPT Presentation



slide-1
SLIDE 1

Ensemble Learning with Sagemaker and Step-Functions

  • Dr. Benjamin Weigel | 09.09.2019

Hamburg, Germany

slide-2
SLIDE 2

Benjamin Weigel

Data Engineer & Cloud Coordinator Europace AG

https://www.europace.de/

slide-3
SLIDE 3
slide-4
SLIDE 4

There is manual effort in obtaining a mortgage

slide-5
SLIDE 5

Smart Document Classification

slide-6
SLIDE 6

Smart Document Classification

  • Text Model: trained on OCR-extracted text
  • Image Model: trained on the page bitmap
  • Sequence Model: trained on sequence information (i.e. “Page 1-4 is a contract”); uses the output of the other models as input

slide-7
SLIDE 7

Options to Build a Model Training-Pipeline

slide-8
SLIDE 8

AWS Sagemaker

slide-9
SLIDE 9

AWS Step-Functions

  • define a distributed workflow as a series of steps
  • visual workflow
  • long-running workflows (max. 1 year)
  • 4,000 transitions/month for free
  • after that: 0.025 USD per 1,000 transitions
  • can get expensive quickly
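As a rough sanity check of the pricing above, here is a hypothetical back-of-the-envelope sketch (the function name and numbers-as-defaults are illustrative, not an official AWS calculator):

```python
def monthly_step_function_cost(transitions: int,
                               free_tier: int = 4_000,
                               usd_per_1000: float = 0.025) -> float:
    """Estimate the monthly Step Functions bill for a number of state transitions."""
    billable = max(0, transitions - free_tier)
    return billable / 1_000 * usd_per_1000

# e.g. a 50-transition workflow executed 1,000 times a month:
cost = monthly_step_function_cost(50 * 1_000)  # 46,000 billable transitions, about 1.15 USD
```

Per-execution cost is tiny, but transitions multiply quickly with parallel branches and retries.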
slide-10
SLIDE 10

AWS Step-Functions

slide-11
SLIDE 11

Amazon States Language

  • define a state-machine
  • JSON-based
  • describe:
    ○ a state and the transition to the next
    ○ error-conditions etc.

{
  "Comment": "An example of the Amazon States Language.",
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:...",
      "Next": "ChoiceState"
    },
    ...
  }
}

"ChoiceState": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.foo",
      "NumericEquals": 1,
      "Next": "FirstMatchState"
    },
    {
      "Variable": "$.foo",
      "NumericEquals": 2,
      "Next": "SecondMatchState"
    }
  ],
  "Default": "DefaultState"
}

slide-12
SLIDE 12

Step Functions & Sagemaker

easy peasy

  • Step Functions can control Sagemaker Transform and Training Jobs directly via these Resources:

"arn:aws:states:::sagemaker:createTransformJob.sync" "arn:aws:states:::sagemaker:createTrainingJob.sync"

https://docs.aws.amazon.com/step-functions/latest/dg/connect-sagemaker.html

slide-13
SLIDE 13

sagemaker:createTrainingJob.sync

  • configure the job via the Parameters section

https://docs.aws.amazon.com/step-functions/latest/dg/connect-sagemaker.html

"Image Model Training": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
  "Parameters": {
    "TrainingJobName": "ImageModel",
    "AlgorithmSpecification": {
      "TrainingImage": "520713654638.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-mxnet:1.3-gpu-py3",
      "TrainingInputMode": "File"
    },
    "HyperParameters": {
      "epochs": "80",
      "batch_size": "10",
      "conv_block_length": "2",
      "cycle_length": "10",
      "depth": "5",
      "dropout": "0.5",
      "max_lr": "0.1",
      "min_lr": "0.0001",
      ...
      "start_filter": "4",
      "worker": "4"
    },
    "InputDataConfig.$": "$.generated.image_model.InputDataConfig",
    "OutputDataConfig": {
      "S3OutputPath.$": "$.generated.output_artifact_paths.image_model_prefix"
    },
    "ResourceConfig": {
      "InstanceCount": 4,
      "InstanceType": "ml.p2.xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": "arn:aws:iam::123456789012:role/sm-stepfunction-iam-role",
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 172800
    }
  }
}

https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html#API_CreateTrainingJob_RequestSyntax

slide-14
SLIDE 14

The Good

Photo by Joshua Ness on Unsplash

slide-15
SLIDE 15

Start simple

{
  "StartAt": "Train Text Model",
  "States": {
    "Train Text Model": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
      "Parameters": { ... },
      "End": true
    }
  }
}

slide-16
SLIDE 16

Expand from there

{
  "StartAt": "Fetch Preprocessed Data",
  "States": {
    "Fetch Preprocessed Data": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Next": "Train Text Model",
      "Parameters": {
        "JobName": "FetchPreparedData",
        "JobDefinition": "arn:aws:batch:us-east-1:1234567890:job-definition/job:2",
        "JobQueue": "arn:aws:batch:us-east-1:1234567890:job-queue/queue",
        "Parameters": {
          "DATA_INPUT_PATH.$": "$.input_data",
          "OUTPUT_PATH.$": "$.ready_to_use_artifacts"
        }
      }
    },
    "Train Text Model": { ... }
  }
}

slide-17
SLIDE 17

Retry if possible

"Train Text Model": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
  ...
  "Retry": [
    {
      "ErrorEquals": [ "SageMaker.AmazonSageMakerException" ],
      "IntervalSeconds": 1,
      "MaxAttempts": 100,
      "BackoffRate": 1.1
    },
    ...
  ]
}
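The Retry fields translate into a wait schedule: IntervalSeconds is the wait before the first retry, and each subsequent wait is the previous one multiplied by BackoffRate. A small sketch (the helper name is made up) of what the block above configures:

```python
def retry_wait_times(interval_seconds: float, backoff_rate: float, max_attempts: int):
    """Wait before retry i is IntervalSeconds * BackoffRate**(i-1)."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]

# The schedule configured above: 1 s, 1.1 s, 1.21 s, ... over 100 attempts.
waits = retry_wait_times(1, 1.1, 100)
```

A gentle BackoffRate with many attempts keeps retrying for a long time without ever waiting very long between attempts.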

slide-18
SLIDE 18

If all else fails ...

"Train Text Model": {
  "Type": "Task",
  ...
  "Catch": [{
    "ErrorEquals": ["States.ALL"],
    "Next": "Notify Failure"
  }]
},
"Notify Failure": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sns:publish",
  "End": true,
  "Parameters": {
    "Subject": "[ERROR] Model Training failed!",
    "Message": "Error during model training!",
    "TopicArn": "arn:aws:sns:*:123456789012:alerting_topic",
    "MessageAttributes": { ... }
  }
}

slide-19
SLIDE 19

But there is a catch

...it’s a valid state after all

slide-20
SLIDE 20

Fail successfully!

"Notify Failure": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sns:publish",
  "Next": "Fail",
  ...
},
"Fail": {
  "Type": "Fail"
}

slide-21
SLIDE 21

Add a few more models ...

  • Text Model: trained on OCR-extracted text
  • Image Model: trained on the page bitmap
  • Sequence Model: trained on sequence information (i.e. “Page 1-4 is a contract”); uses the output of the other models as input

slide-22
SLIDE 22

Use concurrency for time efficiency

"Fetch Preprocessed Data": {
  ...
  "Next": "Base Model Training"
},
"Base Model Training": {
  "Type": "Parallel",
  "Next": "Train Sequence Model",
  "Branches": [
    {
      "StartAt": "Train Image Model",
      "States": {
        "Train Image Model": { ... "End": true }
      }
    },
    {
      "StartAt": "Train Text Model",
      "States": {
        "Train Text Model": { ... "End": true }
      }
    }
  ]
},
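Since every branch of a Parallel state has the same shape (a StartAt plus a States map ending in a terminal state), the definition can be assembled programmatically. A hypothetical sketch, not from the talk:

```python
def parallel_state(next_state: str, branches: list) -> dict:
    """Assemble a Parallel state whose branches each wrap a single training task."""
    return {
        "Type": "Parallel",
        "Next": next_state,
        "Branches": [
            # each branch is its own mini state machine with one terminal state
            {"StartAt": name, "States": {name: {**task, "End": True}}}
            for name, task in branches
        ],
    }

base = parallel_state("Train Sequence Model",
                      [("Train Image Model", {"Type": "Task"}),
                       ("Train Text Model", {"Type": "Task"})])
```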

slide-23
SLIDE 23

Beware of “silent” errors

The notification trigger won't fire because there is no state defined for this scenario → an unexpected failure.
slide-24
SLIDE 24

Everything should fail the same

"Base Model Training": {
  "Type": "Parallel",
  "Next": "Train Sequence Model",
  "Branches": [...],
  "Catch": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "Next": "Notify Failure"
    }
  ]
}

slide-25
SLIDE 25

Some jobs are long running and expensive

then something fails and you have to debug (rerun) …

slide-26
SLIDE 26

Save time & money

skip some steps...

"States": {
  "Skip Image Model Training?": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.train_image_model",
        "BooleanEquals": false,
        "Next": "Skip Fetch Preprocessing Artifacts"
      }
    ],
    "Default": "Train Image Model"
  },
  "Skip Fetch Preprocessing Artifacts": {
    "Type": "Pass",
    "End": true
  },
  "Train Image Model": {
    ...
    "End": true
  }
}
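Every skippable step needs the same Choice/Pass boilerplate, so it can be generated. A hypothetical helper (not from the talk) that wraps a task in this pattern:

```python
def skippable(state_name: str, state: dict, flag_path: str) -> dict:
    """Wrap a state in a Choice that jumps to a Pass state when the flag is false."""
    return {
        f"Skip {state_name}?": {
            "Type": "Choice",
            "Choices": [
                {"Variable": flag_path, "BooleanEquals": False,
                 "Next": f"Skip {state_name}"}
            ],
            "Default": state_name,
        },
        # the Pass state is the "skipped" branch's terminal state
        f"Skip {state_name}": {"Type": "Pass", "End": True},
        state_name: state,
    }

states = skippable("Train Image Model",
                   {"Type": "Task", "End": True},
                   "$.train_image_model")
```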

slide-27
SLIDE 27

Rinse and repeat

...and add a little sprinkle on top

slide-28
SLIDE 28

Our Model Training Workflow

  • Lambda
  • Batch Job
  • Sagemaker
  • SNS
  • Choice
  • Pass (to skip steps)
  • Fail
  • Wait
slide-29
SLIDE 29

Our Model Training Workflow

  • Input to setup state machine execution
  • define where the data is
  • (Hyper)Parameterization
  • Data & Models stored on S3 (each execution gets its own copy of the data)

[Diagram: “Setup” Input → Step Function, reading Data (S3) and writing Models & Data (S3)]

slide-30
SLIDE 30

The Bad

Photo by Markus Spiske on Unsplash

slide-31
SLIDE 31

“Configure” step functions via initial input

{
  "initialization": {
    "fetch_data": {
      "image_model_artifact_path": "",
      "preprocessed_data_path": "s3://data/2019-08-31T21:53:12+0200",
      "text_model_artifact_path": ""
    },
    "image_model": {
      "batch_size": "128",
      "instance_type": "ml.p2.xlarge"
    },
    "output_artifact_target_base_path": "s3://data/model_training_data",
    "run_training_steps": {
      "image_model": true,
      "text_model": true,
      "fetch_preprocessing_artifacts": true,
      "generate_image_model_split": true
    },
    "sequence_model": { ... },
    "text_model": { ... }
  }
}

slide-32
SLIDE 32

“Generate” additional parameterization for execution

The initial input is expanded into the generated parametrization of the state machine via a “setup” function. Beware: max. 32,768 characters for input/result!
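Exceeding the payload limit only fails at runtime, so it is worth checking the serialized input up front. A small pre-flight sketch (the limit is the one quoted on the slide; the helper name is made up):

```python
import json

MAX_PAYLOAD_CHARS = 32_768  # Step Functions input/result limit quoted in the talk

def check_execution_input(payload: dict) -> str:
    """Serialize the execution input and fail fast if it exceeds the payload limit."""
    body = json.dumps(payload)
    if len(body) > MAX_PAYLOAD_CHARS:
        raise ValueError(f"input is {len(body)} chars, limit is {MAX_PAYLOAD_CHARS}")
    return body
```

The returned string is what you would hand to `start_execution` as the input.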

slide-33
SLIDE 33

Reference parameters in steps via JSON path expressions

"Image Model Training": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
  "Parameters": {
    "TrainingJobName.$": "$.generated.image_model.TrainingJobName",
    "HyperParameters": {
      "epochs": "80",
      "batch_size.$": "$.initialization.image_model.batch_size",
      "bucket.$": "$.initialization.image_model.log_bucket",
      "conv_block_length": "2",
      "cycle_length": "10",
      "depth": "5",
      "dropout": "0.5",
      "job_name.$": "$.generated.image_model.TrainingJobName",
      "max_lr": "0.1",
      "min_lr": "0.0001",
      "worker": "4"
    },
    "InputDataConfig.$": "$.generated.image_model.InputDataConfig",
    "ResourceConfig": {
      "InstanceCount": 4,
      "InstanceType.$": "$.initialization.image_model.instance_type"
      ...

slide-34
SLIDE 34

Anecdotal evidence: ml.pX instances are a scarce commodity!

"Is Capacity Error?": {
  "Type": "Choice",
  "Comment": "Retry if capacity error.",
  "Choices": [
    {
      "Variable": "$.error-info.Cause.FailureReason",
      "StringEquals": "CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.",
      "Next": "Wait 10 Minutes"
    }
  ],
  "Default": "FailImageModel"
},

slide-35
SLIDE 35

Retry on Capacity Error

No notification about failure!

slide-36
SLIDE 36


slide-37
SLIDE 37

Retry on Capacity Error - JSONify Error Cause with Lambda

slide-38
SLIDE 38

The Ugly

Photo by Zoltan Tasi on Unsplash

slide-39
SLIDE 39

Hyperparameters in the beginning be like...

"HyperParameters": {
  "s3_log_folder": "\"logs/sagemaker\"",
  "job_name.$": "$.generated.text_model.TrainingJobName",
  "sagemaker_container_log_level": "20",
  "sagemaker_enable_cloudwatch_metrics": "false",
  "sagemaker_program": "\"sagemaker_entry_point.py\"",
  "sagemaker_region": "\"${AWS::Region}\"",
  "sagemaker_submit_directory": "\"${textModelArtifactPath}\""
},
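The escaped quotes come from the fact that SageMaker hyperparameter values are always strings, and containers launched by the Python SDK expect each value to be JSON-serialized and json.loads it back. A sketch of that round trip (the dict contents are illustrative):

```python
import json

hyperparameters = {"s3_log_folder": "logs/sagemaker", "epochs": 80, "dropout": 0.5}

# The SDK JSON-serializes every value, which is why plain strings
# end up with embedded escaped quotes in the state machine definition.
serialized = {k: json.dumps(v) for k, v in hyperparameters.items()}

# Inside the container the values are json.loads'ed back to their types:
deserialized = {k: json.loads(v) for k, v in serialized.items()}
```

Hand-written definitions have to reproduce this double encoding by hand, hence the `"\"...\""` values above.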

slide-40
SLIDE 40


slide-41
SLIDE 41


Expect a bumpy ride...

slide-42
SLIDE 42


class TrainingEnvironment(ContainerEnvironment):
    # TODO expecting serialized hyperparams might break containers
    # that aren't launched by python sdk
    @staticmethod
    def _deserialize_hyperparameters(hp):
        ...
        for (k, v) in hp.items():
            ...
            hyperparameter_dict[k] = json.loads(v)

*Fixed in MXNet ≥1.3 Images

slide-43
SLIDE 43

Infrastructure as Code for Step Functions

slide-44
SLIDE 44

Infrastructure-as-Code and the Amazon States Language

  • we started out with Cloudformation
  • stringified JSON-definition
  • no JSON linting possible
  • can you spot what’s wrong?

SendSnsStateMachine:
  Type: 'AWS::StepFunctions::StateMachine'
  Properties:
    StateMachineName: 'send-hello-world-sns'
    RoleArn: !GetAtt Role.Arn
    DefinitionString: |-
      {
        "StartAt": "HelloWorld",
        "States": {
          "HelloWorld": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
              "TopicArn": "arn:aws:sns:eu-central-1:0123456789:hello-world",
              "Message": {
                "Input": "Hello from Step Functions!",
              }
            },
            "End": true
          }
        }
      }
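Because CloudFormation treats the definition as an opaque string, nothing lints it at deploy time. A simple pre-deploy json.loads catches mistakes like the trailing comma hidden in the example above:

```python
import json

# The state machine fragment from the slide, including its bug:
definition = '{ "Message": { "Input": "Hello from Step Functions!" , } }'

try:
    json.loads(definition)
    valid = True
except json.JSONDecodeError:
    valid = False  # the trailing comma before '}' is rejected here
```

Running such a check in CI is cheaper than a failed stack deployment.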

slide-45
SLIDE 45

Infrastructure-as-Code and the Amazon States Language

ModelTrainingStateMachine:
  Type: 'AWS::StepFunctions::StateMachine'
  Properties:
    StateMachineName: !Sub '${Service}-model-training-${Stage}'
    RoleArn: !GetAtt TrainingStateMachineExecutionRole.Arn
    DefinitionString:
      Fn::Sub:
        - |-
          { "Comment": "Trains the SmartCat model. Input is ", "StartAt": "Setup Statemachine", "TimeoutSeconds": 172800, "States": {
          "Setup Statemachine": { "Type": "Task", "ResultPath": "$.generated", "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:smartcat-model-training-setup-lambda-${Stage}", "Comment": "Generates dynamic input parameters (paths for models, image model splits...)", "Next": "Fetch Preprocessing Artifacts?" },
          "Fetch Preprocessing Artifacts?": { "Type": "Choice", "Comment": "If 'initialization.run_training_steps.fetch_preprocessing_artifacts' is false ...", "Choices": [ { "Variable": "$.initialization.run_training_steps.fetch_preprocessing_artifacts", "BooleanEquals": false, "Next": "Skip Fetch Preprocessing Artifacts" } ], "Default": "Fetch Preprocessing Artifacts" },
          "Skip Fetch Preprocessing Artifacts": { "Type": "Pass", "Next": "Base Model Training" },
          "Fetch Preprocessing Artifacts": { "Type": "Task", "Comment": "Fetches the preprocessed data (& any existing models) and l...", "Resource": "arn:aws:states:::batch:submitJob.sync", "ResultPath": null, "InputPath": "$", "Parameters": { "JobDefinition": "${FetchDataBatchJob}", "JobName": "FetchSmartCatPreprocessedData", "JobQueue": "arn:aws:batch:${AWS::Region}:${AWS::AccountId}:job-queue/MediumPriority-DnaBatchCompute-JobQueue", "Parameters": { "INPUT_PATH.$": "$.initialization.fetch_data.preprocessed_data_path", "OUTPUT_PATH.$": "$.generated.output_artifact_paths.raw_data" } }, "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "Notify Failure" } ], "Next": "Base Model Training" },
          "Base Model Training": { "Type": "Parallel", "Next": "Sequence Model Dataframe generation", "ResultPath": null, "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "Notify Failure" } ], "Branches": [ { "StartAt": "Image Model trainieren?", "States": { "Image Model trainieren?": { "Type": "Choice", "Comment": "Behavior controlled via 'initialization.training_steps.image_model'", "Choices": [ { "And": [ { "Variable": "$.initialization.run_training_steps.image_model", "BooleanEquals": true }, { "Variable": "$.initialization.run_training_steps.generate_image_model_split", "BooleanEquals": true } ], "Next": "Create Image Model Split Set" }, { "And": [ { "Variable": "$.initialization.run_training_steps.image_model", "BooleanEquals": true }, { "Variable": "$.initialization.run_training_steps.generate_image_model_split", "BooleanEquals": false } ], "Next": "Image Model Training" } ], "Default": "Skip Image Training" }, "Skip Image Training": { "Type": "Pass", "End": true }, "Create Image Model Split Set": { "Type": "Task", "Comment": "Creates the split test set for the image model. Writes the InputDataConfig.", "Resource": "arn:aws:states:::batch:submitJob.sync", "Parameters": { "JobName": "GenerateImageModelSplitSet", "JobDefinition": "${SplitImageDataBatchJob}", "JobQueue": "arn:aws:batch:${AWS::Region}:${AWS::AccountId}:job-queue/MediumPriority-DnaBatchCompute-JobQueue", "Parameters": { "K_FOLDS": "4", "INPUT_PATH.$": "$.generated.output_artifact_paths.raw_data", "OUTPUT_PATH.$": "$.generated.output_artifact_paths.image_model_training_split_set" } }, "ResultPath": null, "Next": "Image Model Training" }, "Image Model Training": { "Type": "Task", "Comment": "Trains the image model and puts the artifact into the directory for this job run.", "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync", "End": true, "Parameters": { "TrainingJobName.$": "$.generated.image_model.TrainingJobName", "AlgorithmSpecification": { "TrainingImage": "520713654638.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-mxnet:1.3-gpu-py3", "TrainingInputMode": "File" }, "HyperParameters": { "epochs": "80", "batch_size.$": "$.initialization.image_model.batch_size", "bucket.$": "$.initialization.image_model.log_bucket", "conv_block_length": "2", "cycle_length": "10", "depth": "5", "dropout": "0.5", "job_name.$": "$.generated.image_model.TrainingJobName", "max_lr": "0.1", "min_lr": "0.0001", "s3_log_folder": "logs/sagemaker", "sagemaker_container_log_level": "20", "sagemaker_enable_cloudwatch_metrics": "false", "sagemaker_program": "train_n_folds.py", "sagemaker_region": "${AWS::Region}", "sagemaker_submit_directory": "${imageModelArtifactPath}", "start_filter": "4", "worker": "4" }, "InputDataConfig.$": "$.generated.image_model.InputDataConfig", "OutputDataConfig": { "S3OutputPath.$": "$.generated.output_artifact_paths.image_model_prefix" }, "ResourceConfig": { "InstanceCount": 4, "InstanceType.$": "$.initialization.image_model.instance_type", "VolumeSizeInGB": 10 }, "RoleArn": "${SagemakerTrainModelRoleArn}", "StoppingCondition": { "MaxRuntimeInSeconds": 172800 }, "Tags": [ { "Key": "service", "Value": "${Service}" }, { "Key": "subservice", "Value": "training" } ] }, "Catch": [ { "ErrorEquals": [ "States.TaskFailed" ], "Next": "JSONify Error Cause", "ResultPath": "$.error-info" } ] }, "JSONify Error Cause": { "Type": "Task", "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:smartcat-model-training-jsonify-error-lambda-${Stage}", "Next": "Is Capacity Error?" }, "Is Capacity Error?": { "Type": "Choice", "Comment": "Retry if capacity error.", "Choices": [ { "Variable": "$.error-info.Cause.FailureReason", "StringEquals": "CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.", "Next": "Wait 10 Minutes" } ], "Default": "FailImageModel" }, "Wait 10 Minutes": { "Type": "Wait", "Seconds": 60, "Next": "Image Model Training" }, "FailImageModel": { "Type": "Fail" } } }, { "StartAt": "Text Model trainieren?", "States": { "Text Model trainieren?": { "Type": "Choice", "Comment": "If 'initialization.training_steps.text_model' is false, do not run this training step.", "Choices": [ { "Variable": "$.initialization.run_training_steps.text_model", "BooleanEquals": true, "Next": "Text Model Training" } ], "Default": "Skip Text Training" }, "Skip Text Training": { "Type": "Pass", "End": true }, "Text Model Training": { "Type": "Task", "Comment": "Trains the text model and puts the artifact into the directory for this job run.", "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync", "End": true, "Parameters": { "TrainingJobName.$": "$.generated.text_model.TrainingJobName", "AlgorithmSpecification": { "TrainingImage": "${SagemakerTextTrainingImage}", "TrainingInputMode": "File" }, "HyperParameters": { "bucket.$": "$.initialization.text_model.log_bucket", "s3_log_folder": "\"logs/sagemaker\"", "job_name.$": "$.generated.text_model.TrainingJobName", "sagemaker_container_log_level": "20", "sagemaker_enable_cloudwatch_metrics": "false", "sagemaker_program": "\"sagemaker_entry_point.py\"", "sagemaker_region": "\"${AWS::Region}\"", "sagemaker_submit_directory": "\"${textModelArtifactPath}\"" }, "InputDataConfig.$": "$.generated.text_model.InputDataConfig", "OutputDataConfig": { "S3OutputPath.$": "$.generated.output_artifact_paths.text_model_prefix" }, "ResourceConfig": { "InstanceCount": 1, "InstanceType": "ml.m5.2xlarge", "VolumeSizeInGB": 10 }, "RoleArn": "${SagemakerTrainModelRoleArn}", "StoppingCondition": { "MaxRuntimeInSeconds": 14400 }, "Tags": [ { "Key": "service", "Value": "${Service}" }, { "Key": "subservice", "Value": "training" } ] } } } } ] },
          "Sequence Model Dataframe generation": { "Type": "Task", "Resource": "arn:aws:states:::batch:submitJob.sync", "Comment": "Builds the DataFrame for the sequence model from the text and image model data", "ResultPath": null, "Parameters": { "JobName": "GenerateSequenceModelDataframe", "JobDefinition": "${PredictionsMergerBatchJob}", "JobQueue": "arn:aws:batch:${AWS::Region}:${AWS::AccountId}:job-queue/MediumPriority-DnaBatchCompute-JobQueue", "Parameters": { "INPUT_PATH_SVM.$": "$.generated.output_artifact_paths.text_model_prefix", "INPUT_PATH_CNN.$": "$.generated.output_artifact_paths.image_model_prefix", "OUTPUT_PATH.$": "$.generated.output_artifact_paths.sequence_model_training_dataframe" } }, "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "Notify Failure" } ], "Next": "Sequence Model Training" },
          "Sequence Model Training": { "Type": "Task", "Comment": "Trains the sequence model and puts the artifact into the directory for this job run.", "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync", "Next": "Post Training Artifact Fetcher", "ResultPath": null, "Parameters": { "TrainingJobName.$": "$.generated.sequence_model.TrainingJobName", "AlgorithmSpecification": { "TrainingImage": "520713654638.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-mxnet:1.2-cpu-py3", "TrainingInputMode": "File" }, "HyperParameters": { "bucket.$": "$.initialization.sequence_model.log_bucket", "job_name": "\"smartcat-SequenceModelTraining\"", "dropout": "0", "epoch_seq": "10000", "epochs": "20", "learning_rate": "0.001", "lr_schedule_epoch": "1", "lr_schedule_factor": "0.5", "mode": "\"both\"", "net_layers": "2", "net_width": "300", "optimizer": "\"rmsprop\"", "rnn_type": "\"gru\"", "val_seq": "3000", "s3_log_folder": "\"logs/sagemaker\"", "sagemaker_container_log_level": "20", "sagemaker_enable_cloudwatch_metrics": "false", "sagemaker_program": "\"train_seq_mx12.py\"", "sagemaker_region": "\"${AWS::Region}\"", "sagemaker_submit_directory": "\"${sequenceModelArtifactPath}\"" }, "InputDataConfig.$": "$.generated.sequence_model.InputDataConfig", "OutputDataConfig": { "S3OutputPath.$": "$.generated.output_artifact_paths.sequence_model_prefix" }, "ResourceConfig": { "InstanceCount": 1, "InstanceType": "ml.m5.large", "VolumeSizeInGB": 10 }, "RoleArn": "${SagemakerTrainModelRoleArn}", "StoppingCondition": { "MaxRuntimeInSeconds": 172800 }, "Tags": [ { "Key": "service", "Value": "${Service}" }, { "Key": "subservice", "Value": "training" } ] } },
          "Post Training Artifact Fetcher": { "Type": "Task", "Comment": "Copies the artifacts of the individual training steps into a target directory", "End": true, "Resource": "arn:aws:states:::batch:submitJob.sync", "ResultPath": null, "Parameters": { "JobName": "PostTrainingArtifactsFetcherJob", "JobDefinition": "${PostTrainingFetchBatchJob}", "JobQueue": "arn:aws:batch:${AWS::Region}:${AWS::AccountId}:job-queue/MediumPriority-DnaBatchCompute-JobQueue", "Parameters": { "INPUT_PATH_TXT.$": "$.generated.output_artifact_paths.text_model_prefix", "INPUT_PATH_IMAGE.$": "$.generated.output_artifact_paths.image_model_prefix", "INPUT_PATH_SEQ.$": "$.generated.output_artifact_paths.sequence_model_prefix", "OUTPUT_PATH.$": "$.generated.output_artifact_paths.ready_to_use_artifacts" } }, "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "Notify Failure" } ] },
          "Notify Failure": { "Type": "Task", "Resource": "arn:aws:states:::sns:publish", "Parameters": { "Subject": "[ERROR] - ${Service} - Model Training failed!", "Message": "There was an error during the model training!", "TopicArn": "${AlertingTopic}", "MessageAttributes": { "ErrorType": { "DataType": "String", "StringValue.$": "$.Error" }, "CauseOfError": { "DataType": "String", "StringValue.$": "$.Cause" } } }, "Next": "Fail" },
          "Fail": { "Type": "Fail" } } }
        - { AlertingTopic: !FindInMap [ETLMappings, !Ref Stage, 'alertingTopic'],
            SagemakerTextTrainingImage: !FindInMap [ETLMappings, !Ref Stage, 'sageMakerTextModelTrainingImage'],
            SagemakerTrainModelRoleArn: !GetAtt SagemakerTrainModelRole.Arn,
            FetchDataBatchJob: !Ref TrainingFetchBatchJob,
            PredictionsMergerBatchJob: !Ref TrainingPredictionsMergerBatchJob,
            SplitImageDataBatchJob: !Ref ImageTrainingSplitBatchJob,
            PostTrainingFetchBatchJob: !Ref PostTrainingFetchBatchJob }

Reality isn’t as easily debuggable ...

slide-46
SLIDE 46

IaC & ASL ...it gets better with AWS CDK

import cdk = require('@aws-cdk/core');
import stepfunction = require('@aws-cdk/aws-stepfunctions');
import stepfunctionTasks = require('@aws-cdk/aws-stepfunctions-tasks');
...
new stepfunction.CfnStateMachine(this, "state-machine", {
  definitionString: fs.readFileSync("./lib/statemachine.json").toString(),
  roleArn: "..."
});

const dataBucket = new S3.Bucket(this, "data-bucket")
const startState = new stepfunction.Pass(this, 'StartState');
const trainText = new stepfunction.Task(this, "SageMaker", {
  task: new stepfunctionTasks.SagemakerTrainTask({
    trainingJobName: "TextModelTraining",
    inputDataConfig: [
      {
        channelName: "channel_1",
        dataSource: {
          s3DataSource: {
            s3Location: stepfunctionTasks.S3Location.fromBucket(dataBucket, "input")
          }
        }
      }
    ],
    ...
  })
});
const definition = startState.next(trainText)
new stepfunction.StateMachine(this, 'StateMachine', { definition: definition });

Generated definition:

{
  "StartAt": "Train Text Model",
  "States": {
    "Train Text Model": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
      "Parameters": { ... },
      "End": true
    }
  }
}

slide-47
SLIDE 47

Lifehack #1: Rerunning a job via “New Execution”

save yourself some typing or copy-pasting

slide-48
SLIDE 48

Lifehack #2: Use the Pass state to get started

  • define the result you want
  • go from there
  • use to debug
  • use to get started and see the visual workflow

"PrepareXY": {
  "Type": "Pass",
  "Result": {
    "x": 0.381018,
    "y": 622.2269926397355
  },
  "ResultPath": "$.coords",
  "Next": "FindArealPhoto"
},
"FindArealPhoto": {
  "Type": "Task",
  ...
  "End": true
}

  • supports Result, ResultPath and Parameters
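A Pass state does no work; it simply injects its Result into the input at ResultPath and moves on, which is what makes it useful for stubbing out steps. Roughly, simplified to single-level `$.key` paths (the helper is illustrative):

```python
def apply_pass_state(state_input: dict, result, result_path: str) -> dict:
    """Mimic a Pass state: place Result at ResultPath (simple '$.key' paths only)."""
    assert result_path.startswith("$.")
    out = dict(state_input)           # the rest of the input passes through untouched
    out[result_path[2:]] = result
    return out

out = apply_pass_state({"job": "demo"},
                       {"x": 0.381018, "y": 622.2269926397355},
                       "$.coords")
```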

slide-49
SLIDE 49

Take-Aways

  • no “infrastructure” to manage (vs. tooling like Airflow)
  • easy to get started
  • extendable, evolves with your requirements
  • lots of service integrations
  • complicated logic means a complicated definition
  • need to drag state along (if you want to use JSON-path expressions to parametrize steps)

slide-50
SLIDE 50

@dreigelb