SOFTWARE ARCHITECTURE OF AI-ENABLED SYSTEMS - PowerPoint PPT Presentation


SLIDE 1

SOFTWARE ARCHITECTURE OF AI-ENABLED SYSTEMS

Christian Kaestner

Required reading:
Hulten, Geoff. "Building Intelligent Systems: A Guide to Machine Learning Engineering." Apress, 2018, Chapter 13 (Where Intelligence Lives).
Daniel Smith. "Exploring Development Patterns in Data Science." TheoryLane Blog Post, 2017.

SLIDE 2

LEARNING GOALS

Create architectural models to reason about relevant characteristics
Critique the decision of where an AI model lives (e.g., cloud vs. edge vs. hybrid), considering the relevant tradeoffs
Deliberate how and when to update models and how to collect telemetry

SLIDE 3

SOFTWARE ARCHITECTURE

Requirements → Miracle / genius developers → Implementation

SLIDE 4

SOFTWARE ARCHITECTURE

Requirements → Architecture → Implementation

Focused on reasoning about tradeoffs and desired qualities

SLIDE 5

SOFTWARE ARCHITECTURE

The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them. -- Kazman et al. 2012

SLIDE 6

WHY ARCHITECTURE?

Represents earliest design decisions
Aids in communication with stakeholders: shows them "how" at a level they can understand, raising questions about whether it meets their needs
Defines constraints on implementation: design decisions form "load-bearing walls" of the application
Dictates organizational structure: teams work on different components
Inhibits or enables quality attributes (similar to design patterns)
Supports predicting cost, quality, and schedule: typically by predicting information for each component
Aids in software evolution: reason about cost, design, and effect of changes
Aids in prototyping: can implement an architectural skeleton early

KAZMAN ET AL. 2012

SLIDE 7

CASE STUDY: TWITTER

SLIDE 8

Speaker notes: Source and additional reading: Raffi. "New Tweets per second record, and how!" Twitter Blog, 2013.

SLIDE 9

TWITTER - CACHING ARCHITECTURE

SLIDE 10

Speaker notes:
Running one of the world's largest Ruby on Rails installations; 200 engineers
Monolithic: managing raw database, memcache, rendering the site, and presenting the public APIs in one codebase
Increasingly difficult to understand system; organizationally challenging to manage and parallelize engineering teams
Reached the limit of throughput on our storage systems (MySQL); read and write hot spots throughout our databases
Throwing machines at the problem; low throughput per machine (CPU + RAM limit, network not saturated)
Optimization corner: trading off code readability vs. performance

SLIDE 11

TWITTER'S REDESIGN GOALS

Performance: improve median latency, lower outliers; reduce number of machines 10x
Reliability: isolate failures
Maintainability: "We wanted cleaner boundaries with 'related' logic being in one place": encapsulation and modularity at the systems level (rather than at the class, module, or package level)
Modifiability: quicker release of new features: "run small and empowered engineering teams that could make local decisions and ship user-facing changes, independent of other teams"

Raffi. "New Tweets per second record, and how!" Twitter Blog, 2013.

SLIDE 12

TWITTER: REDESIGN DECISIONS

Ruby on Rails -> JVM/Scala
Monolith -> Microservices
RPC framework with monitoring, connection pooling, failover strategies, load balancing, ... built in
New storage solution, temporal clustering, "roughly sortable ids"
Data-driven decision making

SLIDE 13

TWITTER CASE STUDY: KEY INSIGHTS

Architectural decisions affect entire systems, not only individual modules
Abstract: different abstractions for different scenarios
Reason about quality attributes early
Make architectural decisions explicit
Question: Did the original architect make poor decisions?

SLIDE 14

ARCHITECTURAL MODELING AND REASONING

SLIDE 15

SLIDE 16

Speaker notes: Map of Pittsburgh. Abstraction for navigation with cars.

SLIDE 17

SLIDE 18

Speaker notes: Cycling map of Pittsburgh. Abstraction for navigation with bikes and walking.

SLIDE 19

SLIDE 20

SLIDE 21

Speaker notes: Fire zones of Pittsburgh. Various use cases, e.g., for city planners.

SLIDE 22

ANALYSIS-SPECIFIC ABSTRACTIONS

All maps were abstractions of the same real-world construct
All maps were created with different goals in mind: different relevant abstractions, different reasoning opportunities
Architectural models are specific system abstractions, for reasoning about specific qualities
No uniform notation

SLIDE 23

WHAT CAN WE REASON ABOUT?

SLIDE 24

WHAT CAN WE REASON ABOUT?

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google File System." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.

SLIDE 25

Speaker notes: Scalability through redundancy and replication; reliability with respect to single points of failure; performance on edges; cost.

SLIDE 26

MODELING RECOMMENDATIONS

Use a notation suitable for analysis
Document the meaning of boxes and edges in a legend
Graphical or textual both okay; whiteboard sketches often sufficient
Formal notations available

SLIDE 27

CASE STUDY: AUGMENTED REALITY TRANSLATION

SLIDE 28

Speaker notes: Image: https://pixabay.com/photos/nightlife-republic-of-korea-jongno-2162772/

SLIDE 29

CASE STUDY: AUGMENTED REALITY TRANSLATION

SLIDE 30

CASE STUDY: AUGMENTED REALITY TRANSLATION

SLIDE 31

Speaker notes: Consider you want to implement an instant translation service similar to Google Translate, but run it on embedded hardware in glasses as an augmented reality service.

SLIDE 32

QUALITIES OF INTEREST?

SLIDE 33

ARCHITECTURAL DECISION: SELECTING AI TECHNIQUES

What AI techniques to use and why? Tradeoffs?

SLIDE 34

SLIDE 35

Speaker notes: Relate back to the previous lecture about AI technique tradeoffs, including for example: accuracy; capabilities (e.g., classification, recommendation, clustering); amount of training data needed; inference latency; learning latency, incremental learning?; model size; explainable?; robust?

SLIDE 36

ARCHITECTURAL DECISION: WHERE SHOULD THE MODEL LIVE?

SLIDE 37

WHERE SHOULD THE MODEL LIVE?

Glasses? Phone? Cloud? What qualities are relevant for the decision?

SLIDE 38

Speaker notes: Trigger initial discussion.

SLIDE 39

CONSIDERATIONS

How much data is needed as input for the model?
How much output data is produced by the model?
How fast / energy-consuming is model execution?
What latency is needed for the application?
How big is the model? How often does it need to be updated?
Cost of operating the model (distribution + execution)?
Opportunities for telemetry?
What happens if users are offline?

SLIDE 40

EXERCISE: LATENCY AND BANDWIDTH ANALYSIS OF AR TRANSLATION

SLIDE 41
1. Identify key components of a solution and their interactions
2. Estimate latency and bandwidth requirements between components
3. Discuss tradeoffs among different deployment models

SLIDE 42

Speaker notes: Identify at least OCR and a translation service as two AI components in a larger system. Discuss which system components are worth modeling (e.g., rendering, database, support forum). Discuss how to get good estimates for latency and bandwidth. Some data: 200 ms latency is noticeable as a speech pause; 20 ms is perceivable as video delay, 10 ms as haptic delay; 5 ms is referenced as the cybersickness threshold for virtual reality; 20 ms latency might be acceptable; Bluetooth latency is around 40 ms to 200 ms; Bluetooth bandwidth is up to 3 Mbit/s, WiFi 54 Mbit/s; video streams need 4 to 10 Mbit/s for low to medium quality; Google Glass had a 5-megapixel camera, a 640x360 pixel screen, 1 or 2 GB of RAM, and 16 GB of storage.
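As a starting point for this exercise, a back-of-envelope sketch like the following can turn the numbers above into a concrete comparison. The 200 KB frame size, 50 ms one-way latency, and both inference times are invented assumptions, not measurements.

```python
# Back-of-envelope latency estimate for the AR translation pipeline.
# Link figures loosely follow the speaker notes (WiFi ~54 Mbit/s);
# payload and inference numbers are assumptions for illustration.

def transfer_ms(payload_bytes, bandwidth_mbit_per_s):
    """Time to push a payload over a link, ignoring protocol overhead."""
    bits = payload_bytes * 8
    return bits / (bandwidth_mbit_per_s * 1_000_000) * 1000

def round_trip_ms(payload_bytes, bandwidth_mbit_per_s, link_latency_ms,
                  inference_ms):
    """One frame: upload the image, run inference remotely, return a tiny result."""
    upload = transfer_ms(payload_bytes, bandwidth_mbit_per_s)
    return link_latency_ms + upload + inference_ms + link_latency_ms

# Cloud deployment: 200 KB compressed frame over WiFi, 50 ms one-way
# latency, 30 ms server-side OCR + translation (all assumed).
cloud = round_trip_ms(200_000, 54, 50, 30)

# On-glasses deployment: no network hop, but a slower model (assumed).
on_device = 250

print(f"cloud round trip: {cloud:.0f} ms")  # ~160 ms: under the 200 ms speech-pause threshold
print(f"on-device: {on_device} ms")
```

Even this crude model makes the tradeoff discussable: the cloud path stays under the speech-pause threshold only while the network holds, while the on-device path is jitter-free but slower.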

SLIDE 43

WHEN WOULD ONE USE THE FOLLOWING DESIGNS?

Static intelligence in the product
Client-side intelligence
Server-centric intelligence
Back-end cached intelligence
Hybrid models

7 . 5

slide-44
SLIDE 44

Speaker notes: From the reading:
Static intelligence in the product: difficult to update; good execution latency; cheap operation; offline operation; no telemetry to evaluate and improve
Client-side intelligence: updates costly/slow, out-of-sync problems; complexity in clients; offline operation, low execution latency
Server-centric intelligence: latency in model execution (remote calls); easy to update and experiment; operation cost; no offline operation
Back-end cached intelligence: precomputed common results; fast execution, partial offline; saves bandwidth, complicated updates
Hybrid models
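The hybrid option from the list above can be made concrete with a small sketch: try a compact on-device model first and fall back to a server-side model when confidence is low. The phrasebook, confidence values, and threshold are all invented for illustration.

```python
# Hybrid design sketch: on-device model with server fallback.
# All names, values, and the threshold are invented assumptions.

def on_device_translate(text):
    # A tiny phrasebook stands in for a compact on-device model.
    phrasebook = {"hola": ("hello", 0.95), "gracias": ("thank you", 0.90)}
    return phrasebook.get(text.lower(), (None, 0.0))  # (translation, confidence)

def server_translate(text):
    # Stand-in for a remote call (normally an RPC/REST request).
    return f"<server translation of '{text}'>"

def translate(text, confidence_threshold=0.8):
    result, confidence = on_device_translate(text)
    if confidence >= confidence_threshold:
        return result, "on-device"           # fast, offline-capable path
    return server_translate(text), "server"  # slower, needs connectivity

print(translate("hola"))      # ('hello', 'on-device')
print(translate("albatros"))  # falls back to the server path
```

The design buys low latency and partial offline operation for common inputs, at the price of client complexity and two models to keep in sync.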

SLIDE 45

MORE CONSIDERATIONS

Coupling of ML pipeline parts
Coupling with other parts of the system
Ability for different developers and analysts to collaborate
Support for online experiments
Ability to monitor

SLIDE 46

ARCHITECTURAL DECISION: TELEMETRY REQUIREMENTS

SLIDE 47

TELEMETRY DESIGN

How to evaluate mistakes in production?

SLIDE 48

SLIDE 49

Speaker notes: Discuss strategies to determine accuracy in production. What kind of telemetry needs to be collected?

SLIDE 50

THE RIGHT AND RIGHT AMOUNT OF TELEMETRY

Purpose:
Monitor operation
Monitor mistakes (e.g., accuracy)
Improve models over time (e.g., detect new features)

Challenges:
Too much data
No / not enough data
Hard to measure, poor proxy measures
Rare events
Cost
Privacy
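One common tactic for the "too much data" and "rare events" challenges above is biased sampling: keep every suspected mistake but only a fraction of routine events. A minimal sketch, with invented event shapes and rates:

```python
# Telemetry sampling sketch: always keep rare/suspect events, sample the rest.
import random

def should_log(event, base_rate=0.01, rng=random.random):
    # Always log suspected mistakes (e.g., the user immediately corrected
    # the output), since these are the rare events we learn from.
    if event.get("suspected_mistake"):
        return True
    # Keep only a small fraction of ordinary events.
    return rng() < base_rate

# Deterministic demo: feed fixed "random" draws instead of real randomness.
draws = iter([0.005, 0.5])
events = [
    {"suspected_mistake": False},  # draw 0.005 < 0.01 -> kept
    {"suspected_mistake": False},  # draw 0.5 >= 0.01 -> dropped
    {"suspected_mistake": True},   # always kept, no draw consumed
]
kept = [should_log(e, rng=lambda: next(draws)) for e in events]
print(kept)  # [True, False, True]
```

The sampling rate becomes an explicit architectural knob trading telemetry cost against how quickly rare problems become visible.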

SLIDE 51

TELEMETRY TRADEOFFS

What data to collect? How much? When? Estimate data volume and possible bottlenecks in the system.
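For the estimate this exercise asks for, a sketch like the following compares two telemetry designs. The per-item sizes echo the speaker-note figures (a full screenshot is roughly 2 MB), while the user count and event rate are invented.

```python
# Rough daily telemetry volume for two designs (assumed user counts/rates).
MB = 1_000_000

def daily_volume_gb(users, events_per_user_per_day, bytes_per_event):
    return users * events_per_user_per_day * bytes_per_event / 1e9

# Option 1: upload every captured frame as a ~2 MB screenshot.
full_frames = daily_volume_gb(users=10_000, events_per_user_per_day=100,
                              bytes_per_event=2 * MB)

# Option 2: upload only recognized text and the final translation (~1 KB).
text_only = daily_volume_gb(users=10_000, events_per_user_per_day=100,
                            bytes_per_event=1_000)

print(f"full frames: {full_frames:,.0f} GB/day")  # 2,000 GB/day
print(f"text only:   {text_only:,.1f} GB/day")    # 1.0 GB/day
```

At roughly $10/GB cellular cost, the first design would be wildly expensive; sampling frames or sending text only changes the architecture's bandwidth requirements by three orders of magnitude.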

SLIDE 52

SLIDE 53

Speaker notes: Discuss alternatives and their tradeoffs. Draw models as suitable. Some data for context: a full-screen PNG screenshot on a Pixel 2 phone (1080x1920) is about 2 MB (2 megapixels); Google Glass had a 5-megapixel camera and a 640x360 pixel screen, 16 GB of storage, and 2 GB of RAM. Cellular costs are about $10/GB.

SLIDE 54

RELATED: COST OF DATA AND FEATURE ENGINEERING

How much data do we acquire for training and evaluating models?
What data sources at what scale and latency (considering engineering cost, storage cost, processing cost, license cost, ...)?
Is it worth investing more time in feature engineering?
What if additional data sources are needed?
What is the cost of cleaning and preprocessing the data, and the value of the additional accuracy?

SLIDE 55

ARCHITECTURAL DECISION: INDEPENDENT MODEL SERVICE

Microservice architecture: Model Inference and Model Learning as a RESTful Service?

SLIDE 56

COUPLING AND CHANGEABILITY

What's the interface between the AI component and the rest of the system?
Learning data and process
Inference API
Where does feature extraction happen? Provide raw data (images, user profile, all past purchases) to the service, grant access to a shared database, or provide a feature vector?
Cost of feature extraction? Who bears the cost?
Versioned interface?
Coupling to other models?
Direct coupling to data sources (e.g., files, databases)? Expected formats for raw data (e.g., image resolution)?
Coupling to telemetry?
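The feature-extraction question above can be made concrete as two hypothetical interface variants (all names and the toy "model" are invented): the client either sends raw data and the service extracts features, or the client sends a precomputed feature vector and bears that cost itself.

```python
# Two interface variants for a model service, differing in where
# feature extraction happens and who is coupled to the feature format.

def extract_features(raw_image_bytes):
    # Stand-in for real feature extraction (cropping, normalization, ...).
    return [len(raw_image_bytes), raw_image_bytes[0] if raw_image_bytes else 0]

def predict_from_features(features):
    return sum(features) % 2  # toy stand-in for a model

# Variant 1: raw-data interface. The service pays the extraction cost,
# but the interface stays stable when the feature set changes.
def predict_raw(raw_image_bytes):
    return predict_from_features(extract_features(raw_image_bytes))

# Variant 2: feature-vector interface. The client pays the extraction cost,
# and every feature change ripples into all clients (tighter coupling).
def predict_vector(features):
    return predict_from_features(features)

raw = b"\x07abc"
assert predict_raw(raw) == predict_vector(extract_features(raw))
```

The two variants compute the same answer; the architectural difference is who pays the extraction cost and who must change when the features do.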

SLIDE 57

MODEL SERVICE API

Consider encapsulating the model as a microservice. Sketch a (REST) API.
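One possible answer sketch for this exercise. Everything here is hypothetical: the endpoint name, request and response fields, and version string are invented, and the model is stubbed.

```python
# Sketch of a model-inference microservice API:
#   POST /v1/translate
#   request:  {"text": "...", "source_lang": "ko", "target_lang": "en"}
#   response: {"translation": "...", "confidence": 0.93, "model_version": "..."}
# The handler below is framework-agnostic; wire it to Flask/FastAPI as needed.

MODEL_VERSION = "2020-02-01"  # echoed so clients/telemetry can attribute mistakes

def run_model(text, source_lang, target_lang):
    # Stub standing in for actual model inference.
    return f"<{target_lang} translation of '{text}'>", 0.93

def handle_translate(request_json):
    missing = {"text", "source_lang", "target_lang"} - request_json.keys()
    if missing:
        return 400, {"error": f"missing fields: {sorted(missing)}"}
    translation, confidence = run_model(request_json["text"],
                                        request_json["source_lang"],
                                        request_json["target_lang"])
    return 200, {"translation": translation,
                 "confidence": confidence,
                 "model_version": MODEL_VERSION}

status, body = handle_translate(
    {"text": "hola", "source_lang": "es", "target_lang": "en"})
print(status, body["model_version"])
```

Putting the version in the path (/v1/) and echoing model_version in every response keeps the interface honest about which model produced which answer, which matters for the telemetry and versioning questions in this section.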

SLIDE 58

SLIDE 59

FUTURE-PROOFING AN API

Anticipating and encapsulating change
What parts around the model service are likely to change?
Rigid vs. flexible data formats?
Versioning of APIs: version numbers vs. immutable services? Expecting to run multiple versions in parallel? Implications for learning and evolution?
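One way to answer the "multiple versions in parallel" question is deterministic per-user routing between two deployed versions, a canary split. The version names, the 10% fraction, and the toy bucketing scheme below are all invented.

```python
# Canary-routing sketch: two model versions behind one endpoint, with a
# stable per-user split so each user consistently sees one version.

def hash_bucket(user_id, buckets=1000):
    # Stable bucket in [0, 1) derived from the user id (toy hash).
    return (sum(user_id.encode()) * 2654435761 % buckets) / buckets

MODELS = {
    "v1": lambda text: f"v1:{text}",
    "v2": lambda text: f"v2:{text}",  # candidate version under evaluation
}

def route(text, user_id, canary_fraction=0.10):
    version = "v2" if hash_bucket(user_id) < canary_fraction else "v1"
    return version, MODELS[version](text)

print(route("hola", user_id="user-42"))  # this id happens to land in the canary
print(route("hola", user_id="user-1"))   # this one stays on v1
```

Tagging every response with its version (as in the API sketch earlier in this section) is what makes the telemetry from such an experiment interpretable.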

SLIDE 60

ROBUSTNESS

Redundancy for availability?
Load balancer for scalability?
Can mistakes be isolated? Local error handling? Telemetry to isolate errors to a component?
Logging and log analysis for what qualities?
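As a sketch of the local error handling asked about above (the failing service and fallback behavior are invented): isolate the model call behind a fallback so a model-service outage degrades the feature instead of crashing the system.

```python
# Error-isolation sketch: wrap the remote model call, log for telemetry,
# and degrade gracefully. The outage below is simulated.

def call_model_service(text):
    raise ConnectionError("model service unreachable")  # simulated outage

def translate_with_fallback(text, log=print):
    try:
        return call_model_service(text)
    except ConnectionError as e:
        # Telemetry hook: record enough context to isolate the error
        # to this component during log analysis.
        log(f"translation unavailable: {e}")
        return text  # graceful degradation: show the original text

print(translate_with_fallback("hola"))  # logs the failure, then prints 'hola'
```

The fallback choice (show untranslated text) is itself an architectural decision: it trades a visible quality degradation for availability.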

SLIDE 61

ARCHITECTURAL DECISION: UPDATING MODELS

Design for change! Models are rarely static outside the lab. Data drift, feedback loops, new features, new requirements. When and how to update models? How to version? How to avoid mistakes?

SLIDE 62

RISK OF STALE MODELS

What could happen if models become stale?

SLIDE 63

Risk: Discuss drift, adversarial interactions, feedback loops

SLIDE 64

UPDATE REQUIREMENTS OR GOALS

Estimate the required update frequency and the related cost regarding training, data transfer, etc.
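A back-of-envelope sketch of that estimate. Every dollar figure, size, and count below is an invented placeholder, not real pricing.

```python
# Update-frequency cost estimate: training cost plus distribution cost
# for pushing a new on-device model to all clients.

def monthly_update_cost(updates_per_month, training_cost_usd,
                        model_size_gb, clients, cost_per_gb_usd):
    training = updates_per_month * training_cost_usd
    distribution = updates_per_month * model_size_gb * clients * cost_per_gb_usd
    return training + distribution

# 0.1 GB model, 10,000 clients, $50 per training run, $0.10/GB (all assumed):
weekly = monthly_update_cost(4, 50, 0.1, 10_000, 0.10)   # about $600/month
daily = monthly_update_cost(30, 50, 0.1, 10_000, 0.10)   # about $4,500/month
print(weekly, daily)
```

Even toy numbers make the tradeoff explicit: moving from weekly to daily updates multiplies the recurring cost, which has to be justified by the drift actually observed in telemetry.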

SLIDE 65

SLIDE 66

Speaker notes: Discuss how frequently the involved models need to be updated. Are static models acceptable? Identify what information to collect and estimate the relevant values.

SLIDE 67

OUTLOOK: BIG DATA DESIGNS

Stream + Batch Processing
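The stream + batch idea can be sketched in a few lines as a minimal lambda-architecture toy (the event shapes and counts are invented): a batch layer periodically recomputes an exact view, and a speed layer folds in events that arrived since.

```python
# Minimal lambda-architecture sketch: batch view + speed layer + merge.

batch_view = {}      # accurate but stale (recomputed, say, nightly)
recent_events = []   # events streamed in since the last batch run

def batch_recompute(all_events):
    """Batch layer: full, exact recomputation over all historical data."""
    view = {}
    for user, count in all_events:
        view[user] = view.get(user, 0) + count
    return view

def query(user):
    """Serving layer: merge the stale batch view with recent stream events."""
    recent = sum(c for u, c in recent_events if u == user)
    return batch_view.get(user, 0) + recent

history = [("ada", 2), ("bob", 1), ("ada", 1)]
batch_view = batch_recompute(history)
recent_events = [("ada", 5)]  # arrived after the batch run
print(query("ada"))  # 8 = 3 from the batch view + 5 from the stream
```

The same split shows up in ML systems as periodic batch retraining combined with incremental or streaming updates between retraining runs.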

SLIDE 68

SLIDE 69

ARCHITECTURAL STYLES / TACTICS / DESIGN PATTERNS FOR AI-ENABLED SYSTEMS

(no standardization, yet)

SLIDE 70

ARCHITECTURES AND PATTERNS

The Big Ass Script Architecture
Decoupled multi-tiered architecture (data vs. data analysis vs. reporting; separate business logic from ML)
Microservice architecture (multiple learning and inference services)
Gateway Routing Architecture
Pipelines: data lake, lambda architecture; reuse between training and serving pipelines; continuous deployment, ML versioning, pipeline testing

Daniel Smith. "Exploring Development Patterns in Data Science." TheoryLane Blog Post, 2017.
Washizaki, Hironori, Hiromu Uchida, Foutse Khomh, and Yann-Gaël Guéhéneuc. "Machine Learning Architecture and Design Patterns." Draft, 2019.

SLIDE 71

ANTI-PATTERNS

Big Ass Script Architecture
Dead Experimental Code Paths
Glue Code
Multiple Language Smell
Pipeline Jungles
Plain-Old Datatype Smell
Undeclared Consumers

Washizaki, Hironori, Hiromu Uchida, Foutse Khomh, and Yann-Gaël Guéhéneuc. "Machine Learning Architecture and Design Patterns." Draft, 2019.
Sculley, David, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. "Hidden Technical Debt in Machine Learning Systems." In Advances in Neural Information Processing Systems, pp. 2503-2511. 2015.

SLIDE 72

AI AS A SERVICE

Third-Party AI Components in the Cloud
AI Components as Microservices

SLIDE 73

READYMADE AI COMPONENTS IN THE CLOUD

Data Infrastructure: large-scale data storage, databases, streams (MongoDB, Bigtable, Kafka)
Data Processing: massively parallel stream and batch processing (Spark, Hadoop, ...); elastic containers, virtual machines (Docker, AWS Lambda, ...)
AI Tools: notebooks, IDEs, visualization; learning libraries and frameworks (TensorFlow, Torch, Keras, ...)
Models: image, face, and speech recognition, translation; chatbots, spell checking, text analytics; recommendations, knowledge bases

SLIDE 74

SLIDE 75

BUILD VS BUY

Hardware, software, models?

SLIDE 76

Speaker notes: Discuss privacy implications.

SLIDE 77

REFLECTION

Qualities of interest? Important design tradeoffs? Decisions?

SLIDE 78

SLIDE 79

SUMMARY

Software architecture is an established discipline to reason about design alternatives
Understand relevant quality goals
Problem-specific modeling and analysis: gather estimates, consider design alternatives, make tradeoffs explicit
Examples of important design decisions:
Modeling technique to use
Where to deploy the model
How and how much telemetry to collect
Whether and how to modularize the model service
When and how to update models
Build vs. buy, cloud resources

SLIDE 80

CASE STUDY 2: UBER SURGE PREDICTION

SLIDE 81

SLIDE 82

Speaker notes: Consider you work at Uber and want to predict where rider demand is going to be high.

SLIDE 83

QUALITIES OF INTEREST?

SLIDE 84

SLIDE 85

WHERE SHOULD THE MODEL LIVE?

Car? Phone? Cloud? What qualities are relevant for the decision?

SLIDE 86

Speaker notes: Trigger initial discussion.

SLIDE 87

TELEMETRY DESIGN

How to evaluate mistakes in production?

SLIDE 88

17-445 Software Engineering for AI-Enabled Systems, Christian Kaestner
