
DBMS support for deep learning over image data
Parmita Mehta, Magdalena Balazinska, Andrew Connolly, and Ariel Rokem
University of Washington

Modern Data Management Requirements
● Manage image and video data
● Build complex machine learning models

Neuroscience (data from the Human Connectome project):
1. Image processing
2. Denoising
3. Model fitting

Ophthalmology:
1. Classification
2. Segmentation
3. Clustering

Astronomy:
1. Data cleaning
2. Object extraction
3. Classification

Consumer data:
1. Object detection
2. Classification
3. Description

Picture credits: Deep Lens Survey (DLS: Tyson); Prof. Aaron Lee; Google image search

Use case: Optical coherence tomography (OCT)
OCT uses light waves to take cross-section pictures of the retina to diagnose:
● macular hole, pucker, and edema
● age-related macular degeneration
● central serous retinopathy
● diabetic retinopathy
We got some good results
https://ai.googleblog.com/2016/08/improving-inception-and-image.html

Model Building is a Messy Process
1. Different versions of the data with different metadata
2. Choose data and prepare it (e.g., crop it)
3. Build a model, train it, and evaluate it on a development subset of the data
4. Try to figure out why the results are terrible
5. Clean data, re-organize data, enhance data
6. Think of a new model and go back to step 3
7. Now compare the various models
8. Keep track of data subsets, models, model parameters, etc.
9. Maybe one day finally write the paper
10. And then, when the revision request comes back, try to remember all of the above

Key Challenges
● Large data volumes
● Slowness of lifecycle: train/test/change
● Cognitive burden of keeping track of data and models
● Correctness - don’t use test set to tune the model
Not seeking to replace ML libraries! But extend them with data management capabilities
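One way to guard the correctness point above is to fix each record's split once, deterministically, so the test set cannot drift while models are being tuned. The sketch below is only an illustration under assumptions not in the slides: the function name `assign_split`, the 10% dev and test fractions, and the choice to split by patient ID (so all images of one patient land in the same subset) are hypothetical.

```python
import hashlib


def assign_split(patient_id: str, dev_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a patient ID to 'train', 'dev', or 'test'.

    Hypothetical helper: the fractions and the patient-level grouping are
    assumptions made for illustration, not part of ODIN.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Take the top 53 bits of the hash and scale them to a float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    if u < test_frac:
        return "test"
    if u < test_frac + dev_frac:
        return "dev"
    return "train"


# The same ID always maps to the same subset, regardless of call order.
splits = {pid: assign_split(pid) for pid in ["patient-a", "patient-b", "patient-c"]}
```

Because the assignment depends only on the ID, re-running data preparation after cleaning or re-organizing the data does not move any patient between the training and test sets.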

Our Approach: ODIN DB

ODIN Architecture
Extend RDBMS with constructs to easily express tasks associated with model building and debugging
Not seeking to replace ML libraries! But extend them with data management capabilities
Architecture layers (from diagram):
● Python, SQL, ...
● API: DSL
● Query Optimizer, Physical Tuner, Parallel Execution
● Relational Engine

ODIN Prototype
Prototype layers (from diagram):
● Python, SQL
● API: DSL
● Query Optimizer, Physical Tuner
● Extended Storage Layer
● Visual Data Management System (VDMS) *
* VDMS is a new system from Intel, designed specifically to store and query image databases
https://github.com/IntelLabs/vdms/wiki

Our Data Model and Domain Specific Language

Images
Attributes:
● Image ID
● Image (as blob)
● Label
● Meta-data (e.g. age, patientID etc.)
Operations:
● Insert / Delete / Update
● Select (e.g. create training set)
● Crop, Rotate, Blur, Resize ...

Models
Attributes:
● Model ID
● Name
● Definition (JSON)
● Meta-data (e.g. # of classes, type etc.)
Operations:
● Insert / Delete / Update
● Select

Experiments
Attributes:
● Experiment ID
● Model ID
● Data Sets (test set, training set etc)
● Results (accuracy, F1, recall etc)
● Meta-data (epochs, learning rate, etc.)
Operations:
● Insert / Delete / Update
● Select
● Generate Maximized Image

Per Image Parameters
Attributes:
● Experiment ID
● Image ID
● Activation for all neurons
● Predicted class
Operations:
● Generate / Delete
● Select
● Generate Attribution for Image ID(s)
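The four relations above could be sketched as plain relational tables. The following is a minimal illustration using Python's built-in `sqlite3`, not ODIN's actual schema: the table and column names, the types, and the choice to store JSON and meta-data as text are all assumptions.

```python
import sqlite3

# Hypothetical schema mirroring the four relations on the slide.
SCHEMA = """
CREATE TABLE images (
    image_id TEXT PRIMARY KEY,
    image    BLOB,   -- raw image bytes
    label    TEXT,
    metadata TEXT    -- e.g. age, patient ID, stored as JSON text (assumption)
);
CREATE TABLE models (
    model_id   INTEGER PRIMARY KEY,
    name       TEXT,
    definition TEXT,  -- model definition as JSON
    metadata   TEXT   -- e.g. number of classes, type
);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    model_id      INTEGER REFERENCES models(model_id),
    datasets      TEXT,  -- names of training and test sets
    results       TEXT,  -- accuracy, F1, recall, ...
    metadata      TEXT   -- epochs, learning rate, ...
);
CREATE TABLE per_image_params (
    experiment_id   INTEGER REFERENCES experiments(experiment_id),
    image_id        TEXT REFERENCES images(image_id),
    activations     TEXT,  -- activation for all neurons, as JSON
    predicted_class INTEGER,
    PRIMARY KEY (experiment_id, image_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four illustrative tables on an open connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Operations such as Crop, Generate Maximized Image, or Generate Attribution have no counterpart in standard SQL, which is where a domain specific language on top of the relational engine comes in.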

Example Database
Images: OCT_Images
● Image ID
● Image (as blob)
● Label
● Meta-data (e.g. age, patientID etc.)

| Image-ID | Label | Slice-ID | Patient-ID | Age | G | Visual Acuity | Diag | Image |
|---|---|---|---|---|---|---|---|---|
| b06e7bfc444c93db26a7c6a5d4d234-00033918-026.png | ERM | 26 | b06e7bfc444c93db26a7c6a5d4d234 | 52.28 | 1 | 0.48 | [1, 0, 0, 0] | (image) |
| 6cc38578fc7f24f21519d14f776d4c-00168131-029.png | AMD | 29 | 6cc38578fc7f24f21519d14f776d4c | 90.05 | 1 | 0.7 | [0, 1, 0, 0] | (image) |

Example Database
Models: OCT_Models
● Model ID
● Name
● Definition (JSON)
● Meta-data (e.g. # of classes, type etc.)

| Model-ID | Name | Definition | Classes | Type | Input | Number of Params |
|---|---|---|---|---|---|---|
| 1 | VGG-16-BN | JSON | 4 | Multi-class | (256,256) | 134,276,034 |
| 2 | Inception-V3 | JSON | 4 | Multi-label | (299,299) | 24,348,324 |

Example Database
Experiments: OCT_Experiments
● Experiment ID
● Model ID
● Data Sets (test set, training set etc)
● Results (accuracy, F1, recall etc)
● Meta-data (epochs, learning rate, etc.)

| Experiment-ID | Model-ID | Train | Test | Acc | Epochs | Initial-LR |
|---|---|---|---|---|---|---|
| 1 | 1 | retina-train2 | retina-test2 | 78.8 | 50 | 1e-3 |
| 25 | 1 | retina-train2 | retina-test2 | 90.05 | 150 | 1e-4 |
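With experiments stored as rows, the bookkeeping question "which run of this model did best, and with which settings?" becomes a one-line query. Here is a small sketch that loads the two example rows into an in-memory SQLite table. The table name `oct_experiments` and the column names are assumptions chosen to match the slide's headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE oct_experiments (
        experiment_id INTEGER PRIMARY KEY,
        model_id      INTEGER,
        train         TEXT,
        test          TEXT,
        acc           REAL,
        epochs        INTEGER,
        initial_lr    REAL
    )
    """
)
# The two example rows from the slide.
conn.executemany(
    "INSERT INTO oct_experiments VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "retina-train2", "retina-test2", 78.8, 50, 1e-3),
        (25, 1, "retina-train2", "retina-test2", 90.05, 150, 1e-4),
    ],
)

# Most accurate experiment for model 1, with the settings that produced it.
best = conn.execute(
    """
    SELECT experiment_id, acc, epochs, initial_lr
    FROM oct_experiments
    WHERE model_id = ?
    ORDER BY acc DESC
    LIMIT 1
    """,
    (1,),
).fetchone()
```

Here `best` holds the row for experiment 25, the run with 150 epochs and the smaller initial learning rate.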

Example Database
Per Image Parameters: OCT_LIP
● Experiment ID
● Image ID
● Activation for all neurons
● Predicted class

| Experiment-ID | Image-ID | Activation | Predicted class |
|---|---|---|---|
| 25 | b06e7bfc444c93db26a7c6a5d4d234-00033918-026.png | JSON | 2 |
| 25 | 6cc38578fc7f24f21519d14f776d4c-00168131-029.png | JSON | 3 |

Queries
1. Basic queries (easy)
   a. Select images/models/experiments based on metadata
   b. Execute user-defined code on any of the data (e.g., train model)
2. Model-debugging queries (slow and hard to express)
   a. What is the model learning?
   b. What are representative images that classifier gets wrong?
3. Model comparison queries
   a. Why is this model better? What are the models learning differently?
4. Data inspection queries
   a. What are the important features in my data?
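The simplest part of query 2b, finding the images a classifier gets wrong, reduces to a join once predictions are stored per image. The sketch below uses `sqlite3` with made-up rows. The table names, the integer `true_class` column, and the image IDs are hypothetical, and picking representative images among the errors would need more than this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE images (
        image_id   TEXT PRIMARY KEY,
        label      TEXT,
        true_class INTEGER  -- hypothetical integer encoding of the label
    );
    CREATE TABLE per_image_params (
        experiment_id   INTEGER,
        image_id        TEXT,
        predicted_class INTEGER
    );
    """
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [("img-a", "ERM", 0), ("img-b", "AMD", 1), ("img-c", "AMD", 1)],
)
conn.executemany(
    "INSERT INTO per_image_params VALUES (?, ?, ?)",
    [(25, "img-a", 0), (25, "img-b", 2), (25, "img-c", 1)],
)


def misclassified(conn: sqlite3.Connection, experiment_id: int) -> list:
    """Return (image_id, label, true_class, predicted_class) for wrong predictions."""
    return conn.execute(
        """
        SELECT i.image_id, i.label, i.true_class, p.predicted_class
        FROM per_image_params AS p
        JOIN images AS i ON i.image_id = p.image_id
        WHERE p.experiment_id = ?
          AND p.predicted_class <> i.true_class
        ORDER BY i.image_id
        """,
        (experiment_id,),
    ).fetchall()


errors = misclassified(conn, 25)
```

Questions such as "what is the model learning?" additionally need the stored activations or a re-run of the model, which is why the slide marks them as slow and hard to express.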

Research Questions
1. Materialization vs Re-processing:
   a. Storing intermediates requires tens to hundreds of GB of storage
   b. Re-running model for each diagnostic query is slow
   c. What are the trade-offs for materialization vs regeneration?
   d. How best to compress the materialized data?
2. Expressivity:
   a. How best to extend relational model to express queries easily?
3. Extensibility:
   a. This is an active research area, how to build extensibility into the system to allow new operations and classes of machine learning?
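A back-of-the-envelope calculation shows how materialized intermediates reach the sizes mentioned in 1a. The image count, the activations per image, and the value widths below are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def materialized_size_gb(n_images: int, activations_per_image: int,
                         bytes_per_value: int = 4) -> float:
    """Uncompressed size, in decimal GB, of storing every activation for every image."""
    return n_images * activations_per_image * bytes_per_value / 1e9


# Hypothetical workload: 10,000 images, 5 million activations per image.
float32_gb = materialized_size_gb(10_000, 5_000_000)                     # 4-byte floats
float16_gb = materialized_size_gb(10_000, 5_000_000, bytes_per_value=2)  # 2-byte floats
```

Under these assumptions one experiment needs 200 GB at 32-bit precision and 100 GB at 16-bit, which is why compression (1d) and the choice between materializing and regenerating (1c) matter.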

Conclusion
● Images and videos are common data types today
● Workloads primarily focus on machine learning / deep learning
● Database management systems provide limited to no support
● ODIN DB is a new DBMS that extends relational systems with
