Opportunities for Data-Management Research in the Era of Horizontal AI/ML
Panelists:
Theo Rekatsinas (UW Madison)
Sudeepa Roy (Duke Univ.)
Manasi Vartak (Verta.AI)
Ce Zhang (ETH Zurich)
Moderator: Alkis Polyzotis (Google Research)

Starting points
ML is blooming as a field
● Rapid innovation and impact in research and industry
● Growing base of researchers and practitioners
● It’s now harder to get a NeurIPS registration than a ticket to Hamilton :-)
There is a strong link between ML and data management
● Data is crucial for ML ⇒ Data management in the context of ML
● ML training/serving is a data flow ⇒ Optimizations from DB systems
● ML can crack hard problems ⇒ ML-driven DB system optimizations
Good news for everyone in this room!

ML is becoming horizontal
ML applies to more domains of increasing diversity
● Medical diagnosis, farming, chip design, transportation, astronomy, ...
Integration of ML in the stack is becoming wider and deeper
● Servers vs phones, machine-learned modules, hardware innovations...
More users, of varying skill sets, are relying on ML
● Engineers, analysts, scientists, ...
What does this expansion imply for data management? ⇐ This panel!

Panel Structure
Question 1: Research opportunities (or, the good news!)
Question 2: How do we publicize our research?
Question 3: How do we train our students?
For each question:
● Panelists make their case (audience: hold your fire!)
● Open discussion (audience participation strongly encouraged)
● Next question

Panelists
● Theo Rekatsinas (UW Madison): “As a teenager I used to juggle devil sticks. My first set was a gift from a psychiatrist.”
● Sudeepa Roy (Duke Univ.): “My other current research is on learning new nursery rhymes for my 18 months old daughter.”
● Manasi Vartak (Verta.AI): “My company’s name is not based on my last name, just a need for available domain names ;) and also `ver=true`”
● Ce Zhang (ETH Zurich): “I am trying to cycle around every single non-trivial lake in Switzerland, and I am almost 40% done.”

Research opportunities

Are we seeing the whole picture?

Let’s see where AI is headed next

“What is THE most exciting challenge for AI (and Data Management)?” Exploding data combined with shrinking time to act

Sudeepa

DM + ML/AI research opportunities
DM-4-ML
• Systems for ML
• Faster inference
• Pushing ML through a query plan
• Curation and optimization of ML pipeline
• Automated training data generation
• Hardware for ML
• Distributed ML
• Linear algebra based analytics
• …
ML-4-DM
• Learning index, schema, query optimization, access patterns
• Cardinality estimation
• Approximate Query Processing
• Regret-bounded query processing
• …
We will talk about these anyway! :-)

My thoughts on research opportunities
1. Based on my research experience
2. From ML researchers’ experience

My thoughts on research opportunities
1. Based on my research experience
• Relatively recent but interesting research using ML/AI, e.g., “Using regression to explain outliers” or “Learning to sample”
• Interpretability/Explanations and Causality

Interpretability and Explanations
Input Data (D) → Algorithm or Query (Q) → Output(s) (Q[D])
How do we interpret and understand the output?
• “Why do I see this output?”
• “Why do I see an outlier?”
• “Why is one value higher than the other?”
• “Why is input-A classified as Type-B?”
• “Why is sales in Jan predicted to be higher?”

Why Interpretability?
• Ethics
• Accountability
• Actions
• Transparency
• Debugging
• Maintainability
• Fairness
SIGMOD’19 Keynote by Lise Getoor on “Responsible Data Science”
SIGMOD’19 Panel on “Data Ethics”
Courtesy: Lise Getoor and SIGMOD’19 twitter account

How do we interpret and understand the output?
• “Why do I see this output?”
• “Why do I see an outlier?”
• “Why is one value higher than the other?”
• “Why is input-A classified as Type-B?”
• “Why is sales in Jan predicted to be higher?”
Tracking “provenance” may not be enough
• What are the main factors resulting in this prediction/classification/outlier?
• How do we explain them to an analyst, decision maker, or scientist who does not hold an advanced degree in CS?

Ideally, “Why” = Find the “Cause”
What are the main factors resulting in this prediction/classification/outlier? Causes!
• David Hume (1738): A Treatise of Human Nature
• Karl Pearson (1911): The Grammar of Science
• Aristotle (384-322 BC): Metaphysics
• Carl Gustav Hempel (1965): Aspects of Scientific Explanation and Other Essays
• Judea Pearl: Causality, Graphical Models
Beyond interpretability: Causality has broader applications in sound “prescriptive” data analysis!
Helping decide whether or not a data-driven decision is wise

Correlation is not causation!
How much?
● “Does smoking cause lung cancer?”
● “Does drug A cure disease B?”
● “Does increasing tax on cigarettes reduce lung problems?”
● “Does a reduction in interest rates encourage people to buy houses?”
● “Does an increase in ice cream sales increase the crime rate?”
We cannot increase tax on ice cream sales to stop crime! *
* Both increase during summer
Relying only on predictive or learned models for data-driven decisions can be disastrous
Need to measure causality
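The ice cream and crime example can be illustrated with a small simulation. This is a minimal sketch on made-up numbers, not data from the talk: the function names, means, and spreads are all assumptions chosen only to show a common cause (summer) producing a strong correlation between two variables that do not affect each other.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_days(n=2000, seed=0):
    """Hypothetical data: summer raises both ice cream sales and crime.

    Ice cream sales never influence crime in this model (assumed numbers).
    """
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        summer = rng.random() < 0.5
        ice_cream = rng.gauss(100 if summer else 40, 10)
        crime = rng.gauss(30 if summer else 10, 5)
        days.append((summer, ice_cream, crime))
    return days


days = simulate_days()
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Condition on the confounder: look at summer and non-summer days separately.
summer_days = [d for d in days if d[0]]
other_days = [d for d in days if not d[0]]
within_summer = pearson([d[1] for d in summer_days], [d[2] for d in summer_days])
within_other = pearson([d[1] for d in other_days], [d[2] for d in other_days])

print(f"overall correlation:        {overall:.2f}")
print(f"within summer days:         {within_summer:.2f}")
print(f"within non-summer days:     {within_other:.2f}")
```

The overall correlation is high, while within each season it is close to zero, which is why a tax on ice cream would not be expected to change crime in this toy model.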

Controlled experiment
• Assign units at random to Drug (treatment) or Placebo (control)
• Compute the average outcome in each group and take the difference
• Randomization is crucial to estimate the causal effect without bias
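The randomize, average, and take-the-difference recipe can be sketched in a few lines. This is an illustration under stated assumptions: `randomized_trial`, `toy_outcome`, the baseline distribution, and the true effect of 2.0 are all hypothetical and not taken from the talk.

```python
import random
from statistics import mean


def randomized_trial(units, outcome_fn, seed=0):
    """Estimate a treatment effect as a difference in group means.

    units: list of per-unit baseline values (hypothetical)
    outcome_fn(baseline, treated): observed outcome for one unit
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)            # random assignment is the key step
    half = len(shuffled) // 2
    treated, control = shuffled[:half], shuffled[half:]
    avg_treated = mean(outcome_fn(u, True) for u in treated)
    avg_control = mean(outcome_fn(u, False) for u in control)
    return avg_treated - avg_control


# Assumed toy model: the drug adds 2.0 to each unit's baseline outcome.
TRUE_EFFECT = 2.0


def toy_outcome(baseline, treated):
    return baseline + (TRUE_EFFECT if treated else 0.0)


population_rng = random.Random(42)
baselines = [population_rng.gauss(50, 5) for _ in range(1000)]

estimate = randomized_trial(baselines, toy_outcome)
print(f"estimated effect: {estimate:.2f} (true effect in this toy model: {TRUE_EFFECT})")
```

Because assignment ignores the baseline, the two groups have similar baselines on average, so the difference in means recovers the effect up to sampling noise.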

What if we cannot do randomized controlled experiments?
Due to ethical, time, or cost constraints
● “Does smoking cause lung cancer?”
● “Does growing up in a poor neighborhood make a child earn less as an adult?”
Fortunately, we can do “Observational Causal Studies” under certain assumptions
Potential Outcome Framework for Causality (Donald Rubin, Harvard Statistics)

Observational Causal Study (+ DM)
Find “units” (e.g., patients) who look similar (called “matching”)
○ E.g., of same age, gender, height, ethnicity, … (“confounding covariates”)
○ SQL Group-By
Many tools are available, but for small, simple data
With large data, SQL wins by a margin!
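Exact matching with a SQL Group-By can be sketched as follows. This is a minimal sketch and not the query used by the speaker: the `patients` table, its columns, and the toy rows are assumptions, and SQLite (via Python's standard `sqlite3` module) stands in for whatever database engine would be used on large data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "age_group TEXT, gender TEXT, treated INTEGER, outcome REAL)"
)
# Hypothetical observational data: (age_group, gender, treated, outcome).
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        ("young", "F", 1, 8.0), ("young", "F", 0, 6.0),
        ("young", "M", 1, 7.0), ("young", "M", 0, 5.0), ("young", "M", 0, 6.0),
        ("old", "F", 1, 4.0), ("old", "F", 0, 3.0),
        ("old", "M", 1, 5.0),  # no comparable control unit: stays unmatched
    ],
)

# Units with identical confounding covariates form one matched group.
# Groups lacking either a treated or a control unit are dropped.
MATCH_SQL = """
SELECT age_group, gender,
       AVG(CASE WHEN treated = 1 THEN outcome END)
     - AVG(CASE WHEN treated = 0 THEN outcome END) AS effect,
       COUNT(*) AS n
FROM patients
GROUP BY age_group, gender
HAVING SUM(treated) > 0 AND SUM(1 - treated) > 0
"""
groups = conn.execute(MATCH_SQL).fetchall()

# Weight each group's effect by its size to get one overall estimate.
total_units = sum(n for _, _, _, n in groups)
estimated_effect = sum(effect * n for _, _, effect, n in groups) / total_units

for age_group, gender, effect, n in groups:
    print(f"{age_group:>5} {gender}: effect={effect:.2f} (n={n})")
print(f"size-weighted estimate: {estimated_effect:.2f}")
```

Weighting by group size is one of several reasonable choices; the point of the sketch is only that the matching step itself reduces to a grouped aggregate, which a database can run at scale.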

4 Lines of SQL ⇒ Our two collaborative projects on causality and ML/AI!
Collaborators: Cynthia Rudin (Duke CS), Alexander Volfovsky (Duke Statistics), Lise Getoor (UCSC), Babak Salimi and Dan Suciu (UW)
DM-4-ML/AI
• Fast matching methods for large data
• Causal analysis on large complex data using DM and ML techniques
• Causal discovery
• Applications in health data, e.g., stopping flu spread in college dorms (with UNC Global Health)
• Automatic assessment of key assumptions
ML-4-DM
• New insights in data analysis or DM problems
SIGMOD’19 best paper by Salimi et al. on fairness by causality!

My thoughts on research opportunities (DM-4-ML/AI, ML-4-DM)
2. From ML researchers’ experience
• Do they face any data-related problems?
• Which problems would they like to solve?
Sometimes running batch scripts works for large data!

Some challenges faced in ML: 1/2
● Real-time systems and easy data flow and tensor flows
  ○ e.g., real-time neural network with frequent updates
● Infrastructure to work with Electronic Health Record and Medical Data
  ○ Privacy, updates, dataflow
● Efficient pre-processing in NLP
  ○ e.g., find word-tuples appearing frequently and prune by some measures
● Image databases and image retrieval
  ○ Use the high-level image structure (scene, objects, people, their spatial relation), and find images whose structure satisfies some property?
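The NLP pre-processing item (find frequent word-tuples, then prune by some measure) can be sketched for the simplest case of word pairs. The slide does not say which measure is meant, so pointwise mutual information (PMI) is an assumption here, as are the function name, the thresholds, and the toy sentences.

```python
import math
from collections import Counter


def frequent_bigrams(sentences, min_count=2, min_pmi=0.0):
    """Return {bigram: PMI} for bigrams that pass a count and a PMI threshold.

    PMI is only one possible pruning measure (an assumption of this sketch).
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    kept = {}
    for (first, second), count in bigrams.items():
        if count < min_count:        # prune rare pairs first
            continue
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        pmi = math.log2(p_pair / (p_first * p_second))
        if pmi >= min_pmi:           # prune pairs that co-occur only by chance
            kept[(first, second)] = pmi
    return kept


example = [
    "new york is big",
    "i love new york",
    "new york new york",
    "the cat is big",
]
kept_bigrams = frequent_bigrams(example)
for pair, score in sorted(kept_bigrams.items()):
    print(pair, round(score, 2))
```

On a real corpus the counting step is a large group-by aggregation, which is where data-management techniques could help.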
