Data Analytics as a Service for Data Scientists, Prof. Chen Li - PowerPoint PPT Presentation




SLIDE 1

Data Analytics as a Service for Data Scientists


  • Prof. Chen Li

SLIDE 2

A real story


  • Sue: Public Health researcher
  • Adam: Data Scientist

SLIDE 3

Challenges ...

  • Infrastructure
  • Data collection
  • Large scale
  • Machine learning

Not enough IT background!

SLIDE 4

Software solutions

  • Texera: Analytics using workflows
  • Cloudberry: Big data visualization
  • AsterixDB: parallel database

Users: researchers from UCI, UCLA

SLIDE 5

Cloudberry: Big Data Visualization

SLIDE 6

TwitterMap system

SLIDE 7

Takes too much time?

SLIDE 8

Fixed-length slicing?

SLIDE 9

Query slicing with a rhythm
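
The slides name the idea but do not spell out the algorithm. One plausible reading, sketched below, is that a long time-range query is split into mini-queries whose width is adjusted after each run so that partial results reach the frontend at a steady pace, rather than using one fixed width throughout. The function `sliced_query`, the callback `run_slice`, and the parameter `target_seconds` are hypothetical names for illustration, not Cloudberry's API.

```python
from typing import Callable, Iterator, Tuple


def sliced_query(
    start: int,
    end: int,
    run_slice: Callable[[int, int], Tuple[int, float]],
    target_seconds: float = 1.0,
    initial_width: int = 1,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (slice_start, slice_end, running_total) over the range [start, end).

    run_slice(lo, hi) is an assumed callback that executes one mini-query on
    the backend and returns (record_count, elapsed_seconds).
    """
    total = 0
    width = initial_width
    lo = start
    while lo < end:
        hi = min(lo + width, end)
        count, elapsed = run_slice(lo, hi)
        total += count
        # Each partial result can be drawn by the frontend right away.
        yield lo, hi, total
        # Resize the next slice so its expected cost is near the target.
        seconds_per_unit = elapsed / (hi - lo)
        if seconds_per_unit > 0:
            width = max(1, int(target_seconds / seconds_per_unit))
        lo = hi
```

With a backend whose cost grows with the slice width, the width settles near the value that takes about `target_seconds` per mini-query. Fixed-length slicing is the special case where the width is never updated.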

SLIDE 10

Open challenges

  • Modeling the DB for approximate visualization
  • Visualizing a large number of records
  • Integrating computation between middleware and frontend

SLIDE 11

Texera: big data analytics using interactive workflows

SLIDE 12

Actor Model
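
The slide only names the model. As a minimal sketch of what it could mean for a workflow, each operator below is an actor that owns a mailbox, handles one message at a time, and forwards results downstream. The `Actor` class, the `STOP` sentinel, and `run_pipeline` are hypothetical and use only the Python standard library. This is not Texera's engine code.

```python
import queue
import threading
from typing import Callable, List, Optional

STOP = object()  # sentinel message that tells an actor to shut down


class Actor(threading.Thread):
    """A workflow operator that reacts to messages from its own mailbox."""

    def __init__(self, handle: Callable[[object], Optional[object]],
                 downstream: Optional["Actor"] = None) -> None:
        super().__init__()
        self.mailbox: "queue.Queue[object]" = queue.Queue()
        self.handle = handle
        self.downstream = downstream

    def send(self, message: object) -> None:
        self.mailbox.put(message)

    def run(self) -> None:
        while True:
            message = self.mailbox.get()
            if message is STOP:
                # Propagate shutdown along the workflow, then exit.
                if self.downstream is not None:
                    self.downstream.send(STOP)
                return
            result = self.handle(message)
            # A handler returns None to drop a record.
            if result is not None and self.downstream is not None:
                self.downstream.send(result)


def run_pipeline(records: List[str]) -> List[str]:
    """Run a three-operator workflow: keyword filter -> uppercase -> sink."""
    collected: List[str] = []
    sink = Actor(collected.append)
    to_upper = Actor(str.upper, downstream=sink)
    keep_flu = Actor(lambda r: r if "flu" in r else None, downstream=to_upper)
    actors = [keep_flu, to_upper, sink]
    for actor in actors:
        actor.start()
    for record in records:
        keep_flu.send(record)
    keep_flu.send(STOP)
    for actor in actors:
        actor.join()
    return collected
```

Because actors share no state and communicate only through mailboxes, each operator can run in parallel with the others while records still arrive in order.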

SLIDE 13

Integrate ML Models

Included as operators

Diagram labels: UDF (feed), UDF (online), Data preparation for training, Training Instances

SLIDE 14

Conclusion: data analytics as a service

  • Cloudberry: Big data visualization
  • AsterixDB: parallel database
  • Texera: Analytics using workflows

Diagram labels: Labeled Instances, Classifier, Trainer