possiblY Big data analytics for music data conchita control - - PowerPoint PPT Presentation



SLIDE 1

possiblY

Big data analytics for music data

SLIDE 2

conchita

control management

SLIDE 3

Artist → song upload provider → Portal · Portal · Portal
Central collection of payment records
#artist: transparency of money
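
The idea of a central collection of payment records can be sketched in Python: records from all portals are pooled in one place and totalled per artist and per portal, so an artist can see where the money comes from. The `PaymentRecord` fields, the helper `totals_by`, and the sample records are hypothetical illustrations, not the project's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentRecord:
    # Hypothetical schema: the portals' actual report formats are not described here.
    artist: str
    portal: str
    country: str
    amount_eur: Decimal


def totals_by(records, *keys):
    """Sum amount_eur over all records, grouped by the named PaymentRecord fields."""
    totals = defaultdict(Decimal)
    for record in records:
        group = tuple(getattr(record, key) for key in keys)
        totals[group] += record.amount_eur
    return dict(totals)


# Illustrative records only, not project data.
records = [
    PaymentRecord("artist-a", "portal-1", "UK", Decimal("0.004")),
    PaymentRecord("artist-a", "portal-2", "AT", Decimal("0.003")),
    PaymentRecord("artist-a", "portal-1", "UK", Decimal("0.004")),
    PaymentRecord("artist-b", "portal-1", "AT", Decimal("0.005")),
]

per_artist = totals_by(records, "artist")
per_artist_and_portal = totals_by(records, "artist", "portal")
```

Grouping by `"country"` instead gives a revenue-per-country view. `Decimal` is used so that many tiny per-stream amounts add up without binary floating-point rounding.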

SLIDE 4

Actionable insight

SLIDE 5
SLIDE 6

label

analyze artists, predict next hit, control music platforms

SLIDE 7

Revenue per country

TV campaign in UK?

SLIDE 8

ann

feels cheated by management, orders audit

SLIDE 9

10 TB

And 100 GB of new data per month. Overwhelmed ...

SLIDE 10

World view

SLIDE 11

Portals with outliers

SLIDE 12

empower artists through transparency.

big-data analytics

SLIDE 13

Context of project

  • Develop a prototype
  • Continuation later as an FFG-funded research project
  • Integration of ML to answer questions like: "What do I need to do to sell more music?"

SLIDE 14

team

Anton Georg Nathaniel Max Constantin Philipp

SLIDE 15

plausibility check

SLIDE 16

#keyOutlierVisualization

SLIDE 17

52,382

artists

SLIDE 18

3,219

labels

SLIDE 19

33

portals, 8 of them with outliers (#14)

SLIDE 20

Prototype data overview

SLIDE 21

Number of outliers


SLIDE 22

Weighted repeated median smoothing and filtering
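
The slide names the technique without detail. Below is a minimal Python sketch of a weighted repeated median (WRM) filter in the moving-window sense: within each window the slope is a weighted median, over points, of each point's median pairwise slope, and the level at the window centre is a weighted median of the slope-adjusted values. The function names, the `half_width` and `weights` parameters, and the fixed outlier `threshold` are illustrative assumptions; the R implementation the team relied on (for example `wrm.filter` in the robfilter package, if that is what was used) may weight, treat edges, and scale thresholds differently.

```python
import statistics


def weighted_median(values, weights):
    """Lower weighted median: first sorted value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def wrm_filter(y, half_width, weights=None):
    """Weighted repeated median level estimate at each time point of a 1-D series.

    For each t with a full window of offsets o = -k..k (k = half_width):
      slope_t = wmed_i( med_{j != i} (y[t+o_j] - y[t+o_i]) / (o_j - o_i) )
      level_t = wmed_i( y[t+o_i] - slope_t * o_i )
    Points without a full window (the first and last k) get None.
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    k = half_width
    offsets = list(range(-k, k + 1))
    if weights is None:
        weights = [1.0] * len(offsets)  # equal weights: plain repeated median
    if len(weights) != len(offsets):
        raise ValueError("need one weight per window position (2 * half_width + 1)")
    n = len(offsets)
    levels = [None] * len(y)
    for t in range(k, len(y) - k):
        window = [y[t + o] for o in offsets]
        # Inner step: for each point, the median slope to every other point in the window.
        row_slopes = [
            statistics.median(
                (window[j] - window[i]) / (offsets[j] - offsets[i])
                for j in range(n) if j != i
            )
            for i in range(n)
        ]
        slope = weighted_median(row_slopes, weights)
        # Level at the window centre: weighted median of slope-adjusted values.
        levels[t] = weighted_median(
            [window[i] - slope * offsets[i] for i in range(n)], weights
        )
    return levels


def flag_outliers(y, levels, threshold):
    """Flag points whose distance from the smoothed level exceeds threshold."""
    return [lv is not None and abs(v - lv) > threshold for v, lv in zip(y, levels)]
```

Equal weights reduce this to the plain repeated median filter. In practice the threshold would more likely be derived from a robust scale estimate of the residuals (such as the MAD) than fixed by hand.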

SLIDE 23
SLIDE 24

pipeline architecture

SLIDE 25

Statistical prototype: data import (batch) → Spark-R → Shiny/Tableau

Production prototype: data import (batch) & training → real-time model decisions → presentation (SPA)

SLIDE 26

Possible real production: new event in queue → real-time model decision/prediction; batch model improvement; cached results → presentation (SPA)
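
The "possible real production" flow can be sketched in a few lines of Python, assuming a single process: queued events are scored immediately by the current model, decisions are cached for the SPA to read, and a separate batch step refits the model on the accumulated history. `ThresholdModel`, its "3 × median" refit rule, and all other names are hypothetical; a real deployment would use a message broker and the Spark/R models rather than an in-memory `queue.Queue`.

```python
import queue
import statistics
from dataclasses import dataclass, field


@dataclass
class ThresholdModel:
    """Hypothetical stand-in model: an amount above `limit` is flagged as unusual."""
    limit: float

    def predict(self, amount: float) -> bool:
        return amount > self.limit


@dataclass
class Pipeline:
    model: ThresholdModel
    events: queue.Queue = field(default_factory=queue.Queue)  # incoming (event_id, amount)
    cache: dict = field(default_factory=dict)  # cached decisions read by the presentation layer
    history: list = field(default_factory=list)  # raw amounts kept for batch retraining

    def handle_pending(self) -> None:
        """Real-time path: score every queued event with the current model and cache the result."""
        while True:
            try:
                event_id, amount = self.events.get_nowait()
            except queue.Empty:
                return
            self.cache[event_id] = self.model.predict(amount)
            self.history.append(amount)

    def improve_model(self) -> None:
        """Batch path: refit on all history (here: limit = 3 x median) and swap the model in."""
        if self.history:
            self.model = ThresholdModel(limit=3 * statistics.median(self.history))
```

Keeping the real-time path to a cheap call on the latest model, and pushing the expensive fitting into the batch step, is what lets the two paths run at different cadences.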

SLIDE 27

Frontend: Angular2

SLIDE 28

Frontend: Angular2 | Backend: Spring-Boot / Camel

SLIDE 29

Frontend: Angular2 | Backend: Spring-Boot / Camel | Data science

SLIDE 30

Frontend: Angular2 | Backend: Spring-Boot / Camel | Data science: Spark Job Server, Spark cluster, R algorithms, OpenCPU

SLIDE 31

security

Top …

SLIDE 32

15 sec

In a real cluster compared to 17 minutes on a laptop
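
Assuming both timings refer to the same job on the same data, the slide's figures imply roughly a 68-fold speedup:

```latex
\text{speedup} = \frac{17\ \text{min}}{15\ \text{s}}
  = \frac{17 \times 60\ \text{s}}{15\ \text{s}}
  = \frac{1020}{15} = 68
```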

SLIDE 33

600 GB

Raw data compressed to 3 GB
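
The two sizes correspond to a 200:1 compression ratio, i.e. a 99.5% reduction in storage. The slide does not say which format or codec achieved this, so only the arithmetic is shown:

```latex
\frac{600\ \text{GB}}{3\ \text{GB}} = 200,
\qquad
1 - \frac{3\ \text{GB}}{600\ \text{GB}} = 0.995
```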

SLIDE 34

learnings

  • Learning a new programming language costs time but is fun
  • Try to stay with a monolith as long as possible
  • Multiple APIs need good synchronization
  • Good documentation of APIs is key to parallelization (mocking)
  • Key failures involved not enough communication
  • Artists do not earn much from streaming!
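
The lesson that a documented API plus mocking enables parallel work can be illustrated in Python for consistency with the other sketches (the project's own Angular2 and Spring-Boot stack would express the same idea in TypeScript or Java). Consumers code against an agreed contract and are developed against a mock with canned responses until the real backend exists. `RevenueApi`, `MockRevenueApi`, `top_country`, and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Protocol


class RevenueApi(Protocol):
    """Hypothetical documented contract shared by frontend and backend developers."""

    def revenue_per_country(self, artist: str) -> dict[str, float]:
        ...


class MockRevenueApi:
    """Returns canned data so consumers can be built before the real backend exists."""

    def __init__(self, canned: dict[str, dict[str, float]]) -> None:
        self._canned = canned

    def revenue_per_country(self, artist: str) -> dict[str, float]:
        return dict(self._canned.get(artist, {}))


def top_country(api: RevenueApi, artist: str) -> Optional[str]:
    """Consumer code that depends only on the contract, not on the implementation."""
    revenue = api.revenue_per_country(artist)
    if not revenue:
        return None
    return max(revenue, key=revenue.get)
```

Once the real backend implements the same method, it can replace the mock without changes to `top_country`.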

SLIDE 35

Regarding architecture

  • Nice UI (internal only): http://www.metabase.com/, https://github.com/airbnb/caravel
  • Tableau + R for outliers
  • Spark (Thrift) + JDBC
  • Change storage to fit structured data: http://www.snappydata.io/

SLIDE 36

possiblY

empower artists through transparency

SLIDE 37

Validation of models

  • Testing with known/generated data
  • Comparison of fit (manual)
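
Testing with known/generated data can be illustrated as follows: generate a series whose outlier positions are known by construction, run a detector, and score it with precision and recall. The detector here is a simple running-median rule used only as a stand-in for the project's models; the trend, noise level, spike size, window, threshold, and all function names are illustrative assumptions.

```python
import random
import statistics


def running_median_outliers(y, half_width, threshold):
    """Stand-in detector: flag points far from the median of their (truncated) window."""
    flags = []
    for t in range(len(y)):
        lo, hi = max(0, t - half_width), min(len(y), t + half_width + 1)
        flags.append(abs(y[t] - statistics.median(y[lo:hi])) > threshold)
    return flags


def generate_series(n, outlier_idx, seed=0):
    """Synthetic data with known ground truth: noisy linear trend plus injected spikes."""
    rng = random.Random(seed)
    y = [0.5 * t + rng.gauss(0.0, 1.0) for t in range(n)]
    for i in outlier_idx:
        y[i] += 25.0
    return y


def precision_recall(flags, truth_idx):
    """Compare flagged indices with the known outlier indices."""
    predicted = {i for i, flagged in enumerate(flags) if flagged}
    truth = set(truth_idx)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


truth = [20, 55, 80]
series = generate_series(100, truth)
flags = running_median_outliers(series, half_width=5, threshold=8.0)
precision, recall = precision_recall(flags, truth)
```

A fixed seed keeps the generated data reproducible, so a change in precision or recall points to a change in the detector rather than in the test data.
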
SLIDE 38

project specialties

  • Trade-off between production-grade architecture and highly sophisticated statistical models (see different pipelines)

  • Prototype for FFG grant