Jul 01, 2023

SLIDE 1

APACHE BIG DATA CONFERENCE

How to transform data into money using Big Data technologies

SLIDE 2

INTRO

THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED

After almost a decade developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.

SLIDE 3

JORGE LOPEZ-MALLA

MY PROFILE

After working with traditional processing methods, I started doing some R&D Big Data projects and fell in love with the Big Data world. I'm currently working on some awesome Big Data projects at Stratio.

SKILLS

SLIDE 4

ALBERTO RODRÍGUEZ DE LEMA

@ardlema

MY PROFILE

Since graduating, I have been programming for more than 10 years. I've built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd.

SKILLS

SLIDE 5

GO TO SPACE

STRATIO

OPEN-SOURCE SOLUTIONS

Our enterprise solutions are based on open-source technologies

PURE SPARK

The only pure Spark platform, the only global solution

ENTERPRISE SPARK

On-premise & cloud, our platform is geared towards helping companies

SPARK-BASED BD PLATFORM

The first Spark-based Big Data platform released

SLIDE 6

OUR CLIENT

MIDDLE EAST TELCO COMPANY

9.500 mil. daily events processed
9.2 mil. clients

SLIDE 7

USE CASES

SLIDE 8

MANAGEMENT & NORMALIZATION OF DATA SOURCES

USE CASES

1

SLIDE 9

USE CASES

MANAGEMENT & NORMALIZATION OF DATA SOURCES

1

SLIDE 10

USE CASES

NETWORK COVERAGE IMPROVEMENT

2

SLIDE 11

USE CASES

PEOPLE GATHERING

3

SLIDE 12

USE CASES

PEOPLE GATHERING

3

SLIDE 13

USE CASES

DATA MONETIZATION

4

SLIDE 14

USE CASES

DATA MONETIZATION

4

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

USE CASES

DATA MONETIZATION

4

SLIDE 19

TECHNICAL CHALLENGES

SLIDE 20

TECHNICAL PROBLEMS

1 Huge volume of data
2 Huge size of data
3 Distributed processing
4 Hard to read
5 Recognized patterns

SLIDE 21

1 HUGE VOLUME OF DATA

SOLUTION

APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

SLIDE 22

1 HUGE VOLUME OF DATA

9500 mil. daily CSV records -> circa 16 GB

Requirements:
High availability
Concurrent file reads

SLIDE 23

2 HUGE SIZE OF DATA

SOLUTION

APACHE PARQUET

SLIDE 24

2 HUGE SIZE OF DATA

16.5 GB of daily event information stored as CSV text in HDFS
4.3 GB of daily event information stored as Parquet files in HDFS

STORAGE IMPROVEMENT: circa 70%

SLIDE 25

2 HUGE SIZE OF DATA

Time to count daily CSV events -> 6.2 minutes
Time to count daily Parquet events -> 1 minute

READ PROCESS IMPROVEMENT: circa 80%
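
As a quick check of the two "circa" figures, the relative reduction can be computed directly from the numbers quoted on the slides (16.5 GB vs 4.3 GB, and 6.2 minutes vs 1 minute). This is only a minimal sketch in Python; the function name percent_reduction is an assumption made for illustration and does not come from the talk.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before * 100.0


# Figures quoted on the slides: daily event data as CSV vs Parquet in HDFS.
storage_reduction = percent_reduction(16.5, 4.3)   # sizes in GB
read_time_reduction = percent_reduction(6.2, 1.0)  # count times in minutes

print(f"Storage reduction:   {storage_reduction:.1f}%")    # 73.9%
print(f"Read time reduction: {read_time_reduction:.1f}%")  # 83.9%
```

Both results sit slightly above the rounded values shown on the slides, which is consistent with the "circa" wording.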

SLIDE 26

3 DISTRIBUTED PROCESSING

SOLUTION

APACHE SPARK

SLIDE 27

3 DISTRIBUTED PROCESSING - REQUIREMENTS

Complex algorithms with the minimum amount of resources
Reduced processing time, so that data is obtained while it is still useful

SLIDE 28

3 DISTRIBUTED PROCESSING - REQUIREMENTS

Sharing the cluster with legacy processes
Use of legacy process outputs without any changes

SLIDE 29

4 HARD TO READ

SOLUTION

SCALA + APACHE SPARK

SLIDE 30

4 HARD TO READ

Reduced development time
LOCs dramatically reduced
Number of classes dramatically reduced

SLIDE 31

4 HARD TO READ

Test and application readability improvements
DSLs make our lives easier
Spark makes MapReduce jobs even simpler
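
To illustrate the kind of job the MapReduce point refers to, here is a minimal sketch of the map and reduce-by-key pattern in plain Python. It is not Spark code and not the speakers' implementation; the sample events, the field layout (antenna_id, subscriber_id) and the helper names are all hypothetical. In Spark, the same shape is typically expressed as a map followed by reduceByKey on an RDD.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample events: (antenna_id, subscriber_id). Not data from the talk.
events = [("A1", "u1"), ("A2", "u2"), ("A1", "u3"), ("A1", "u1")]


def map_phase(records):
    """Emit one (key, 1) pair per event, keyed by antenna."""
    return [(antenna, 1) for antenna, _subscriber in records]


def reduce_by_key(pairs):
    """Group the pairs by key, then sum the values of each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


counts = reduce_by_key(map_phase(events))
print(counts)  # {'A1': 3, 'A2': 1}
```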

SLIDE 32

5 RECOGNIZED PATTERNS

SOLUTION

APACHE SPARK MLLIB

SLIDE 33

5 RECOGNIZED PATTERNS

Millions of data points processed in order to obtain mathematical models
Complex mathematical algorithms applied to obtain accurate weekly behaviors

SLIDE 34

THANK YOU

UNITED STATES

Tel: (+1) 408 5998830

EUROPE

Tel: (+34) 91 828 64 73

contact@stratio.com www.stratio.com