APACHE BIG DATA CONFERENCE
How to transform data into money using Big Data technologies
INTRO
THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED
After almost a decade developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.
JORGE LOPEZ-MALLA
After working with traditional processing methods, I started doing some R&D Big Data projects and fell in love with the Big Data world. Currently I'm working on some awesome Big Data projects at Stratio.
MY PROFILE SKILLS
ALBERTO RODRÍGUEZ DE LEMA
Since graduating, I've been programming for more than 10 years. I've built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd.
MY PROFILE
@ardlema
SKILLS
GO TO SPACE
STRATIO
OPEN-SOURCE SOLUTIONS
Our enterprise solutions are based on open-source technologies
PURE SPARK
The only pure Spark platform, the only global solution
ENTERPRISE SPARK
On-premise & cloud: our platform is geared towards helping companies
SPARK-BASED BD PLATFORM
The first Spark-Based big data platform released
OUR CLIENT
MIDDLE EAST TELCO COMPANY
- 9.500 mil. daily events processed
- 9.2 mil. clients
USE CASES
1 MANAGEMENT & NORMALIZATION OF DATA SOURCES
2 NETWORK COVERAGE IMPROVEMENT
3 PEOPLE GATHERING
4 DATA MONETIZATION
TECHNICAL CHALLENGES
TECHNICAL PROBLEMS
1 Huge volume of data
2 Huge size of data
3 Distributed processing
4 Hard to read
5 Recognized patterns
1 HUGE VOLUME OF DATA
SOLUTION
APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
9500 mil. daily CSV records -> circa 16 GB
Requirements:
- High availability
- Concurrent file reads
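As a minimal sketch, assuming Spark 3.x with `spark-sql` on the classpath, one day of CSV events could be loaded from HDFS as below. The column layout (`msisdn`, `cellId`, `eventType`, `timestamp`), the timestamp format and any path such as `hdfs:///events/csv/<day>` are hypothetical; the deck does not describe the files. HDFS addresses the availability requirement by replicating each block across DataNodes, and Spark splits the files into input partitions (at most `spark.sql.files.maxPartitionBytes`, 128 MB by default) that executors read concurrently.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

object DailyEventsReader {
  // Hypothetical column layout: the real CSV schema is not given in the deck.
  val eventSchema: StructType = StructType(Seq(
    StructField("msisdn", StringType),
    StructField("cellId", StringType),
    StructField("eventType", StringType),
    StructField("timestamp", TimestampType)
  ))

  // Loads one day of events. Passing an explicit schema avoids the extra
  // full scan of the day's files that inferSchema would trigger.
  def readDay(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "false")
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss") // assumed format
      .schema(eventSchema)
      .csv(path)
}
```

Usage would look like `DailyEventsReader.readDay(spark, "hdfs:///events/csv/2016-05-10")`, where the path is a placeholder.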
2 HUGE SIZE OF DATA
SOLUTION
APACHE PARQUET
16.5 GB of daily event information stored as CSV text in HDFS
4.3 GB of daily event information stored as Parquet files in HDFS
STORAGE IMPROVEMENT: circa 70%
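The storage saving comes from rewriting the CSV text as Parquet, a columnar format that encodes and compresses each column separately (Spark writes Snappy-compressed Parquet by default). A minimal conversion sketch, assuming Spark 3.x; the paths are placeholders, and schema inference is used only to keep the example self-contained, at the cost of an extra pass over the CSV.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvToParquet {
  // Reads one day of CSV text and rewrites it as Parquet.
  // inferSchema is a simplification for this sketch; a production job would
  // pass an explicit schema instead of scanning the data twice.
  def convert(spark: SparkSession, csvPath: String, parquetPath: String): Unit =
    spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv(csvPath)
      .write
      .mode(SaveMode.Overwrite)
      .parquet(parquetPath)
}
```

Because Parquet is columnar, later scans only read the columns a query needs, which is one plausible contributor to the faster daily count reported on the following slide.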
Time to count daily CSV events -> 6.2 minutes
Time to count daily Parquet events -> 1 minute
READ PROCESS IMPROVEMENT: circa 80%
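The rounded percentages on the storage and read slides follow directly from the reported figures; the exact values are slightly higher than the quoted "circa" numbers:

```latex
\text{storage reduction} = 1 - \frac{4.3}{16.5} = \frac{12.2}{16.5} \approx 0.739 \quad (\approx 74\%)

\text{read-time reduction} = 1 - \frac{1}{6.2} = \frac{5.2}{6.2} \approx 0.839 \quad (\approx 84\%,\ \text{a } 6.2\times \text{ speed-up})
```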
3 DISTRIBUTED PROCESSING
SOLUTION
APACHE SPARK
3 DISTRIBUTED PROCESSING - REQUIREMENTS
- Complex algorithms with the minimum amount of resources
- Reduction of the processing time, so that data is obtained while it is still useful
- Sharing the cluster with legacy processes
- Use of legacy process outputs without any change
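One way to meet the "minimum resources" and "shared cluster" requirements is to give the Spark jobs a fixed, small footprint on their own YARN queue, and to read what the legacy jobs already write instead of changing them. A minimal sketch, assuming Spark 3.x on YARN; the queue name, the executor sizing and the tab-separated `key`/`value` layout of the legacy output are all hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object SharedClusterJob {
  // Hypothetical sizing: a bounded number of small executors on a dedicated
  // YARN queue, so the job cannot take over capacity used by legacy processes.
  def boundedConf(queue: String): SparkConf =
    new SparkConf()
      .set("spark.yarn.queue", queue)
      .set("spark.executor.instances", "4")
      .set("spark.executor.cores", "2")
      .set("spark.executor.memory", "4g")

  // Reads the output of a legacy job as it is (assumed here to be
  // tab-separated key/value text, e.g. part-* files), without modifying it.
  def readLegacyOutput(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("sep", "\t")
      .csv(path)
      .toDF("key", "value")
}
```

The session would be created with `SparkSession.builder().config(SharedClusterJob.boundedConf("analytics")).getOrCreate()`, where `"analytics"` is a placeholder queue name.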
4 HARD TO READ
SOLUTION
SCALA + APACHE SPARK
- Reduced development time
- LOCs dramatically reduced
- Number of classes dramatically reduced
- Test and application readability improvements
- DSLs make our lives easier
- Spark makes MapReduce jobs even simpler
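To illustrate the readability claim, here is a typical MapReduce-style aggregation (events and distinct clients per cell) written with the Spark DataFrame DSL, plus the equivalent map/reduce on the RDD API; in Hadoop MapReduce the same job would usually need separate Mapper, Reducer and driver classes. A minimal sketch assuming Spark 3.x; the `cellId` and `msisdn` column names are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, countDistinct}

object EventsPerCell {
  // DataFrame DSL: group, aggregate and sort in a single expression.
  def summarize(events: DataFrame): DataFrame =
    events
      .groupBy(col("cellId"))
      .agg(
        count("*").as("events"),
        countDistinct(col("msisdn")).as("clients"))
      .orderBy(col("events").desc)

  // RDD version, closest to the original MapReduce model:
  // map each event to (cellId, 1), then reduce by key.
  def countPerCell(cellIds: RDD[String]): RDD[(String, Long)] =
    cellIds.map(cell => (cell, 1L)).reduceByKey(_ + _)
}
```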
5 RECOGNIZED PATTERNS
SOLUTION
APACHE SPARK MLLIB
- Millions of records processed to obtain mathematical models
- Complex mathematical algorithms applied to obtain accurate weekly behavior patterns
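The slides do not say which MLlib algorithms were used. As one illustration of deriving weekly behavior patterns, the sketch below clusters clients by their activity per weekday with k-means from Spark ML; the choice of k-means, the seven `mon` to `sun` columns and the seed are assumptions, not the project's actual model. It assumes Spark 3.x with `spark-mllib` on the classpath.

```scala
import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

object WeeklyBehaviour {
  // Hypothetical input: one row per client with an activity count per weekday.
  val days: Array[String] = Array("mon", "tue", "wed", "thu", "fri", "sat", "sun")

  // Packs the seven daily columns into the single vector column MLlib expects.
  def assemble(perClientDaily: DataFrame): DataFrame =
    new VectorAssembler()
      .setInputCols(days)
      .setOutputCol("features")
      .transform(perClientDaily)

  // Clusters clients into k weekly behaviour profiles (k-means is an
  // assumption made for this sketch; the seed makes runs reproducible).
  def fit(features: DataFrame, k: Int): KMeansModel =
    new KMeans()
      .setK(k)
      .setSeed(42L)
      .setFeaturesCol("features")
      .fit(features)
}
```

Calling `model.transform(features)` then adds an integer `prediction` column with each client's profile index.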
THANK YOU
UNITED STATES
Tel: (+1) 408 5998830
EUROPE
Tel: (+34) 91 828 64 73
contact@stratio.com www.stratio.com