
BIG DATA CONFERENCE How to transform data into money using Big Data technologies


  1. APACHE BIG DATA CONFERENCE How to transform data into money using Big Data technologies

  2. INTRO: THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED. After almost a decade of developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.

  3. MY PROFILE / SKILLS: JORGE LOPEZ-MALLA. After working with traditional processing methods, I started doing some R&D Big Data projects and fell in love with the Big Data world. Currently I'm working on some awesome Big Data projects at Stratio.

  4. MY PROFILE / SKILLS: ALBERTO RODRÍGUEZ DE LEMA. Since graduating, I have been programming for more than 10 years. I've built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd. @ardlema

  5. STRATIO GO TO SPACE. SPARK-BASED BD: The first Spark-based big data platform released. ENTERPRISE SPARK PLATFORM: On-premise & cloud, our platform is geared towards helping companies. PURE SPARK: The only pure Spark platform, the only global solution. OPEN-SOURCE SOLUTIONS: Our enterprise solutions are based on open source technologies.

  6. OUR CLIENT: MIDDLE EAST TELCO COMPANY. 9.500 mil. daily events processed; 9.2 mil. clients

  7. USE CASES

  8. USE CASES 1 MANAGEMENT & NORMALIZATION OF DATA SOURCES

  9. USE CASES 1 MANAGEMENT & NORMALIZATION OF DATA SOURCES

  10. USE CASES 2 NETWORK COVERAGE IMPROVEMENT

  11. USE CASES 3 PEOPLE GATHERING

  12. USE CASES 3 PEOPLE GATHERING

  13. USE CASES 4 DATA MONETIZATION

  14. USE CASES 4 DATA MONETIZATION

  15. USE CASES 4 DATA MONETIZATION

  16. TECHNICAL CHALLENGES

  17. TECHNICAL PROBLEMS: (1) Huge volume of data; (2) Huge size of data; (3) Distributed processing; (4) Hard to read; (5) Recognized patterns

  18. 1 HUGE VOLUME OF DATA. SOLUTION: APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

  19. 1 HUGE VOLUME OF DATA. 9500 mil. daily CSV records -> circa 16 GB. Requirements: high availability; concurrent file reads

  20. 2 HUGE SIZE OF DATA. SOLUTION: APACHE PARQUET

  21. 2 HUGE SIZE OF DATA. 16.5 GB of daily event information stored as CSV text in HDFS; 4.3 GB of daily event information stored as Parquet files in HDFS. STORAGE IMPROVEMENT: circa 70%

  22. 2 HUGE SIZE OF DATA. Time to count daily CSV events -> 6.2 minutes. Time to count daily Parquet events -> 1 minute. READ PROCESS IMPROVEMENT: circa 80%

  23. 3 DISTRIBUTED PROCESSING. SOLUTION: APACHE SPARK

  24. 3 DISTRIBUTED PROCESSING - REQUIREMENTS. Complex algorithms run with the minimum amount of resources. Reduction of the processing time in order to obtain data while it is still in use.

  25. 3 DISTRIBUTED PROCESSING - REQUIREMENTS. Sharing the cluster with legacy processes. Use of legacy process outputs without any change.

  26. 4 HARD TO READ. SOLUTION: SCALA + APACHE SPARK

  27. 4 HARD TO READ. Reduced development time. LOCs dramatically reduced. Number of classes dramatically reduced.

  28. 4 HARD TO READ. Tests and application readability improvements. DSLs make our lives easier. Spark makes MapReduce jobs even simpler.

  29. 5 RECOGNIZED PATTERNS. SOLUTION: APACHE SPARK MLLIB

  30. 5 RECOGNIZED PATTERNS. Millions of data points processed in order to obtain mathematical models. Complex mathematical algorithms applied to obtain accurate weekly behaviors.

  31. THANK YOU. UNITED STATES Tel: (+1) 408 5998830. EUROPE Tel: (+34) 91 828 64 73. contact@stratio.com www.stratio.com
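
The improvement figures quoted on slides 21 and 22 can be checked with a little arithmetic. The following is a minimal Python sketch and is not part of the original deck: the helper name `relative_reduction` is hypothetical, and the inputs assume the slide values read 16.5 GB versus 4.3 GB for storage and 6.2 minutes versus 1 minute for the count.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Figures as read from slides 21 and 22 (assumed values, see the note above).
csv_size_gb, parquet_size_gb = 16.5, 4.3        # daily events stored in HDFS
csv_count_min, parquet_count_min = 6.2, 1.0     # time to count daily events

storage_saving = relative_reduction(csv_size_gb, parquet_size_gb)
read_saving = relative_reduction(csv_count_min, parquet_count_min)

print(f"Storage reduction: {storage_saving:.1%}")    # Storage reduction: 73.9%
print(f"Count-time reduction: {read_saving:.1%}")    # Count-time reduction: 83.9%
```

Both results sit slightly above the rounded "circa 70%" and "circa 80%" given on the slides, so the quoted improvements appear to be conservative roundings of the same numbers.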
