The Distributed Database Based on Kudu (Shunda Lin)



slide-1
SLIDE 1

The Distributed Database Based on Kudu

Shunda Lin

slide-2
SLIDE 2

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-3
SLIDE 3

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-4
SLIDE 4

Motivation

slide-5
SLIDE 5

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-6
SLIDE 6

Traditional System

  • The application has to move data back and forth between the real-time and offline systems, which requires complex code.
  • The systems are complex and need various backups, security policies, and monitoring systems.
  • There is a delay in moving data from the real-time system to the offline system for OLAP analysis.
  • It is expensive to change or rewrite historical data once it has been archived.

slide-7
SLIDE 7

Kudu: Fast Analytics on Fast Data

  • Released by Cloudera in 2015
  • Used for OLAP
  • High performance for both data scanning and random access

  • Simplifying complex hybrid architectures
slide-8
SLIDE 8
slide-9
SLIDE 9

Architectures and Design

  • Super-fast Columnar Storage
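
To make the columnar idea concrete, here is a toy Python sketch. It is not Kudu's actual storage format; the column names are borrowed from the query test later in the deck and the values are made up. It only shows why an analytic scan over one column can skip all the others when values are stored column by column.

```python
# Toy rows (hypothetical values); column names follow the later query test.
rows = [
    {"FOSID": "A", "FOSReference": "B", "Similarity": 9},
    {"FOSID": "A", "FOSReference": "C", "Similarity": 8},
    {"FOSID": "B", "FOSReference": "C", "Similarity": 5},
]


def to_columns(rows):
    """Turn a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def get_row(columns, index):
    """Rebuild a single row by reading position `index` of every column."""
    return {name: values[index] for name, values in columns.items()}


columns = to_columns(rows)

# An aggregate over one column reads only that column's list.
max_similarity = max(columns["Similarity"])

# Random access to one row still works, at the cost of touching every column.
second_row = get_row(columns, 1)
```

The trade-off in the sketch is the usual one: scans get cheaper, while rebuilding a full row needs one lookup per column.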
slide-10
SLIDE 10

Architectures and Design

  • Distribution and Fault Tolerance
slide-11
SLIDE 11

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-12
SLIDE 12

Deployment

  • master: slave2 (192.168.0.134)
  • tserver: slave1 (192.168.0.135)

slave2 (192.168.0.134) slave3 (192.168.0.100)

slide-13
SLIDE 13

Data Persistence

  • MySQL -> HDFS -> Kudu
  • Sqoop: a command-line application for transferring data between relational databases and Hadoop
  • Spark: an open-source cluster-computing framework

slide-14
SLIDE 14

MySQL to HDFS

sqoop import --connect jdbc:mysql://202.120.36.137:6033/mag-new-160205 --username data --password data --table AuthorFieldCount -m 1 --target-dir /user/hadoop/AuthorFieldCount --as-parquetfile

slide-15
SLIDE 15

Data Persistence on Kudu

  • spark-shell
  • design table
  • create table
  • insert data
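
The deck does not show the commands typed into spark-shell, so the following is only a rough PySpark sketch of the "insert data" step, not the author's code. It assumes the kudu-spark connector is on the Spark classpath, that the Kudu table has already been created, that the master listens on the default RPC port 7051 on 192.168.0.134, and that the Kudu table is named AuthorFieldCount like the MySQL table. All of these are assumptions.

```python
# Data source name of the kudu-spark connector.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def kudu_options(master, table):
    """Options the kudu-spark data source expects for reads and writes."""
    return {"kudu.master": master, "kudu.table": table}


def copy_parquet_to_kudu(spark, parquet_dir, master, table):
    """Read the Parquet files written by Sqoop and append them to a Kudu table.

    `spark` is an existing SparkSession. The Kudu table must already exist;
    this writer only inserts rows and does not create tables.
    """
    df = spark.read.parquet(parquet_dir)
    (
        df.write.format(KUDU_FORMAT)
        .options(**kudu_options(master, table))
        .mode("append")
        .save()
    )
    return df


# Example call inside a pyspark session (address and table name are assumptions):
# copy_parquet_to_kudu(spark, "/user/hadoop/AuthorFieldCount",
#                      "192.168.0.134:7051", "AuthorFieldCount")
```

Passing the session in as an argument keeps the sketch importable without Spark installed; the call itself only works against a running cluster.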
slide-16
SLIDE 16
slide-17
SLIDE 17

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-18
SLIDE 18

Query Test

  • From the field-relation table, extract the pairwise relations among the 1000 fields most closely related to a given field

select FOSID as Source, FOSReferencesCount.FOSReference as Target, Similarity/10000000 as Weight
from (select FOSReference from `FOSReferencesCount` where `FOSID` = '0271BC14' order by `Similarity` desc limit 1000) e1,
     (select FOSReference from `FOSReferencesCount` where `FOSID` = '0271BC14' order by `Similarity` desc limit 1000) e2,
     FOSReferencesCount
where e1.`FOSReference` = `FOSReferencesCount`.FOSID and e2.`FOSReference` = `FOSReferencesCount`.FOSReference;
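
To see what the query above returns, the small Python sketch below runs the same join on an in-memory SQLite table. The rows, the field IDs A to D, and the top-2 limit (instead of 1000) are made up for illustration; SQLite merely stands in for MySQL or Kudu here, and the divisor is written as 10000000.0 so that SQLite does not use integer division.

```python
import sqlite3

# Same shape as the deck's query, with the field ID and the limit as parameters.
QUERY = """
SELECT f.FOSID AS Source, f.FOSReference AS Target,
       f.Similarity / 10000000.0 AS Weight
FROM (SELECT FOSReference FROM FOSReferencesCount
      WHERE FOSID = :fos ORDER BY Similarity DESC LIMIT :n) e1,
     (SELECT FOSReference FROM FOSReferencesCount
      WHERE FOSID = :fos ORDER BY Similarity DESC LIMIT :n) e2,
     FOSReferencesCount f
WHERE e1.FOSReference = f.FOSID
  AND e2.FOSReference = f.FOSReference
ORDER BY Source, Target
"""

# Hypothetical toy data: (FOSID, FOSReference, Similarity).
TOY_ROWS = [
    ("A", "B", 90000000),
    ("A", "C", 80000000),
    ("A", "D", 10000000),
    ("B", "C", 50000000),
    ("C", "B", 40000000),
    ("B", "D", 70000000),
    ("D", "B", 30000000),
]


def related_edges(rows, fos_id, top_n):
    """Return edges among the top_n fields most similar to fos_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FOSReferencesCount "
        "(FOSID TEXT, FOSReference TEXT, Similarity INTEGER)"
    )
    conn.executemany("INSERT INTO FOSReferencesCount VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"fos": fos_id, "n": top_n}).fetchall()
    conn.close()
    return result


# The two fields closest to A are B and C, so only edges between B and C remain.
edges = related_edges(TOY_ROWS, "A", 2)
```

The edge from B to D is dropped because D is not among the top-2 neighbours of A, which is the filtering that the two subqueries e1 and e2 perform.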

slide-19
SLIDE 19

[Figure: bar chart "Query", MySQL vs. Kudu for Case1, Case2 and Case3]

FOSID                          MySQL    Kudu
Computer Science (0271BC14)    82.4s    8.23s
Ethnic studies (03D2C4FF)      65.4s    9.175s
Data Structure (09ACCB7D)      55.7s    7.821s
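
The timings above can be turned into speedup ratios with a few lines of Python. The numbers are copied from the slide; the helper name is just for illustration.

```python
# (MySQL seconds, Kudu seconds) per test case, as reported on the slide.
TIMINGS = {
    "Computer Science (0271BC14)": (82.4, 8.23),
    "Ethnic studies (03D2C4FF)": (65.4, 9.175),
    "Data Structure (09ACCB7D)": (55.7, 7.821),
}


def speedups(timings):
    """Map each case to MySQL time divided by Kudu time."""
    return {name: mysql / kudu for name, (mysql, kudu) in timings.items()}


for name, ratio in speedups(TIMINGS).items():
    print(f"{name}: Kudu is about {ratio:.1f}x faster")
```

On these three cases Kudu comes out roughly 7x to 10x faster than MySQL.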

slide-20
SLIDE 20

Query Test

[Figure: chart comparing MySQL and Kudu; x-axis 1 to 13, y-axis up to 180]

slide-21
SLIDE 21

Outline

  • Motivation
  • Introduction of Kudu
  • Deployment and Configuration
  • Query Test
  • Conclusion
slide-22
SLIDE 22

Q&A