The Distributed Database Based on Kudu (Shunda Lin)

Jan 28, 2023


Outline • Motivation • Introduction of Kudu • Deployment and Configuration • Query Test • Conclusion
Motivation
Traditional System
• The application has to move data back and forth between the real-time and offline systems, which requires complex code.
• The two systems are complex and each needs its own backups, security policies, and monitoring.
• Data reaches the offline system for OLAP analysis only after a delay.
• Changing or rewriting historical data that has already been archived is expensive.
Kudu: Fast Analytics on Fast Data
• Released by Cloudera in 2015
• Built for OLAP workloads
• High performance for both data scans and random access
• Simplifies complex hybrid architectures
Architecture and Design • Super-fast Columnar Storage
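The storage diagram on this slide did not survive extraction. The sketch below is a generic illustration of why a columnar layout suits analytic scans. It does not reproduce Kudu's actual file format, which adds its own encodings and compression. The sketch stores the same hypothetical rows, shaped like FOSReferencesCount, both row-wise and column-wise. It counts how many values each layout reads to sum one column. It also run-length encodes the sorted key column; RLE is one of the per-column encodings Kudu offers.

```python
from itertools import groupby

# Hypothetical rows shaped like FOSReferencesCount: (FOSID, FOSReference, Similarity).
COLUMNS = ("FOSID", "FOSReference", "Similarity")
ROWS = [
    ("0271BC14", "03D2C4FF", 42_000_000),
    ("0271BC14", "09ACCB7D", 37_000_000),
    ("03D2C4FF", "09ACCB7D", 12_000_000),
]

def to_columnar(rows, columns):
    """Pivot row tuples into one list per column (a column store keeps each list contiguous)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def scan_sum_row_store(rows, col_index):
    """Sum one column from a row layout; returns (total, values_read)."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[col_index]
    return total, values_read

def scan_sum_column_store(table, name):
    """Sum one column from a column layout; only that column's values are read."""
    values = table[name]
    return sum(values), len(values)

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

if __name__ == "__main__":
    table = to_columnar(ROWS, COLUMNS)
    print(scan_sum_row_store(ROWS, 2))                # same total, every field read
    print(scan_sum_column_store(table, "Similarity")) # same total, one column read
    print(run_length_encode(table["FOSID"]))
```

The gap widens with wider tables. A row store pays for every column on each row, whereas a column store only pays for the columns the query touches.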
Architecture and Design • Distribution and Fault Tolerance
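The fault-tolerance diagram is also missing. Kudu splits each table into tablets and keeps several replicas of every tablet (three by default) on different tablet servers. It uses Raft, so a tablet stays writable while a majority of its replicas is alive.

The sketch below models that idea under stated simplifications:
- `zlib.crc32` stands in for Kudu's own hash-partitioning function.
- The round-robin replica placement is hypothetical; the real Kudu master chooses placements itself.
- The server names come from the Deployment slide.

```python
import zlib

TSERVERS = ["slave1", "slave2", "slave3"]  # tablet servers listed on the Deployment slide

def bucket_for(key, num_buckets):
    """Map a primary-key string to a hash bucket (crc32 is a stand-in for Kudu's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def place_replicas(num_tablets, tservers, replication=3):
    """Hypothetical placement: each tablet gets `replication` replicas on distinct servers."""
    if replication > len(tservers):
        raise ValueError("need at least as many tablet servers as replicas")
    return {
        t: [tservers[(t + r) % len(tservers)] for r in range(replication)]
        for t in range(num_tablets)
    }

def tablets_with_majority(placement, failed):
    """Tablets that still have a Raft majority of live replicas after `failed` servers go down."""
    return [
        t for t, replicas in placement.items()
        if sum(s not in failed for s in replicas) > len(replicas) // 2
    ]

if __name__ == "__main__":
    placement = place_replicas(4, TSERVERS)
    print(placement)
    print(tablets_with_majority(placement, {"slave1"}))            # all tablets survive
    print(tablets_with_majority(placement, {"slave1", "slave2"}))  # no majority left
```

With three servers and three replicas, losing one server leaves every tablet with two of three replicas, which is still a majority. Losing two servers leaves only one replica per tablet, so no tablet can accept writes.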
Deployment
• master: slave2 (192.168.0.134)
• tserver: slave1 (192.168.0.135), slave2 (192.168.0.134), slave3 (192.168.0.100)
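The slide gives only the host roles. Kudu daemons are configured through gflags, which can be collected in a flag file per role.

The sketch below generates the flag text for each host in this layout.
- `--fs_wal_dir`, `--fs_data_dirs` and `--tserver_master_addrs` are real Kudu options.
- 7051 is Kudu's default master RPC port.
- The `/data/kudu/...` directories are hypothetical.
- slave2 runs both a master and a tablet server, so the two roles get separate directories.

```python
MASTER = ("slave2", "192.168.0.134")
TSERVERS = [("slave1", "192.168.0.135"), ("slave2", "192.168.0.134"), ("slave3", "192.168.0.100")]
MASTER_RPC_PORT = 7051  # Kudu's default master RPC port

# Hypothetical directory layout; adjust to the disks actually available on each host.
WAL_DIR = "/data/kudu/{role}/wal"
DATA_DIR = "/data/kudu/{role}/data"

def role_dirs(role):
    """Per-role storage flags, so a master and a tserver on one host never share directories."""
    return [
        f"--fs_wal_dir={WAL_DIR.format(role=role)}",
        f"--fs_data_dirs={DATA_DIR.format(role=role)}",
    ]

def master_flags():
    return "\n".join(role_dirs("master"))

def tserver_flags():
    # Every tablet server must know where the master is.
    return "\n".join(role_dirs("tserver") + [f"--tserver_master_addrs={MASTER[1]}:{MASTER_RPC_PORT}"])

def flagfiles():
    """Return {hostname: {role: flag file text}} for every host in the cluster."""
    files = {}
    files.setdefault(MASTER[0], {})["master"] = master_flags()
    for host, _ip in TSERVERS:
        files.setdefault(host, {})["tserver"] = tserver_flags()
    return files

if __name__ == "__main__":
    for host, roles in flagfiles().items():
        for role, text in roles.items():
            print(f"# {host} ({role})\n{text}\n")
```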
Data Persistence
• Pipeline: MySQL -> HDFS -> Kudu
• Sqoop: a command-line application for transferring data between relational databases and Hadoop
• Spark: an open-source cluster-computing framework
MySQL to HDFS
sqoop import --connect jdbc:mysql://202.120.36.137:6033/mag-new-160205 --username data --password data --table AuthorFieldCount -m 1 --target-dir /user/hadoop/AuthorFieldCount --as-parquetfile
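To repeat the import for other tables, the command can be built programmatically. An example is FOSReferencesCount, which the Query Test reads.

This hypothetical helper keeps the flags from the slide with one change: it swaps `--password` for Sqoop's `--password-file`. That keeps the credential out of the process list and shell history. The password-file path is an assumption.

```python
import shlex

JDBC_URL = "jdbc:mysql://202.120.36.137:6033/mag-new-160205"
PASSWORD_FILE = "/user/hadoop/.mysql.password"  # hypothetical path, readable only by the importing user

def sqoop_import_args(table, target_root="/user/hadoop", username="data",
                      jdbc_url=JDBC_URL, password_file=PASSWORD_FILE, mappers=1):
    """Build the argv for one `sqoop import` that lands `table` as Parquet in HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "-m", str(mappers),  # number of map tasks; 1 as on the slide
        "--target-dir", f"{target_root}/{table}",
        "--as-parquetfile",
    ]

if __name__ == "__main__":
    for table in ("AuthorFieldCount", "FOSReferencesCount"):
        # Print the command; pass the list to subprocess.run(..., check=True) to execute it.
        print(shlex.join(sqoop_import_args(table)))
```

Building an argument list rather than a shell string avoids quoting problems if a table or path ever contains special characters.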
Data Persistence on Kudu (in spark-shell)
• Design the table
• Create the table
• Insert the data
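The slide lists these steps without the spark-shell code. In Scala they would normally go through kudu-spark's KuduContext.

The language-neutral sketch below models only two ideas the design step depends on:
- Every Kudu table must declare a primary key.
- Inserting a row whose key already exists is rejected, whereas an upsert overwrites it.

The schema is an assumption: columns taken from the Query Test, with key (FOSID, FOSReference). `ToyKuduTable` is a hypothetical in-memory stand-in, not a Kudu API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyKuduTable:
    """Hypothetical in-memory stand-in for a Kudu table: rows are stored by primary key."""
    columns: tuple
    key_columns: tuple
    rows: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.key_columns or not set(self.key_columns) <= set(self.columns):
            raise ValueError("primary key must be a non-empty subset of the table's columns")

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row):
        # Insert semantics: a row whose key is already present is rejected.
        key = self._key(row)
        if key in self.rows:
            raise KeyError(f"key already present: {key}")
        self.rows[key] = dict(row)

    def upsert(self, row):
        # Upsert semantics: the row replaces any existing row with the same key.
        self.rows[self._key(row)] = dict(row)

if __name__ == "__main__":
    fos = ToyKuduTable(("FOSID", "FOSReference", "Similarity"), ("FOSID", "FOSReference"))
    fos.insert({"FOSID": "0271BC14", "FOSReference": "03D2C4FF", "Similarity": 1})
    fos.upsert({"FOSID": "0271BC14", "FOSReference": "03D2C4FF", "Similarity": 2})
    print(fos.rows)
```

The choice matters when a Sqoop import is re-run. Plain inserts fail on rows already loaded, while upserts make the load repeatable.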
Query Test
• From the field-relatedness table, extract the relations among the 1,000 fields most related to a given field (here FOSID '0271BC14'):
select FOSID as Source, FOSReferencesCount.FOSReference as Target, Similarity/10000000 as Weight
from (select FOSReference from `FOSReferencesCount` where `FOSID` = '0271BC14' order by `Similarity` desc limit 1000) e1,
     (select FOSReference from `FOSReferencesCount` where `FOSID` = '0271BC14' order by `Similarity` desc limit 1000) e2,
     FOSReferencesCount
where e1.`FOSReference` = `FOSReferencesCount`.FOSID
  and e2.`FOSReference` = `FOSReferencesCount`.FOSReference;
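The query works in two stages. First, `e1` and `e2` both select the top-k neighbours of the seed field. Second, the join keeps only those rows of FOSReferencesCount whose source and target both lie in that neighbour set.

To see the logic on a small scale, the sketch runs the same statement in Python's built-in sqlite3 against a few invented rows. Two things are changed for the toy run:
- The seed and the `limit 1000` become bound parameters.
- Similarity is stored as REAL, because SQLite truncates integer division where MySQL does not.

The original test ran on MySQL and Kudu; this toy database only illustrates the query's semantics.

```python
import sqlite3

# The Query Test statement; only the seed FOSID and the limit are bound as parameters.
QUERY = """
select FOSID as Source, FOSReferencesCount.FOSReference as Target,
       Similarity/10000000 as Weight
from (select FOSReference from `FOSReferencesCount`
      where `FOSID` = :seed order by `Similarity` desc limit :k) e1,
     (select FOSReference from `FOSReferencesCount`
      where `FOSID` = :seed order by `Similarity` desc limit :k) e2,
     FOSReferencesCount
where e1.`FOSReference` = `FOSReferencesCount`.FOSID
  and e2.`FOSReference` = `FOSReferencesCount`.FOSReference
"""

# Invented rows: (FOSID, FOSReference, Similarity). 'A' is the seed field.
TOY_ROWS = [
    ("A", "B", 30_000_000), ("A", "C", 20_000_000), ("A", "D", 10_000_000),
    ("B", "C", 50_000_000), ("C", "B", 40_000_000), ("C", "D", 15_000_000),
    ("B", "E", 60_000_000),  # E is not a top neighbour of A, so this edge is dropped
    ("D", "A", 5_000_000),   # the seed itself is not in e1/e2, so this edge is dropped too
]

def make_toy_db(rows=TOY_ROWS):
    conn = sqlite3.connect(":memory:")
    # REAL affinity keeps Similarity/10000000 fractional; with INTEGER values
    # SQLite would truncate the division, while MySQL returns a decimal.
    conn.execute(
        "create table FOSReferencesCount ("
        " FOSID text, FOSReference text, Similarity real,"
        " primary key (FOSID, FOSReference))"
    )
    conn.executemany("insert into FOSReferencesCount values (?, ?, ?)", rows)
    return conn

def related_field_edges(conn, seed, k):
    """Sorted (Source, Target, Weight) edges among the k fields most similar to `seed`."""
    return sorted(conn.execute(QUERY, {"seed": seed, "k": k}).fetchall())

if __name__ == "__main__":
    conn = make_toy_db()
    print(related_field_edges(conn, "A", 2))  # neighbours B, C
    print(related_field_edges(conn, "A", 3))  # neighbours B, C, D
```

Raising k from 2 to 3 adds D to the neighbour set, so the C to D edge appears. Result rows are sorted because the SQL itself has no `order by`.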
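| Case  | Field            | FOSID    | MySQL (s) | Kudu (s) |
|-------|------------------|----------|-----------|----------|
| Case1 | Computer Science | 0271BC14 | 82.4      | 8.23     |
| Case2 | Ethnic studies   | 03D2C4FF | 65.4      | 9.175    |
| Case3 | Data Structure   | 09ACCB7D | 55.7      | 7.821    |
[Figure: Query, Case1 to Case3, MySQL vs. Kudu; y-axis 0 to 90]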
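The slide reports raw times. The short calculation below turns them into speedup ratios (MySQL time divided by Kudu time), using the timings on this slide.

```python
# (MySQL seconds, Kudu seconds) per case, as reported on the Query Test slide.
TIMINGS = {
    "Case1 (0271BC14)": (82.4, 8.23),
    "Case2 (03D2C4FF)": (65.4, 9.175),
    "Case3 (09ACCB7D)": (55.7, 7.821),
}

def speedups(timings):
    """Ratio of MySQL time to Kudu time for each case (higher means Kudu was faster)."""
    return {case: mysql / kudu for case, (mysql, kudu) in timings.items()}

if __name__ == "__main__":
    for case, ratio in speedups(TIMINGS).items():
        print(f"{case}: {ratio:.1f}x")
```

On these three cases Kudu was roughly 7x to 10x faster. Three cases are a small sample, so the ratio should not be read as a general figure.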
Query Test [Figure: MySQL vs. Kudu for queries 1 to 13; y-axis 0 to 180]
Q&A
