Bench'19: Benchmarking Database Ingestion Ability with Real-Time Big Astronomical Data



slide-1
SLIDE 1

Benchmarking Database Ingestion Ability with Real-Time Big Astronomical Data

Qing Tang

Qing Tang, Chen Yang, Xiaofeng Meng, Zhihui Du

RUC 15/11/2019

Bench'19

slide-2
SLIDE 2

Outline

  • Background
  • Benchmark Methodology
  • Experiments and Results Analysis
  • Conclusion

slide-3
SLIDE 3

1 Background

AstroServer

  • Real-time discovery of transients: gamma-ray bursts, supernovae, microlensing
  • Evolution of Sun-class stars
  • Mining long-term regular patterns
  • Accelerating scientific discovery

Catalog stream: microlensing, supernova, gamma-ray burst, ?

slide-4
SLIDE 4

1.1 Big Astronomy Data

  • GWAC (the Ground-based Wide-Angle Camera array)
  • Covers a large field at a high sampling frequency

Sky survey field     5000 square degrees
Sampling frequency   15 s
Observed stars       6.8 million
Generated data       2.5 TB/day
Service life         10 years
Total data           3 PB to 6 PB
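The figures above can be related to each other with simple arithmetic. The sketch below is only an illustration: it assumes decimal units (1 PB = 1000 TB), and the "implied observing days" value is derived from the quoted totals, not stated on the slide. The function names are hypothetical.

```python
# Figures quoted on the slide.
SAMPLING_INTERVAL_S = 15
DAILY_VOLUME_TB = 2.5
SERVICE_LIFE_YEARS = 10
TOTAL_RANGE_PB = (3, 6)

TB_PER_PB = 1000  # assumption: decimal units


def samples_per_hour(interval_s: int) -> int:
    """Number of sampling cycles in one hour at the given interval."""
    return 3600 // interval_s


def implied_observing_days_per_year(total_pb: float, daily_tb: float, years: float) -> float:
    """Days per year of data generation that would add up to total_pb."""
    return total_pb * TB_PER_PB / (daily_tb * years)


if __name__ == "__main__":
    print("samples per hour:", samples_per_hour(SAMPLING_INTERVAL_S))
    for total_pb in TOTAL_RANGE_PB:
        days = implied_observing_days_per_year(
            total_pb, DAILY_VOLUME_TB, SERVICE_LIFE_YEARS
        )
        print(f"{total_pb} PB total implies about {days:.0f} observing days per year")
```

Under these assumptions, the 3 PB to 6 PB range corresponds to generating 2.5 TB on only part of the days in a year, which would be consistent with an instrument that does not observe every night.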

slide-5
SLIDE 5

1.2 Application characteristics

a) Quick response
b) Massive data storage
c) Timely data analysis
d) High cost performance

slide-6
SLIDE 6

Outline

  • Background
  • Benchmark Methodology
  • Experiments and Results Analysis
  • Conclusion

slide-7
SLIDE 7

2 Benchmark Methodology

The specific methods are as follows:

(1) The workloads are analyzed in depth according to the characteristics of the data sets, and the frequently used basic operating units are extracted;
(2) The benchmark test specifications are determined;
(3) Loads based on various software stacks are provided.
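One way to picture step (3) is a small timing harness in which each software stack is plugged in as a "store this batch" function. This is a minimal sketch, not the authors' benchmark: `time_ingest`, `summarize`, and the callable interface are hypothetical, and an in-memory list stands in for a real database client.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List, Sequence

Row = Sequence[object]
Batch = List[Row]


def time_ingest(store: Callable[[Batch], None], batches: Iterable[Batch]) -> List[float]:
    """Return the wall-clock seconds spent storing each batch."""
    durations: List[float] = []
    for batch in batches:
        start = time.perf_counter()
        store(batch)  # the stack under test (hypothetical interface)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float], deadline_s: float) -> Dict[str, object]:
    """Summarize per-batch timings against a per-batch deadline."""
    average = mean(durations)
    return {
        "batches": len(durations),
        "average_s": average,
        "worst_s": max(durations),
        "average_below_deadline": average < deadline_s,
    }


if __name__ == "__main__":
    sink: List[Row] = []  # in-memory stand-in for a database
    batches = [[(i, float(i))] * 1000 for i in range(5)]
    print(summarize(time_ingest(sink.extend, batches), deadline_s=15.0))
```

Keeping the stack behind a single callable means the same data and the same timing code can be reused for every system being compared.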

slide-8
SLIDE 8

Outline

  • Background
  • Benchmark Methodology
  • Experiments and Results Analysis
  • Conclusion

slide-9
SLIDE 9

3.1 Experimental environment

Performance test environment configuration

Node     Hardware                                                     Software
Master   Memory: 96GB; Hard disk: 3.5TB; CPU: E5-2603 v3 @ 1.60GHz    Ubuntu 14.04.5, Redis_3.2.5, HBase_1.2.4, MySQL_5.6.33, Kafka
Slave    Memory: 96GB; Hard disk: 30TB; CPU: E5-2603 v3 @ 1.60GHz     Ubuntu 14.04.5, Redis_3.2.5, HBase_1.2.4, MySQL_5.6.33, Kafka

Cluster diagram: one Master node and four Slave nodes.

slide-10
SLIDE 10

3.2 Test data set

Attribute     Type     Attribute          Type
redis_key     string   magcalibe          double
jd_str        string   sigma_base         double
ccdNum        string   sigma_ext          double
zone          string   tag_valid          int
starId        long     magdiff            double
alpha         float    lastCMtempname     string
delta         float    starBelong         string
pixx          double   abSignal           string
pixy          double   abVal              double
mag           double   abQuality          double
mage          double   abRank             double
thetaimage    long     sigma_ext_median   double
flags         float    mag_interval_num   int
ellipticity   float    sigmedthreshold    double
classstar     float    data11             double
background    float    data12             double
fwhm          float    data13             double
vignet        float    data14             double
magnorm       double   data15             double
magcalib      double
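The attribute table can be written down as a typed schema. The sketch below is illustrative only: the column order (left column top to bottom, then right column) and the input format (one row as a list of strings) are assumptions, and `parse_row` is a hypothetical helper, not part of the benchmark.

```python
from typing import Dict, List

# Source type name -> Python converter ("long"/"int" -> int, "float"/"double" -> float).
CONVERTERS = {"string": str, "int": int, "long": int, "float": float, "double": float}

# Attribute names and types copied from the table; the order is an assumption.
SCHEMA = [
    ("redis_key", "string"), ("jd_str", "string"), ("ccdNum", "string"),
    ("zone", "string"), ("starId", "long"), ("alpha", "float"),
    ("delta", "float"), ("pixx", "double"), ("pixy", "double"),
    ("mag", "double"), ("mage", "double"), ("thetaimage", "long"),
    ("flags", "float"), ("ellipticity", "float"), ("classstar", "float"),
    ("background", "float"), ("fwhm", "float"), ("vignet", "float"),
    ("magnorm", "double"), ("magcalib", "double"), ("magcalibe", "double"),
    ("sigma_base", "double"), ("sigma_ext", "double"), ("tag_valid", "int"),
    ("magdiff", "double"), ("lastCMtempname", "string"), ("starBelong", "string"),
    ("abSignal", "string"), ("abVal", "double"), ("abQuality", "double"),
    ("abRank", "double"), ("sigma_ext_median", "double"),
    ("mag_interval_num", "int"), ("sigmedthreshold", "double"),
    ("data11", "double"), ("data12", "double"), ("data13", "double"),
    ("data14", "double"), ("data15", "double"),
]


def parse_row(fields: List[str]) -> Dict[str, object]:
    """Convert one row of string fields into typed values keyed by attribute."""
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {
        name: CONVERTERS[type_name](value)
        for (name, type_name), value in zip(SCHEMA, fields)
    }


if __name__ == "__main__":
    print(len(SCHEMA), "columns")
```

The length check rejects rows that do not have exactly one value per attribute before any conversion is attempted.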

Data generator: 1920 files, 2.8TB

  • One time: 1920 files
  • One file: 170,000 rows
  • One row: 39 columns
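The data set figures imply a few derived quantities. The sketch below works them out under stated assumptions: 2.8 TB is taken as decimal terabytes and as the size of these 1920 files, and the ingest rate assumes one file must be absorbed per 15 s sampling interval, which the slide does not state explicitly.

```python
# Figures quoted on the slide.
FILES = 1920
ROWS_PER_FILE = 170_000
COLUMNS_PER_ROW = 39
DATASET_BYTES = 2.8e12  # assumption: 2.8 TB in decimal units

# Assumption: one file arrives per sampling interval (15 s in the slides).
SAMPLING_INTERVAL_S = 15

total_rows = FILES * ROWS_PER_FILE
total_values = total_rows * COLUMNS_PER_ROW
avg_bytes_per_row = DATASET_BYTES / total_rows
rows_per_second = ROWS_PER_FILE / SAMPLING_INTERVAL_S

if __name__ == "__main__":
    print(f"total rows:           {total_rows:,}")
    print(f"total column values:  {total_values:,}")
    print(f"average bytes/row:    {avg_bytes_per_row:,.0f}")
    print(f"required rows/second: {rows_per_second:,.0f}")
```

The last figure is the sustained write rate a store would need to keep pace with the stream under the one-file-per-interval assumption.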

slide-11
SLIDE 11

3.3 Results Analysis

DataBase       Persistence time   Compression rate   Input anomaly rate
Redis+HBase    4.8h               40%                2.50%
Redis/HBase    6h                 40%                4.60%
Redis+MySQL    201h               100%               1.00%
Redis/MySQL    202h               100%               1.00%
Kafka+HBase    10.9               100%               2.50%

DataBase        Average storage time   Compare   Selection
HBase           340s                   > 15s     No
MySQL-cluster   1700s                  > 15s     No
Oracle          50.7s                  > 15s     No
Redis-cluster   6.4s                   < 15s     Yes
Kafka           20.5s                  > 15s
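The comparison against 15 s can be expressed as a simple filter. This sketch uses the average storage times from the table; reading the 15 s threshold as the sampling interval, and the helper name `candidates`, are assumptions made here for illustration.

```python
from typing import Dict, List

SAMPLING_INTERVAL_S = 15.0  # threshold used in the "compare" column

# Average storage times in seconds, copied from the table.
AVERAGE_STORAGE_TIME_S: Dict[str, float] = {
    "HBase": 340.0,
    "MySQL-cluster": 1700.0,
    "Oracle": 50.7,
    "Redis-cluster": 6.4,
    "Kafka": 20.5,
}


def candidates(times: Dict[str, float], interval_s: float) -> List[str]:
    """Return the systems whose average storage time is below the interval."""
    return [name for name, seconds in times.items() if seconds < interval_s]


if __name__ == "__main__":
    print(candidates(AVERAGE_STORAGE_TIME_S, SAMPLING_INTERVAL_S))
```

With these values, only Redis-cluster stores a batch faster than new data arrives, which matches the single "Yes" in the table.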

slide-12
SLIDE 12

Outline

  • Background
  • Benchmark Methodology
  • Experiments and Results Analysis
  • Conclusion

slide-13
SLIDE 13

4 Conclusion

AstroServer architecture (diagram labels):

  • Data generator
  • Cross matcher
  • Cache manager: Redis cluster
  • Data persister: HBase
  • Query engine

Catalog stream: microlensing, supernova, gamma-ray burst, ?
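The cache-then-persist part of this architecture can be sketched in a few lines. Everything here is a stand-in: `CacheManager` and `Persister` are hypothetical in-memory classes, not Redis or HBase clients, and the cross matcher and query engine are left out because the slides do not describe their behavior.

```python
from collections import deque
from typing import Deque, Dict, List

Row = Dict[str, object]


class CacheManager:
    """In-memory stand-in for the fast cache stage (Redis cluster in the slides)."""

    def __init__(self) -> None:
        self._buffer: Deque[Row] = deque()

    def put(self, rows: List[Row]) -> None:
        """Accept newly arrived rows on the fast path."""
        self._buffer.extend(rows)

    def drain(self, max_rows: int) -> List[Row]:
        """Remove and return up to max_rows of the oldest cached rows."""
        out: List[Row] = []
        while self._buffer and len(out) < max_rows:
            out.append(self._buffer.popleft())
        return out

    def __len__(self) -> int:
        return len(self._buffer)


class Persister:
    """In-memory stand-in for the persistent store (HBase in the slides)."""

    def __init__(self) -> None:
        self.stored: List[Row] = []

    def flush_from(self, cache: CacheManager, batch_size: int) -> int:
        """Move one batch from the cache to the store; return rows moved."""
        rows = cache.drain(batch_size)
        self.stored.extend(rows)
        return len(rows)


if __name__ == "__main__":
    cache, persister = CacheManager(), Persister()
    cache.put([{"starId": i} for i in range(5)])
    print(persister.flush_from(cache, batch_size=3), len(cache), len(persister.stored))
```

Separating `put` from `flush_from` reflects the idea that ingestion only has to be as fast as the cache, while persistence can proceed in batches at its own pace.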

slide-14
SLIDE 14

Thank You!

http://idke.ruc.edu.cn email: tangqing@ruc.edu.cn