Bug bites Elephant? Test-driven Quality Assurance in Big Data


Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development. Dr. Dominik Benz, Inovex GmbH. 2013/06/03, Berlin Buzzwords.


slide-1
SLIDE 1

Bug bites Elephant?

Test-driven Quality Assurance in Big Data Application Development

  • Dr. Dominik Benz, Inovex GmbH

2013/06/03, Berlin Buzzwords

slide-2
SLIDE 2

Who speaks the Elephant language?

  • "TDD! Write/execute tests, specify acceptance criteria, …"
  • "Class A extends Mapper…"
  • "ROI, $$, …"
  • "apt-get install…"

slide-3
SLIDE 3

The road to Big Data QA

  • Our Big Data QA problem
  • The FitNesse approach
  • Test data definition / selection
  • Job & workflow control
  • Result inspection

slide-4
SLIDE 4

QA problem

Web Intelligence @ 1&1

[Diagram labels: DWH, Hadoop Cluster]

  • ~ 1 billion log events / day, ~ 1 TB of (thrift) logfiles
  • Chains of MR jobs, running on 20 nodes / 8 cores / 96 GB RAM (CDH)
  • BI reporting, web analytics, …

slide-5
SLIDE 5

QA problem

An exemplary workflow

[Diagram: Log files (thrift) → MR job 1 → intermediate result (avro) → MR job 2 → … → DWH (RDBMS)]

Open questions:

  • Create (sample) input data?
  • Inspect (binary) formats?
  • Control workflows?

slide-6
SLIDE 6

QA problem

Existing Approaches

| method | tests what? | issues for our use case |
| JUnit | isolated functions | no integration, Java syntax |
| MRUnit | 1 mapper + 1 reducer | „little“ integration, Java syntax |
| iTest | hadoop jobs/workflows | Java / Groovy syntax |
| Scripts / CLI | (manual) scripting / inspection | „script chaos“, syntax |

→ FitNesse as suitable addition / solution!

slide-7
SLIDE 7

The road to Big Data QA

  • Big Data QA is different!
  • The FitNesse approach
  • Test data definition / selection
  • Job & workflow control
  • Result inspection

slide-8
SLIDE 8

FitNesse

In a nutshell

  • „Fully integrated standalone wiki and acceptance testing framework“
  • „Executable“ wiki pages (returning test results)
  • (Almost) natural language test specification
  • Connection to SUT via (Java) „fixtures“

slide-9
SLIDE 9

FitNesse

Architecture Overview

[Diagram: Browser → FitNesse Server → Fixtures → System under Test]

  • Wiki page (in the browser): script | check | num results | 3 |
  • Fixture (Java): public int numResults() { ... }
  • „Calling Java methods from wiki“, compare return values
  • Integrates with REST, Jenkins…
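As a rough illustration of the fixture idea, here is a minimal sketch of a Java class whose public methods could be called from a wiki table. Only the method name numResults comes from the slide; the class name ResultsFixtureSketch, the addResult method and the in-memory list standing in for the system under test are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixture: a plain Java class whose public methods are called from wiki tables.
public class ResultsFixtureSketch {
    // Stand-in for the system under test (assumption: results are simply kept in memory).
    private final List<String> results = new ArrayList<>();

    // Hypothetical helper, reachable from a script row such as: | add result | some value |
    public void addResult(String value) {
        results.add(value);
    }

    // Reachable from a row such as: | check | num results | 3 |
    // The wiki compares the returned value with the expected cell.
    public int numResults() {
        return results.size();
    }
}
```

With a row like `| check | num results | 3 |`, the wiki page would turn green if the returned value equals 3 and red otherwise.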

slide-10
SLIDE 10

FitNesse

An Exemplary Test

slide-11
SLIDE 11

FitNesse

Exemplary Test Source

!path /home/inovex/lib/*.jar

| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |

slide-12
SLIDE 12

FitNesse Hadoop Fixture Java Code

public class Hadoop {
    public boolean uploadToHdfs(String localFile, String remoteFile) {...}
    public boolean hadoopJobFromJar(String jar, String input, String output) {...}
    public String jobOutput() {...}
    public String numberOfOutputFiles() {...}
}
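The link between a wiki row such as `| upload | viewLog.csv | to hdfs | /testdata/ |` and the method uploadToHdfs(...) can be sketched as follows. This is a simplified approximation of how a script-table row is read (alternating cells form the method name, the cells in between become arguments), not FitNesse's actual implementation. The class ScriptRowSketch and its methods are hypothetical, and keywords such as check or show are assumed to have been stripped from the row beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating how a script-table row maps to a Java method call.
public class ScriptRowSketch {

    // Cells at even positions (0, 2, 4, ...) carry the words of the method name;
    // the words are joined in camelCase, e.g. "upload" + "to hdfs" -> "uploadToHdfs".
    public static String methodName(List<String> cells) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < cells.size(); i += 2) {
            for (String word : cells.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                char first = word.charAt(0);
                name.append(name.length() == 0
                        ? Character.toLowerCase(first)
                        : Character.toUpperCase(first));
                name.append(word.substring(1));
            }
        }
        return name.toString();
    }

    // Cells at odd positions (1, 3, 5, ...) are passed as arguments, in order.
    public static List<String> arguments(List<String> cells) {
        List<String> args = new ArrayList<>();
        for (int i = 1; i < cells.size(); i += 2) {
            args.add(cells.get(i));
        }
        return args;
    }
}
```

This is why the fixture's method names effectively define the test language that wiki authors write in.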

slide-13
SLIDE 13

The road to Big Data QA

  • Big Data QA is different!
  • FitNesse Wiki test execution!
  • Test data definition / selection
  • Job & workflow control
  • Result inspection

slide-14
SLIDE 14

Test Data

CSV

slide-15
SLIDE 15
Test Data

Thrift

  • Big Data: efficient data transfer among heterogeneous sources
  • Define interface via IDL, compiler for many languages

slide-16
SLIDE 16
Test Data

Real World Data

  • Dev/Test Hadoop cluster: identical hardware to prod, but fewer nodes
  • (Random/biased) sampling, e.g. on a daily basis
  • Feedback loop:
      • identify „special cases“ from real data
      • include them in (manual) data definition
  • Gradually increase test coverage / artefact quality
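The slides do not say which sampling method was used. One common way to draw a uniform random sample from a log stream of unknown length is reservoir sampling (Algorithm R), sketched here in plain Java. The class LogSamplerSketch and its method are hypothetical names, and a biased variant would need a different acceptance rule.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sampler: picks k lines uniformly at random from a stream of log lines.
public class LogSamplerSketch {

    public static List<String> sample(Iterator<String> lines, int k, Random rnd) {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (lines.hasNext()) {
            String line = lines.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k lines.
                reservoir.add(line);
            } else {
                // Draw j uniformly from [0, seen); the new line replaces slot j
                // only if j falls inside the reservoir (probability k / seen).
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, line);
                }
            }
        }
        return reservoir;
    }
}
```

A daily sample could then be produced by running this over one day's log lines and uploading the result to the dev/test cluster.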

slide-17
SLIDE 17

The road to Big Data QA

  • Big Data QA is different!
  • FitNesse Wiki test execution!
  • Define CSV / thrift / real-world test data!
  • Job & workflow control
  • Result inspection

slide-18
SLIDE 18
Job Control

Swiss Army Knife: Shell

  • Execute arbitrary (shell) commands
  • Mainly a wrapper around org.apache.commons.exec.CommandLine
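A shell fixture of this kind might look roughly like the sketch below. To keep the example self-contained it uses the JDK's ProcessBuilder instead of Apache Commons Exec, so it is not the authors' implementation. The class name ShellFixtureSketch, its method names and the use of `sh -c` (which assumes a Unix-like host) are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical shell fixture: runs a command and exposes exit code and output to the wiki.
public class ShellFixtureSketch {
    private int exitCode = -1;
    private String output = "";

    // Reachable from a script row such as: | execute | ls /tmp |
    // Returns true when the command exits with code 0.
    public boolean execute(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Read everything the command prints, then wait for it to finish.
            output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            exitCode = process.waitFor();
        } catch (IOException e) {
            output = e.getMessage();
            exitCode = -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            exitCode = -1;
        }
        return exitCode == 0;
    }

    // Reachable from: | check | exit code | 0 |
    public int exitCode() {
        return exitCode;
    }

    // Reachable from: | show | output |
    public String output() {
        return output;
    }
}
```

Returning a boolean from execute lets a plain script row go red when the command fails, while exitCode and output remain available for more specific checks.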

slide-19
SLIDE 19
Job Control

Hadoop Fixture

  • Hide complexity from test authors
  • „Define“ appropriate test language via (Java) method names
  • Re-use other fixtures (Shell, …) internally
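The idea of hiding complexity behind method names while re-using a shell fixture internally can be sketched like this. The method names uploadToHdfs and hadoopJobFromJar are taken from the fixture slide; everything else is an assumption: the class name HadoopFixtureSketch, the injected command runner (a real fixture would more likely create its shell helper itself), and the exact command lines, which assume the `hadoop` CLI is on the path and the jar declares its main class.

```java
import java.util.function.Predicate;

// Hypothetical Hadoop fixture that delegates all work to a command runner.
public class HadoopFixtureSketch {
    // Runs a shell command and reports whether it succeeded (assumption: injected for testability).
    private final Predicate<String> shell;

    public HadoopFixtureSketch(Predicate<String> shell) {
        this.shell = shell;
    }

    // Wiki row: | upload | viewLog.csv | to hdfs | /testdata/ |
    public boolean uploadToHdfs(String localFile, String remoteFile) {
        return shell.test("hadoop fs -put " + localFile + " " + remoteFile);
    }

    // Wiki row: | hadoop job from jar | <jar> | ... |
    // Assumes the jar's manifest names the main class and the job takes input and output paths.
    public boolean hadoopJobFromJar(String jar, String input, String output) {
        return shell.test("hadoop jar " + jar + " " + input + " " + output);
    }
}
```

Test authors only see the two readable rows; the command-line details stay inside the fixture and can change without touching the wiki pages.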

slide-20
SLIDE 20
Job Control

Workflows & Suites

  • FitNesse allows grouping tests into suites
  • Can be used to simulate MR processing chains
  • SuiteSetUp / SuiteTearDown for creating / destroying test conditions
  • Tests can still be executed individually

[Diagram: MR job 1 → MR job 2]

slide-21
SLIDE 21

The road to Big Data QA

  • Big Data QA is different!
  • FitNesse Wiki test execution!
  • Define CSV / thrift / real-world data!
  • Use suites & fixtures for jobs/workflows!
  • Result inspection

slide-22
SLIDE 22
Results

Data Warehouse / Hive

  • Validate RDBMS contents (via JDBC)
      • E.g. for checking the final result
  • Or use Hive + Hive-Server to query raw data

slide-23
SLIDE 23
Results

Pig

  • Execute arbitrary Pig commands from a wiki page
  • Inspect e.g. binary intermediate results (avro, …)

slide-24
SLIDE 24

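Results

Pig Fixture extends PigServer

import java.io.IOException;
import org.apache.pig.PigServer;

public class PigConsole extends PigServer {
    // The constructor and AVRO_STORAGE_LOADER are not shown on the slide.
    public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
        // Builds a statement of the form: alias = LOAD 'filename' USING <loader>;
        this.registerQuery(
            alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
    }
}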

slide-25
SLIDE 25

Results

Server Infrastructure

[Diagram: a FitNesse Master holds TestEnvironments (ProjA, ProjB) and TestConfigurations (ProjA, ProjB), each in dev / qs / live variants. It is connected to the per-project slaves (Dev ProjA Slave, QS ProjA Slave, Live ProjA Slave). Connection labels: import / edit tests remotely; import / edit config remotely.]

slide-26
SLIDE 26

Thank you!

dominik.benz@inovex.de

  • Big Data QA is different!
  • FitNesse Wiki test execution!
  • Define CSV / thrift / real-world data!
  • Use suites & fixtures for jobs/workflows!
  • Inspect results via Pig/Hive

slide-27
SLIDE 27


Want more? Inovex trains you!

  • Android Developer Training (3 days, Karlsruhe/München)
  • Hadoop Developer Training (3 days, Karlsruhe/Köln)
  • Certified Scrum Developer Training (5 days, Köln)
  • Pentaho Data Integration Training (4 days, München/Köln)
  • Liferay Portal-Admin Training (3 days, Karlsruhe)
  • Liferay Portal-Developer Training (4 days, Karlsruhe)

information and registration at www.inovex.de/offene-trainings

slide-28
SLIDE 28

Inovex @bbuzz: Stefan, Kathrin, Bernhard, Jörg, Andrew, Christian, Christian

slide-29
SLIDE 29


BACKUP

slide-30
SLIDE 30

FitNesse Server Infrastructure

[Diagram: a FitNesse Master holds TestEnvironments (ProjA, ProjB) and TestConfigurations (ProjA, ProjB), each in dev / qs / live variants. It is connected to the per-project slaves (Dev ProjA Slave, QS ProjA Slave, Live ProjA Slave). Connection labels: import / edit tests remotely; import / edit config remotely.]

slide-31
SLIDE 31
Results

Demo

  • Download & install FitNesse server
  • Create CSV log file
  • Run Hadoop job which counts viewed items
  • Inspect results with Hive
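The demo job itself is not shown in the slides. As a plain-Java stand-in for what "counting viewed items" computes, the sketch below counts views per product from CSV log lines. It is not a Hadoop job. The class ViewCountSketch is hypothetical, and the comma separator and the column order (date, user, product, browser, os, as in the deck's log-file example) are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the demo job: counts how often each product was viewed.
public class ViewCountSketch {

    // Assumed line format: date,user,product,browser,os
    public static Map<String, Long> countViewsPerProduct(List<String> csvLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip malformed lines that have no product column
            }
            // "Map" step: emit (product, 1); "reduce" step: sum per product.
            counts.merge(fields[2].trim(), 1L, Long::sum);
        }
        return counts;
    }
}
```

A wiki test could then compare such per-product counts, queried via Hive, with the values expected for the uploaded log file.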

slide-32
SLIDE 32


slide-33
SLIDE 33

FitNesse

Exemplary Test Source

!path /home/inovex/lib/*.jar

| Table:Log File |
| /home/inovex/viewLog.csv | |
| date | user | product | browser | os |
| 2013-03-12 | john | 1 | ff | win |

| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |

slide-34
SLIDE 34

FitNesse

An Exemplary Test