
Bug bites Elephant? Test-driven Quality Assurance in Big Data

Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development. Dr. Dominik Benz, Inovex GmbH. 2013/06/03, Berlin Buzzwords.


  1. Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development. Dr. Dominik Benz, Inovex GmbH. 2013/06/03, Berlin Buzzwords

  2. Who speaks the Elephant language? [Figure: speech bubbles from different project roles: „Class A extends Mapper…“, „TDD!“, „ROI, $$, …“, „apt-get install…“, „Write/execute tests, specify acceptance criteria, …“]

  3. The road to Big Data QA: our Big Data QA problem; the FitNesse approach; test data definition / selection; job & workflow control; result inspection

  4. QA problem: Web Intelligence @ 1&1
     ‣ BI reporting, web analytics, …
     ‣ ~1 billion log events / day, DWH, ~1 TB (thrift) logfiles
     ‣ chains of MR jobs, running on a Hadoop cluster: 20 nodes / 8 cores / 96 GB RAM (CDH)

  5. QA problem: An exemplary workflow. [Diagram: log files (thrift) → MR job 1 → intermediate result (avro) → MR job 2 → … → DWH (RDBMS). Open questions: create (sample) input data? control the workflow? inspect (binary) formats?]

  6. QA problem: Existing Approaches

     | method      | tests what?                   | issues for our use case           |
     | JUnit       | isolated functions            | no integration, Java syntax       |
     | MRUnit      | 1 mapper + 1 reducer          | „little“ integration, Java syntax |
     | iTest       | hadoop jobs/workflows         | Java / Groovy syntax              |
     | Scripts/CLI | (manual) scripting/inspection | „script chaos“, syntax            |

     → FitNesse as suitable addition / solution!

  7. The road to Big Data QA: Big Data QA is different!; the FitNesse approach; test data definition / selection; job & workflow control; result inspection

  8. FitNesse: In a nutshell
     ‣ „fully integrated standalone wiki and acceptance testing framework“
     ‣ „executable“ wiki pages (returning test results)
     ‣ (almost) natural language test specification
     ‣ connection to the SUT via (Java) „Fixtures“

  9. FitNesse: Architecture Overview. [Diagram: Browser ↔ FitNesse Server ↔ Fixtures ↔ System under Test. The wiki rows | script | and | check | num results | 3 | map to the fixture method public int numResults() { ... }.]
     → „calling Java methods from wiki“, compare return values
     → Integrates with REST, Jenkins, …
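The „calling Java methods from wiki“ idea can be illustrated with a small, self-contained sketch. It is not FitNesse's own implementation: `WikiRowDemo`, `SearchFixture`, `toMethodName` and `check` are hypothetical names, and the camel-casing rule is a simplified assumption about how a cell such as `num results` ends up as the method `numResults`.

```java
import java.lang.reflect.Method;

class WikiRowDemo {
    // Turn the words of a wiki cell ("num results") into a camel-case
    // Java method name ("numResults"). Simplified assumption, not FitNesse code.
    static String toMethodName(String cell) {
        String[] words = cell.trim().split("\\s+");
        StringBuilder name = new StringBuilder(words[0]);
        for (int i = 1; i < words.length; i++) {
            name.append(Character.toUpperCase(words[i].charAt(0)))
                .append(words[i].substring(1));
        }
        return name.toString();
    }

    // Evaluate one "| check | <method> | <expected> |" row against a fixture:
    // call the method by reflection and compare its return value as text.
    static boolean check(Object fixture, String cell, String expected) throws Exception {
        Method method = fixture.getClass().getMethod(toMethodName(cell));
        Object actual = method.invoke(fixture);
        return expected.equals(String.valueOf(actual));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check(new SearchFixture(), "num results", "3"));
    }
}

// Hypothetical fixture, named after the numResults method on the slide.
class SearchFixture {
    public int numResults() {
        return 3; // stand-in for a real call into the system under test
    }
}
```

The point of the sketch is the division of labour: the wiki row carries only words and an expected value, while everything technical stays inside the fixture class.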

  10. FitNesse: An Exemplary Test

  11. FitNesse: Exemplary Test Source

      !path /home/inovex/lib/*.jar
      | script | Hadoop |
      | upload | viewLog.csv | to hdfs | /testdata/ |
      | hadoop job from jar | viewLog.jar | [...] |
      | show | job output |
      | check | number of output files | 3 |

  12. FitNesse: Hadoop Fixture Java Code

      public class Hadoop {
          public boolean uploadToHdfs(String localFile, String remoteFile) {...}
          public boolean hadoopJobFromJar(String jar, String input, String output) {...}
          public String jobOutput() {...}
          public String numberOfOutputFiles() {...}
      }

  13. The road to Big Data QA: Big Data QA is different!; FitNesse: Wiki test execution!; test data definition / selection; job & workflow control; result inspection

  14. Test Data: CSV

  15. Test Data: Thrift
      ‣ Big Data: efficient data transfer among heterogeneous sources
      ‣ Define the interface via IDL, compilers for many languages

  16. Test Data: Real World Data
      ‣ Dev/Test Hadoop cluster: hardware identical to Prod, but fewer nodes
      ‣ (random/biased) sampling, e.g. on a daily basis
      ‣ Feedback loop:
        ‣ identify „special cases“ from real data
        ‣ include them in the (manual) data definition
      ‣ Gradually increase test coverage / artefact quality
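As a rough illustration of the (random/biased) sampling step, the sketch below keeps every line that a predicate flags as a special case and a random fraction of all other lines. The class `LogSampler`, its parameters, the CSV layout and the "empty user" rule are assumptions made for this example only; the slides do not say how sampling was implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

class LogSampler {
    // Biased sampling: special cases are always kept, all other lines are
    // kept with probability `fraction`.
    static List<String> sample(List<String> lines, double fraction,
                               Predicate<String> isSpecialCase, long seed) {
        Random random = new Random(seed); // fixed seed keeps the test data reproducible
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (isSpecialCase.test(line) || random.nextDouble() < fraction) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> day = List.of(
            "2013-03-12,john,1,ff,win",
            "2013-03-12,mary,2,chrome,mac",
            "2013-03-12,,3,ff,linux"); // empty user: a hypothetical special case
        Predicate<String> emptyUser = line -> line.split(",", -1)[1].isEmpty();
        System.out.println(sample(day, 0.1, emptyUser, 42L));
    }
}
```

Special cases found this way can then be copied into the manually defined test data, which is the feedback loop the slide describes.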

  17. The road to Big Data QA: Big Data QA is different!; FitNesse: Wiki test execution!; Define CSV / thrift / real-world test data!; job & workflow control; result inspection

  18. Job Control: Swiss Army Knife: Shell
      ‣ Execute arbitrary (shell) commands
      ‣ Mainly a wrapper around org.apache.commons.exec.CommandLine
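A shell fixture of this kind might look roughly like the sketch below. To stay self-contained it swaps in the JDK's `ProcessBuilder` for Apache Commons Exec, and it assumes a POSIX `sh` is available. The class name `ShellFixture` and the methods `execute`, `output` and `exitCode` are hypothetical, not the names used in the talk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ShellFixture {
    private String output = "";
    private int exitCode = -1;

    // Wiki row: | execute | <command> |
    // Returns true if the command exited with status 0.
    public boolean execute(String command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("sh", "-c", command)
            .redirectErrorStream(true) // merge stderr into stdout
            .start();
        output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        exitCode = process.waitFor();
        return exitCode == 0;
    }

    // Wiki row: | check | output | <expected> |
    public String output() {
        return output.trim();
    }

    // Wiki row: | check | exit code | 0 |
    public int exitCode() {
        return exitCode;
    }

    public static void main(String[] args) throws Exception {
        ShellFixture shell = new ShellFixture();
        System.out.println(shell.execute("echo hello") + " " + shell.output());
    }
}
```

Keeping the last output and exit code as fixture state lets later wiki rows inspect the result of an earlier row, which is how higher-level fixtures could reuse the shell internally.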

  19. Job Control: Hadoop Fixture
      ‣ Hide complexity from test authors
      ‣ „define“ an appropriate test language via (Java) method names
      ‣ re-use other fixtures (Shell, …) internally

  20. Job Control: Workflows & Suites
      ‣ FitNesse allows grouping tests into suites
      ‣ Suites can be used to simulate MR processing chains (MR job 1 → MR job 2)
      ‣ SuiteSetUp / SuiteTearDown for creating / destroying test conditions
      ‣ Tests can still be executed individually

  21. The road to Big Data QA: Big Data QA is different!; FitNesse: Wiki test execution!; Define CSV / thrift / real-world data!; Use suites & fixtures for jobs/workflows!; result inspection

  22. Results: Data Warehouse / Hive
      ‣ Validate RDBMS contents (via JDBC)
      ‣ E.g. for checking the final result
      ‣ Or use Hive + Hive-Server to query raw data

  23. Results: Pig
      ‣ Execute arbitrary Pig commands from a wiki page
      ‣ Inspect e.g. binary intermediate results (avro, …)

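  24. Results: Pig Fixture extends PigServer

      import java.io.IOException;
      import org.apache.pig.PigServer;

      public class PigConsole extends PigServer {
          // Registers "<alias> = LOAD '<filename>' USING <loader>;" with Pig.
          // Assumes filename is passed unquoted; AVRO_STORAGE_LOADER and the
          // constructor are not shown on the slide.
          public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
              this.registerQuery(
                  alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
          }
      }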

  25. Results: Server Infrastructure. [Diagram: a FitNesse master holds TestEnvironments and TestConfigurations for each project (ProjA, ProjB) and stage (dev, qs, live); configs and tests are imported / edited remotely; each project has FitNesse slaves for Dev, QS and Live.]

  26. Thank you! dominik.benz@inovex.de. Recap: Big Data QA is different!; FitNesse: Wiki test execution!; Define CSV / thrift / real-world data!; Use suites & fixtures for jobs/workflows!; Inspect results via Pig/Hive!

  27. Want more? Inovex trains you!
      ‣ Android Developer Training (3 days, Karlsruhe/München)
      ‣ Certified Scrum Developer Training (5 days, Köln)
      ‣ Hadoop Developer Training (3 days, Karlsruhe/Köln)
      ‣ Liferay Portal-Developer Training (4 days, Karlsruhe)
      ‣ Liferay Portal-Admin Training (3 days, Karlsruhe)
      ‣ Pentaho Data Integration Training (4 days, München/Köln)
      Information and registration at www.inovex.de/offene-trainings

  28. Inovex @bbuzz: Stefan, Bernhard, Kathrin, Jörg, Andrew, Christian, Christian

  29. BACKUP

  30. FitNesse: Server Infrastructure (same diagram as slide 25)

  31. Results: Demo
      ‣ Download & install FitNesse server
      ‣ Create CSV log file
      ‣ Run Hadoop job which counts viewed items
      ‣ Inspect results with Hive

  33. FitNesse: Exemplary Test Source

      !path /home/inovex/lib/*.jar
      | Table:Log File |
      | /home/inovex/viewLog.csv |
      | date | user | product | browser | os |
      | 2013-03-12 | john | 1 | ff | win |

      | script | Hadoop |
      | upload | viewLog.csv | to hdfs | /testdata/ |
      | hadoop job from jar | viewLog.jar | [...] |
      | show | job output |
      | check | number of output files | 3 |
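The `Table:Log File` block above defines CSV test data directly in the wiki. A fixture behind it could look roughly like the following sketch. It assumes the convention of FitNesse's Slim "table" tables, where the rows below the fixture-name row are handed to a `doTable` method as lists of cells and a same-shaped list of result cells is returned. The row layout (path first, then header, then data) follows the slide; the class body itself is a guess, not the author's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LogFile {
    // Assumed layout: row 0 = target path, row 1 = CSV header, remaining rows = data.
    public List<List<String>> doTable(List<List<String>> table) throws IOException {
        Path target = Path.of(table.get(0).get(0));
        List<String> csvLines = new ArrayList<>();
        for (List<String> row : table.subList(1, table.size())) {
            csvLines.add(String.join(",", row));
        }
        Files.write(target, csvLines);

        // One result cell per input cell: "pass" marks the path cell,
        // an empty string leaves a cell unchanged.
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < table.size(); i++) {
            List<String> cells = new ArrayList<>();
            for (String ignored : table.get(i)) {
                cells.add(i == 0 ? "pass" : "");
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("viewLog", ".csv");
        new LogFile().doTable(List.of(
            List.of(csv.toString()),
            List.of("date", "user", "product", "browser", "os"),
            List.of("2013-03-12", "john", "1", "ff", "win")));
        System.out.println(Files.readAllLines(csv));
    }
}
```

The example writes to a temporary file rather than `/home/inovex/viewLog.csv` so that it can run anywhere; the subsequent `upload ... to hdfs` row would then pick the file up.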

  34. FitNesse: An Exemplary Test
