Testing and documenting your data doesn't have to suck

Testing and documenting your data doesn't have to suck. Data Council NYC, Nov 2019. @abeGong. About me (Abe): data scientist/engineer; tech-first and "enterprise"; human-scale, ethical data; first time in NYC as an adult (?!).


  1. Testing and documenting your data doesn’t have to suck Data Council NYC - Nov 2019 @abeGong

  2. About me (Abe)
      ● Data scientist/engineer
      ● Tech-first and “enterprise”
      ● Human-scale, ethical data
      ● First time in NYC as an adult (?!)

  3. Outline
      1. A thing we do that is ABSOLUTELY CRAZY
      2. How to defeat pipeline debt
      3. Volunteers wanted!

  4-11. A thing we do that is ABSOLUTELY CRAZY: Undocumented, Untested, Unstable

  12. Trying to maintain a data system that is untested, undocumented and unstable is ABSOLUTELY CRAZY

  13. ?

  14-15. A thing we do that is ABSOLUTELY CRAZY. Give the monster a name: the monster’s name is pipeline debt.

  16. Always know what to expect from your data

  17-20. Expectations are assertions about data (great_expectations)
      expect_column_to_exist
      expect_table_row_count_to_be_between
      expect_column_values_to_be_unique
      expect_column_values_to_not_be_null
      expect_column_values_to_be_between
      expect_column_values_to_match_regex
      expect_column_values_to_match_strftime_format
      expect_column_mean_to_be_between
      expect_column_kl_divergence_to_be_less_than
      etc. etc. etc.

  21-22. Expectations are assertions about data: Expectation Types, Data Sources

  23. How to draw an owl
      1. Draw some circles
      2. Draw the rest of the stupid owl

  24-28. Great Expectations has a bunch of shiny new features: Validation Renderers Stores Profilers Operators and Views Data Context and Data Asset namespace Expectation Types Data Sources

  29. Set up data testing in a day, not a month.

  30. Your docs are your tests, and your tests are your docs. Icons created by SBTS from Noun Project

  31. Your docs are your tests, and your tests are your docs. https://www.locallyoptimistic.com/post/data_dictionaries/

  32. Your docs are your tests, and your tests are your docs.
      expect_column_values_to_be_between(column="room_temp", min_value=60, max_value=75, mostly=.95)
      “Values in this column should be between 60 and 75, at least 95% of the time.”
      “Warning: more than 5% of values fell outside the specified range of 60 to 75.”

  34-35. Warning: Great Expectations still has rough edges: Validation Renderers Stores Profilers Operators and Views Data Context and Data Asset namespace Expectation Types Data Sources

  36. Volunteers wanted!
      1. Pick a day
      2. Work with us
      3. Get set up
      4. Improve the project
      How to get in touch: 👌 https://greatexpectations.io/slack

  37. Recap

  38. Trying to maintain a data system that is untested, undocumented and unstable is ABSOLUTELY CRAZY

  39. A thing we do that is ABSOLUTELY CRAZY. Give the monster a name: the monster’s name is pipeline debt.

  40. To defeat pipeline debt, always know what to expect of your data.
      expect_column_to_exist
      expect_table_row_count_to_be_between
      expect_column_values_to_be_unique
      expect_column_values_to_not_be_null
      expect_column_values_to_be_between
      expect_column_values_to_match_regex
      expect_column_values_to_match_strftime_format
      expect_column_mean_to_be_between
      expect_column_kl_divergence_to_be_less_than
      etc. etc. etc.

  41. Set up data testing in a day, not a month.

  42. Your docs are your tests, and your tests are your docs. Icons created by SBTS from Noun Project

  43. Warning: Great Expectations still has rough edges

  44. Volunteers wanted!
      1. Pick a day
      2. Work with us
      3. Get set up
      4. Improve the project
      How to get in touch: 👌 https://greatexpectations.io/slack

  45. Thank you, New York! https://greatexpectations.io/slack
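
Slide 32's idea, that a single expectation can serve as both a piece of documentation and a test, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `EXPECTATION` dictionary layout and the `render_doc` and `validate` helpers are hypothetical names invented here, not the great_expectations API or its implementation. Only the expectation name and its keyword arguments (`column`, `min_value`, `max_value`, `mostly`) come from the slide. The sketch also assumes that missing values are simply skipped.

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for one expectation. The keyword
# arguments mirror slide 32; the dictionary layout is an assumption.
EXPECTATION: Dict[str, Any] = {
    "expectation_type": "expect_column_values_to_be_between",
    "kwargs": {"column": "room_temp", "min_value": 60, "max_value": 75, "mostly": 0.95},
}


def render_doc(expectation: Dict[str, Any]) -> str:
    """Render the expectation as a documentation sentence."""
    kw = expectation["kwargs"]
    return (
        f"Values in column '{kw['column']}' should be between {kw['min_value']} "
        f"and {kw['max_value']}, at least {kw['mostly']:.0%} of the time."
    )


def validate(expectation: Dict[str, Any], rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Check the same expectation against rows (a list of dicts)."""
    kw = expectation["kwargs"]
    # Assumption of this sketch: rows with a missing or None value are skipped.
    values = [row.get(kw["column"]) for row in rows]
    values = [v for v in values if v is not None]
    in_range = sum(kw["min_value"] <= v <= kw["max_value"] for v in values)
    # With no values to check, treat the expectation as trivially met.
    fraction = in_range / len(values) if values else 1.0
    success = fraction >= kw["mostly"]
    message = "OK" if success else (
        f"Warning: more than {1 - kw['mostly']:.0%} of values fell outside "
        f"the specified range of {kw['min_value']} to {kw['max_value']}."
    )
    return {"success": success, "fraction_in_range": fraction, "message": message}


if __name__ == "__main__":
    readings = [{"room_temp": t} for t in (68, 70, 72, 90)]
    print(render_doc(EXPECTATION))
    print(validate(EXPECTATION, readings)["message"])
```

Because the documentation sentence and the check are both derived from the same declaration, they cannot drift apart, which is the point the slide makes with "your docs are your tests".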
