Why unit test? UNIT TESTING FOR DATA SCIENCE IN PYTHON


slide-1
SLIDE 1

Why unit test?

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Dibya Chakravorty

Test Automation Engineer

slide-2
SLIDE 2

UNIT TESTING FOR DATA SCIENCE IN PYTHON

How can we test an implementation?

def my_function(argument):
    ...

my_function(argument_1)
return_value_1

my_function(argument_2)
return_value_2

my_function(argument_3)
return_value_3

slide-3
SLIDE 3

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

Implementation

slide-4
SLIDE 4

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test]

slide-5
SLIDE 5

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test, Accepted implementation; test outcome: PASS]

slide-6
SLIDE 6

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test, Accepted implementation, Bugfix; test outcomes: FAIL, PASS]

slide-7
SLIDE 7

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test, Accepted implementation, Feature request or Refactoring, Bugfix; test outcomes: FAIL, PASS]

slide-9
SLIDE 9

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test, Accepted implementation, Feature request or Refactoring, Bug found, Bugfix; test outcomes: FAIL, PASS]

slide-11
SLIDE 11

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Life cycle of a function

[Diagram nodes: Implementation, Test, Accepted implementation, Feature request or Refactoring, Bug found, Bugfix; test outcomes: FAIL, PASS; annotation: 100 times]

slide-12
SLIDE 12

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Example

def row_to_list(row):
    ...

area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007

File: housing_data.txt

slide-13
SLIDE 13

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Data format

def row_to_list(row):
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]

area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007

File: housing_data.txt

slide-14
SLIDE 14

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Data isn't clean

def row_to_list(row):
    ...

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None

area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410    <-- row with missing area
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007

File: housing_data.txt

slide-15
SLIDE 15

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Data isn't clean

def row_to_list(row):
    ...

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None
"1,463238,765\n"      Invalid    None

area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410    <-- row with missing area
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765                 <-- row with missing tab
1,468             239,007

File: housing_data.txt
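
The slides never show the body of row_to_list(). The following is only a minimal sketch of one implementation that would satisfy the three rows of the table above; the splitting and validation logic is an assumption, not the course's actual code.

```python
def row_to_list(row):
    """Split one tab-separated line into [area, price]; return None if the row is invalid."""
    # Drop the trailing newline, then split on the tab separator.
    entries = row.rstrip("\n").split("\t")
    # Assumption: a valid row has exactly two non-empty entries.
    if len(entries) != 2 or "" in entries:
        return None
    return entries


print(row_to_list("2,081\t314,942\n"))  # ['2,081', '314,942']
print(row_to_list("\t293,410\n"))       # None (missing area)
print(row_to_list("1,463238,765\n"))    # None (missing tab)
```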

slide-16
SLIDE 16

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Time spent in testing this function

def row_to_list(row):
    ...

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None
"1,463238,765\n"      Invalid    None

row_to_list("2,081\t314,942\n")
["2,081", "314,942"]

row_to_list("\t293,410\n")
None

row_to_list("1,463238,765\n")
None

slide-17
SLIDE 17

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Time spent in testing this function

[Diagram nodes: Implementation, Test, Accepted implementation, Feature request or Refactoring, Bug found, Bugfix; test outcomes: FAIL, PASS; annotation: 100 times]

slide-18
SLIDE 18

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Time spent in testing this function

slide-19
SLIDE 19

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Manual testing vs. unit tests

Unit tests automate the repetitive testing process and save time.

slide-20
SLIDE 20

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Learn unit testing - with a data science spin

area (sq. ft.)    price (dollars)
2,081             314,942
1,059             186,606
                  293,410
1,148             206,186
1,506             248,419
1,210             214,114
1,697             277,794
1,268             194,345
2,318             372,162
1,463238,765
1,468             239,007

Linear regression of housing price against area

slide-21
SLIDE 21

UNIT TESTING FOR DATA SCIENCE IN PYTHON

GitHub repository of the course

Implementation of functions like row_to_list().

slide-22
SLIDE 22

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Develop a complete unit test suite

data/
src/
|-- data/
|-- features/
|-- models/
|-- visualization/

slide-23
SLIDE 23

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Develop a complete unit test suite

data/
src/
|-- data/
|-- features/
|-- models/
|-- visualization/
tests/              # Test suite
|-- data/
|-- features/
|-- models/
|-- visualization/

Write unit tests for your own projects.
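
The layout above mirrors each package under src/ with a folder of the same name under tests/. As a rough illustration of that mapping, the helper below derives where a test module would live for a given source file. The helper name and the file names are hypothetical and do not come from the course repository.

```python
from pathlib import PurePosixPath


def mirrored_test_path(source_path):
    """Map src/<package>/<module>.py to tests/<package>/test_<module>.py."""
    path = PurePosixPath(source_path)
    if path.parts[0] != "src":
        raise ValueError("expected a path under src/")
    # Keep the sub-package folders, swap the top-level folder,
    # and add the test_ prefix to the file name.
    return PurePosixPath("tests", *path.parts[1:-1], "test_" + path.name)


# Hypothetical file name, used only for illustration.
print(mirrored_test_path("src/data/preprocessing_helpers.py"))
# tests/data/test_preprocessing_helpers.py
```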

slide-24
SLIDE 24

Let's practice these concepts!

UNIT TESTING FOR DATA SCIENCE IN PYTHON

slide-25
SLIDE 25

Write a simple unit test using pytest

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Dibya Chakravorty

Test Automation Engineer

slide-26
SLIDE 26

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Testing on the console

row_to_list("2,081\t314,942\n")
["2,081", "314,942"]

row_to_list("\t293,410\n")
None

row_to_list("1,463238,765\n")
None

Unit tests improve this process.

slide-27
SLIDE 27

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Python unit testing libraries

pytest
unittest
nosetests
doctest

We will use pytest!
Has all essential features.
Easiest to use.
Most popular.

slide-28
SLIDE 28

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 1: Create a file

Create test_row_to_list.py.

The test_ prefix indicates that the file contains unit tests (naming convention).

Such files are also called test modules.
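
By default, pytest collects test modules from files named test_*.py or *_test.py. The sketch below only approximates that naming check with the standard library's fnmatch; it is not pytest's own collection code, and the helper name is made up for illustration.

```python
from fnmatch import fnmatch

# pytest's default file name patterns for test modules.
DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_module(filename):
    """Return True if the file name follows the test module naming convention."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PATTERNS)


print(looks_like_test_module("test_row_to_list.py"))  # True
print(looks_like_test_module("row_to_list.py"))       # False
```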

slide-29
SLIDE 29

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 2: Imports

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

slide-30
SLIDE 30

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 3: Unit tests are Python functions

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    ...

slide-31
SLIDE 31

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 3: Unit tests are Python functions

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]

slide-32
SLIDE 32

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 4: Assertion

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert ...

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]

slide-33
SLIDE 33

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Theoretical structure of an assertion

assert boolean_expression

assert True

assert False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
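
To make the behavior above concrete, here is a small sketch that wraps an assert statement and reports what happened. The check() helper and the message text are made up for illustration; the optional message after the comma is standard Python syntax.

```python
def check(boolean_expression):
    """Return 'passed' if the assertion holds, otherwise a description of the error."""
    try:
        # The text after the comma is attached to the AssertionError.
        assert boolean_expression, "optional message shown on failure"
    except AssertionError as error:
        return f"{type(error).__name__}: {error}"
    return "passed"


print(check(1 + 1 == 2))  # passed
print(check(1 + 1 == 3))  # AssertionError: optional message shown on failure
```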

slide-34
SLIDE 34

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 4: Assertion

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

Argument              Type     Return value
"2,081\t314,942\n"    Valid    ["2,081", "314,942"]

slide-35
SLIDE 35

UNIT TESTING FOR DATA SCIENCE IN PYTHON

A second unit test

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None

slide-36
SLIDE 36

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Checking for None values

To check whether var is None, do this.

assert var is None

Do not do this.

assert var == None
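
One reason to prefer "is None": the == operator can be overridden by a class, while an identity check cannot. The sketch below uses a hypothetical class, written only for this illustration, to show a case where the two checks disagree.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


var = AlwaysEqual()

print(var == None)  # True, although var is clearly not None
print(var is None)  # False, because identity cannot be overridden
```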

slide-37
SLIDE 37

UNIT TESTING FOR DATA SCIENCE IN PYTHON

A third unit test

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None
"1,463238,765\n"      Invalid    None

slide-38
SLIDE 38

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Step 5: Running unit tests

Do this in the command line.

pytest test_row_to_list.py

slide-39
SLIDE 39

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Running unit tests in DataCamp exercises

slide-43
SLIDE 43

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Next lesson: test result report

slide-44
SLIDE 44

Let's write some unit tests!

UNIT TESTING FOR DATA SCIENCE IN PYTHON

slide-45
SLIDE 45

Understanding test result report

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Dibya Chakravorty

Test Automation Engineer

slide-46
SLIDE 46

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Unit tests for row_to_list()

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Argument              Type       Return value
"2,081\t314,942\n"    Valid      ["2,081", "314,942"]
"\t293,410\n"         Invalid    None
"1,463238,765\n"      Invalid    None

slide-47
SLIDE 47

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Test result report

!pytest test_row_to_list.py
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.1, py-1.8.0, pluggy-0.9.0
rootdir: /tmp/tmpvdblq9g7, inifile:
plugins: mock-1.10.0
collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________

    def test_for_missing_area():

slide-48
SLIDE 48

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 1: general information

============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.0.1, py-1.8.0, pluggy-0.9.0
rootdir: /tmp/tmpvdblq9g7, inifile:
plugins: mock-1.10.0

slide-49
SLIDE 49

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 2: Test result

collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

slide-50
SLIDE 50

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 2: Test result

collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

Character    Meaning    When                                                  Action
F            Failure    An exception is raised when running the unit test.    Fix the function or unit test.

slide-51
SLIDE 51

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 2: Test result

collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

Character    Meaning    When                                                  Action
F            Failure    An exception is raised when running the unit test.    Fix the function or unit test.

Example: the assertion raises AssertionError.

def test_for_missing_area():
    assert row_to_list("\t293,410") is None  # AssertionError from this line

slide-52
SLIDE 52

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 2: Test result

collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

Character    Meaning    When                                                  Action
F            Failure    An exception is raised when running the unit test.    Fix the function or unit test.

Example: another exception is raised.

def test_for_missing_area():
    assert row_to_list("\t293,410") is none  # NameError from this line

slide-53
SLIDE 53

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 2: Test result

collecting ... collected 3 items

test_row_to_list.py .F.                                                  [100%]

Character    Meaning    When                                                   Action
F            Failure    An exception is raised when running the unit test.     Fix the function or unit test.
.            Passed     No exception is raised when running the unit test.     Everything is fine. Be happy!

slide-54
SLIDE 54

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 3: Information on failed tests

=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________

    def test_for_missing_area():
>       assert row_to_list("\t293,410\n") is None
E       AssertionError: assert ['', '293,410'] is None
E        +  where ['', '293,410'] = row_to_list('\t293,410\n')

test_row_to_list.py:7: AssertionError

The line raising the exception is marked by >.

> assert row_to_list("\t293,410\n") is None

slide-55
SLIDE 55

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 3: Information on failed tests

=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________

    def test_for_missing_area():
>       assert row_to_list("\t293,410\n") is None
E       AssertionError: assert ['', '293,410'] is None
E        +  where ['', '293,410'] = row_to_list('\t293,410\n')

test_row_to_list.py:7: AssertionError

The exception is an AssertionError.

E AssertionError: assert ['', '293,410'] is None

slide-56
SLIDE 56

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 3: Information about failed tests

=================================== FAILURES ===================================
____________________________ test_for_missing_area _____________________________

    def test_for_missing_area():
>       assert row_to_list("\t293,410\n") is None
E       AssertionError: assert ['', '293,410'] is None
E        +  where ['', '293,410'] = row_to_list('\t293,410\n')

test_row_to_list.py:7: AssertionError

The line containing "where" displays return values.

E + where ['', '293,410'] = row_to_list('\t293,410\n')

slide-57
SLIDE 57

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Section 4: Test result summary

====================== 1 failed, 2 passed in 0.03 seconds ======================

Result summary from all unit tests that ran: 1 failed, 2 passed tests.

Total time for running tests: 0.03 seconds.

Much faster than testing on the interpreter!

slide-58
SLIDE 58

Let's practice reading test result reports

UNIT TESTING FOR DATA SCIENCE IN PYTHON

slide-59
SLIDE 59

More benefits and test types

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Dibya Chakravorty

Test Automation Engineer

slide-60
SLIDE 60

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Unit tests serve as documentation

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

slide-61
SLIDE 61

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Unit tests serve as documentation

Test module: test_row_to_list.py

import pytest
from row_to_list import row_to_list  # assumes row_to_list() is defined in row_to_list.py

def test_for_clean_row():
    assert row_to_list("2,081\t314,942\n") == \
        ["2,081", "314,942"]

def test_for_missing_area():
    assert row_to_list("\t293,410\n") is None

def test_for_missing_tab():
    assert row_to_list("1,463238,765\n") is None

Created from the test module:

Argument              Return value
"2,081\t314,942\n"    ["2,081", "314,942"]
"\t293,410\n"         None
"1,463238,765\n"      None

slide-62
SLIDE 62

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Guess function's purpose by reading unit tests

!cat test_row_to_list.py

slide-64
SLIDE 64

UNIT TESTING FOR DATA SCIENCE IN PYTHON

More trust

Users can run tests and verify that the package works.

slide-67
SLIDE 67

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Reduced downtime

slide-74
SLIDE 74

UNIT TESTING FOR DATA SCIENCE IN PYTHON

All benefits

Time savings. Improved documentation. More trust. Reduced downtime.

slide-75
SLIDE 75

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Tests we already wrote

row_to_list()

slide-76
SLIDE 76

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Tests we already wrote

row_to_list(), convert_to_int()

slide-77
SLIDE 77

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Data module

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data]

slide-78
SLIDE 78

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Feature module

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data -> Feature module -> Features]

slide-79
SLIDE 79

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Models module

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data -> Feature module -> Features -> Models module -> Predictive model (Housing area -> Housing price)]

slide-80
SLIDE 80

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Unit test

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data -> Feature module -> Features -> Models module -> Predictive model (Housing area -> Housing price)]

slide-81
SLIDE 81

UNIT TESTING FOR DATA SCIENCE IN PYTHON

What is a unit?

Small, independent piece of code. Python function or class.

slide-82
SLIDE 82

UNIT TESTING FOR DATA SCIENCE IN PYTHON

Integration test

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data -> Feature module -> Features -> Models module -> Predictive model (Housing area -> Housing price)]
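
To make the difference between a unit test and an integration test concrete, here is a rough sketch. Both function bodies are assumptions: the slides name row_to_list() and convert_to_int() but do not show their implementations, and the test names are made up.

```python
def row_to_list(row):
    # Assumed behavior: split a tab-separated row, reject rows that are not two non-empty entries.
    entries = row.rstrip("\n").split("\t")
    if len(entries) != 2 or "" in entries:
        return None
    return entries


def convert_to_int(integer_string_with_commas):
    # Assumed behavior: "2,081" -> 2081
    return int(integer_string_with_commas.replace(",", ""))


def test_convert_to_int_with_one_comma():
    # Unit test: exercises a single function in isolation.
    assert convert_to_int("2,081") == 2081


def test_row_is_parsed_to_integers():
    # Integration test: checks that the two units work together.
    entries = row_to_list("2,081\t314,942\n")
    assert [convert_to_int(entry) for entry in entries] == [2081, 314942]


# Called directly here so the sketch runs without pytest installed.
test_convert_to_int_with_one_comma()
test_row_is_parsed_to_integers()
print("both checks passed")
```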

slide-83
SLIDE 83

UNIT TESTING FOR DATA SCIENCE IN PYTHON

End to end test

[Diagram: Raw data -> Data module (row_to_list(), convert_to_int()) -> Clean data -> Feature module -> Features -> Models module -> Predictive model (Housing area -> Housing price)]

slide-84
SLIDE 84

UNIT TESTING FOR DATA SCIENCE IN PYTHON

This course focuses on unit tests

Writing unit tests is the best way to learn pytest.

slide-85
SLIDE 85

UNIT TESTING FOR DATA SCIENCE IN PYTHON

In Chapter 2...

Learn more pytest. Write more advanced unit tests. Work with functions in the features and models modules.

slide-86
SLIDE 86

Let's practice these concepts!

UNIT TESTING FOR DATA SCIENCE IN PYTHON