Records with Rank Polymorphism Justin Slepak Olin Shivers - - PowerPoint PPT Presentation

SLIDE 1

Records with Rank Polymorphism

Justin Slepak, Olin Shivers, Panagiotis Manolios

jrslepak@ccs.neu.edu, shivers@ccs.neu.edu, pete@ccs.neu.edu

Northeastern University, Boston, MA, USA

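SLIDE 8

The Remora Project

Remora: a higher-order, rank-polymorphic language

Functions work on arbitrarily high-dimensional data.
[Figure: the same function f applied to a scalar and to a multidimensional array]

Applying f across an array implicitly performs a loop nest, here for a 2×3×4 array:

for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    for (k = 0; k < 4; k++) {
      ...
    }
  }
}

Can we statically determine this implicit control structure?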

SLIDE 12

How This Happened

  • Visited the TensorFlow team at Google
  • "Can you make something like Pandas data frames?"
  • Remora: homogeneous data
  • Data frame: columnar table

SLIDE 24

Two Kinds of Aggregate Data

Arrays
  • Homogeneous data
  • Consume uniformly
  • Rank polymorphism

Records
  • Heterogeneous data
  • Consume individually
  • Field projection

Data Frames ≌ Arrays + Records

How to design records? (to work with rank polymorphism)

SLIDE 26

Records with Rank Polymorphism

A small collection of mutually composable constructs

  1. Rank polymorphism in Remora
  2. Data frames in Pandas
  3. Synthesis design
  4. Conclusion

SLIDE 27

Rank Polymorphism

SLIDE 37

Data Model

Atoms: 0, 1, 9.25, -4+i, 's', #t, #f

Arrays:
  • 1 through 6 arranged as a 3×2 matrix (shape 3,2)
  • #t #f #f #t, a 4-element vector (shape 4)
  • 100, a scalar (rank 0, empty shape)

Shape: the sequence of sizes in each dimension.
Rank: the number of dimensions an array has.

Expressions only stand for arrays, not atoms.

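To make shape and rank concrete, this sketch models an array as a nested Python list. Any non-list value is an atom, and a bare atom plays the role of a rank-0 array such as 100. The helpers `shape` and `rank` are hypothetical names, not Remora's API.

```python
def shape(arr):
    """Return the shape of a nested-list array as a tuple of dimension sizes.

    A non-list value is an atom and is treated as a rank-0 array (shape ()).
    Raises ValueError if sub-arrays at the same depth disagree in shape.
    """
    if not isinstance(arr, list):
        return ()
    if not arr:
        return (0,)
    cell_shapes = {shape(cell) for cell in arr}
    if len(cell_shapes) != 1:
        raise ValueError("not a regular array: cells have different shapes")
    return (len(arr),) + cell_shapes.pop()


def rank(arr):
    """Rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape(arr))


matrix = [[1, 2], [3, 4], [5, 6]]   # the 3-by-2 array of 1 through 6
bits = [True, False, False, True]   # #t #f #f #t
scalar = 100

print(shape(matrix), rank(matrix))  # (3, 2) 2
print(shape(bits), rank(bits))      # (4,) 1
print(shape(scalar), rank(scalar))  # () 0
```

A ragged value such as [[1 2] [3]] has no single shape, so `shape` rejects it with a `ValueError`.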

SLIDE 44

Decomposing an Array

Cells: the individual sub-arrays a function will consume.
Frame: the aggregate structure around the cells.

A rank-n array can split n+1 ways. The 3×4 array (shape 3,4)

  0 1 2 3
  1 2 3 4
  2 3 4 5

can be viewed as:
  • a 3×4 matrix frame of twelve scalar cells,
  • a 3-vector frame of three 4-vector cells, or
  • a scalar frame around one 3×4 matrix cell.

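The n+1 splits can be enumerated mechanically. In this sketch, choosing a cell rank k makes the frame the leading (rank - k) dimensions, and the cells are whatever remains. The sketch assumes regular nested-list arrays, and `split` is a hypothetical helper.

```python
def shape(arr):
    """Shape of a nested-list array (assumed regular); atoms have shape ()."""
    if not isinstance(arr, list):
        return ()
    return (len(arr),) + (shape(arr[0]) if arr else ())


def split(arr, cell_rank):
    """Split arr into (frame shape, list of cells), each cell of rank cell_rank.

    The frame is made of the leading rank(arr) - cell_rank dimensions; the
    cells are returned in row-major order.
    """
    s = shape(arr)
    if not 0 <= cell_rank <= len(s):
        raise ValueError("cell rank must lie between 0 and the array's rank")
    frame_rank = len(s) - cell_rank
    cells = [arr]
    for _ in range(frame_rank):
        # peel off one frame dimension by flattening one level of nesting
        cells = [sub for cell in cells for sub in cell]
    return s[:frame_rank], cells


a = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]   # shape 3,4, so rank 2

for k in range(len(shape(a)) + 1):   # a rank-n array splits n+1 ways
    frame, cells = split(a, k)
    print(f"cell rank {k}: frame {frame}, {len(cells)} cell(s)")
# cell rank 0: frame (3, 4), 12 cell(s)
# cell rank 1: frame (3,), 3 cell(s)
# cell rank 2: frame (), 1 cell(s)
```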

SLIDE 51

Arrays

Abstract value and the corresponding Remora syntax:
  • Shape 3,7, values 1 0 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 0 1 1 1: [[1 0 0 1 1 0 1] [0 1 0 1 0 1 1] [0 0 1 0 1 1 1]]
  • Shape 2,2,2, values 0 through 7: [[[0 1] [2 3]] [[4 5] [6 7]]]

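SLIDE 63

Function Application

(+ [10 20] [[1 2] [3 4]]) ↦ [[11 12] [23 24]]

          Function        Argument 1          Argument 2
Cells:    +               10, 20              1, 2, 3, 4
Frame:    []              [2]                 [2 2]
Lifted:   [[+ +] [+ +]]   [[10 10] [20 20]]   [[1 2] [3 4]]

(v+ [10 20] [[1 2] [3 4]]) ↦ [[11 22] [13 24]]

          Function        Argument 1          Argument 2
Cells:    v+              [10 20]             [1 2], [3 4]
Frame:    []              []                  [2]
Lifted:   [v+ v+]         [[10 20] [10 20]]   [[1 2] [3 4]]

Frames must agree by prefix. The largest frame ([2 2] for +, [2] for v+) determines the iteration, the other pieces are replicated to fill it, and the function is then applied cell by cell.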

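The lifting rule in the table can be sketched in a few lines of Python. The sketch makes these assumptions:
  • Arrays are nested lists.
  • The function is a single Python function, so its frame is [] and it is simply reused for every cell.
  • `lift`, `add`, and `vadd` are hypothetical names.

`ranks` gives the expected rank of each argument, and arguments with shorter frames are replicated across the principal (longest) frame.

```python
def shape(arr):
    """Shape of a nested-list array (assumed regular); atoms have shape ()."""
    if not isinstance(arr, list):
        return ()
    return (len(arr),) + (shape(arr[0]) if arr else ())


def lift(fn, ranks, *args):
    """Apply fn to the cells of args; ranks[i] is the expected rank of arg i."""
    frames = []
    for arg, r in zip(args, ranks):
        s = shape(arg)
        if len(s) < r:
            raise ValueError("argument has lower rank than expected")
        frames.append(s[:len(s) - r])       # frame = shape minus cell dims
    principal = max(frames, key=len)
    if any(principal[:len(f)] != f for f in frames):
        raise ValueError("frames do not agree by prefix")
    return _walk(fn, list(args), frames, principal)


def _walk(fn, args, frames, principal):
    if not principal:                       # every argument is one cell now
        return fn(*args)
    result = []
    for i in range(principal[0]):
        # descend into arguments that still have frame axes; replicate the rest
        sub_args = [a[i] if f else a for a, f in zip(args, frames)]
        sub_frames = [f[1:] if f else f for f in frames]
        result.append(_walk(fn, sub_args, sub_frames, principal[1:]))
    return result


def add(x, y):          # scalar +, expected ranks 0, 0
    return x + y


def vadd(u, v):         # v+, expected ranks 1, 1
    return [x + y for x, y in zip(u, v)]


print(lift(add, (0, 0), [10, 20], [[1, 2], [3, 4]]))    # [[11, 12], [23, 24]]
print(lift(vadd, (1, 1), [10, 20], [[1, 2], [3, 4]]))   # [[11, 22], [13, 24]]
```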

SLIDE 70

Expected Argument Rank

The frame/cell split is determined by the function:

  Function     Expected argument ranks
  +            0, 0
  dot-prod     1, 1
  minv         2
  lerp         0, 0, 0
  poly-eval    1, 0

(define (lerp (lo 0) (hi 0) (α 0))
  (+ (* α hi) (* (- 1 α) lo)))

Change argument rank by η-expansion:

v+ = (λ ((a 1) (b 1)) (+ a b)) = ~(1 1)+

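Attaching expected ranks to functions makes every call lift automatically. The sketch below makes these assumptions:
  • A hypothetical `Ranked` class wraps a Python function together with its argument ranks and applies the same prefix-agreement rule.
  • Coefficient order for `poly_eval` is lowest degree first, an assumption not stated in the talk.

It shows lerp, poly-eval, and v+ obtained by η-expanding + at rank 1. This is an illustration, not Remora's implementation.

```python
def shape(arr):
    """Shape of a nested-list array (assumed regular); atoms have shape ()."""
    if not isinstance(arr, list):
        return ()
    return (len(arr),) + (shape(arr[0]) if arr else ())


class Ranked:
    """A function tagged with the expected rank of each of its arguments."""

    def __init__(self, fn, *ranks):
        self.fn, self.ranks = fn, ranks

    def __call__(self, *args):
        frames = []
        for a, r in zip(args, self.ranks):
            s = shape(a)
            if len(s) < r:
                raise ValueError("argument has lower rank than expected")
            frames.append(s[:len(s) - r])
        principal = max(frames, key=len)
        if any(principal[:len(f)] != f for f in frames):
            raise ValueError("frames do not agree by prefix")
        return self._walk(list(args), frames, principal)

    def _walk(self, args, frames, principal):
        if not principal:
            return self.fn(*args)
        # descend where an argument still has frame axes, replicate otherwise
        return [self._walk([a[i] if f else a for a, f in zip(args, frames)],
                           [f[1:] if f else f for f in frames],
                           principal[1:])
                for i in range(principal[0])]


add = Ranked(lambda x, y: x + y, 0, 0)
mul = Ranked(lambda x, y: x * y, 0, 0)
sub = Ranked(lambda x, y: x - y, 0, 0)

# (define (lerp (lo 0) (hi 0) (α 0)) (+ (* α hi) (* (- 1 α) lo)))
lerp = Ranked(lambda lo, hi, alpha: add(mul(alpha, hi), mul(sub(1, alpha), lo)),
              0, 0, 0)

# poly-eval, ranks 1, 0: one coefficient vector, one point
poly_eval = Ranked(lambda cs, x: sum(c * x ** i for i, c in enumerate(cs)), 1, 0)

# η-expansion: v+ = (λ ((a 1) (b 1)) (+ a b)), i.e. ~(1 1)+
vadd = Ranked(lambda a, b: add(a, b), 1, 1)

print(lerp([0, 10], [10, 20], 0.5))        # [5.0, 15.0]
print(poly_eval([1, 2, 3], [0, 1, 2]))     # [1, 6, 17]
print(vadd([10, 20], [[1, 2], [3, 4]]))    # [[11, 22], [13, 24]]
```

Because poly-eval's first argument has expected rank 1, the coefficient vector is one cell that gets replicated, while the vector of points is a frame of three scalar cells.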

SLIDE 78

Ragged Data

For iota, input, filter, and reshape, the output shape depends on the input atoms.

> (iota [3])
[0 1 2]
> (iota [[3] [2] [4]])
[[0 1 2] [0 1] [0 1 2 3]]

This would-be result has shape 3,?. Its rows have lengths 3, 2, and 4, so it is not a regular array.

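SLIDE 83

Ragged Data

New type of atom: the boxed array. A box wraps an array of any shape (for example, the 2×2 array 0 1 2 3) and counts as a single atom. A 3-vector of boxes can therefore hold [0 1 2], [0 1], and [0 1 2 3], even though their shapes (3, 2, and 4) differ.

Lift-safe variant iota*:

> (iota* [[3] [2] [4]])
[(box [0 1 2]) (box [0 1]) (box [0 1 2 3])]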

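The difference between iota and iota* can be sketched with a minimal `Box` class, which here only wraps a Python list. `iota`, `iota_lifted`, and `iota_star` are hypothetical Python analogues.
  • Lifting plain iota over several shape vectors fails as soon as the result cells disagree in shape.
  • iota* boxes each result, so every cell is a single atom and the frame stays regular.

```python
from math import prod


class Box:
    """An atom that wraps an array of any shape (sketch of a boxed array)."""

    def __init__(self, contents):
        self.contents = contents

    def __eq__(self, other):
        return isinstance(other, Box) and self.contents == other.contents

    def __repr__(self):
        return f"(box {self.contents})"


def shape(arr):
    """Shape of a nested-list array (assumed regular); atoms have shape ()."""
    if not isinstance(arr, list):
        return ()
    return (len(arr),) + (shape(arr[0]) if arr else ())


def iota(dims):
    """Array of shape `dims` holding 0, 1, 2, ... in row-major order."""
    def build(ds, start):
        if not ds:
            return start
        step = prod(ds[1:])          # number of atoms in each sub-array
        return [build(ds[1:], start + i * step) for i in range(ds[0])]
    return build(list(dims), 0)


def iota_lifted(shape_vectors):
    """iota lifted over a vector of shape vectors (expected argument rank 1)."""
    results = [iota(dims) for dims in shape_vectors]
    if len({shape(r) for r in results}) > 1:
        raise ValueError("result cells differ in shape: ragged data")
    return results


def iota_star(shape_vectors):
    """Lift-safe iota*: box each result so every cell is a single atom."""
    return [Box(iota(dims)) for dims in shape_vectors]


print(iota([3]))                    # [0, 1, 2]
print(iota_star([[3], [2], [4]]))   # [(box [0, 1, 2]), (box [0, 1]), (box [0, 1, 2, 3])]
```

Calling `iota_lifted([[3], [2], [4]])` raises `ValueError`, mirroring the 3,? shape that has no regular representation.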
SLIDE 84

Pandas DataFrame

SLIDE 86

Python Dictionary

>>> dallas_temp = {'loc': 'Dallas', 'day': 28,
...                'month': 3, 'year': 2019,
...                'hi': 74, 'lo': 57}
>>> temp_list = [dallas_temp,
...              {'loc': 'Dublin', 'day': 1,
...               'month': 4, 'year': 2019,
...               'hi': 11, 'lo': 5},
...              {'loc': 'Nome', 'day': 31,
...               'month': 3, 'year': 2019,
...               'hi': 31, 'lo': 26},
...              {'loc': 'Tunis', 'day': 31,
...               'month': 3, 'year': 2019,
...               'hi': 21, 'lo': 12}]

SLIDE 89

Pandas DataFrame

A more sophisticated tool, which can be built from rows or from columns:

>>> import pandas as pd
>>> by_rows = pd.DataFrame(temp_list)
>>> by_cols = pd.DataFrame({'loc': ['Dallas', 'Dublin', 'Nome', 'Tunis'],
...                         'day': [28, 1, 31, 31],
...                         'month': [3, 4, 3, 3],
...                         'year': 2019,
...                         'hi': [74, 11, 31, 21],
...                         'lo': [57, 5, 26, 12]})

SLIDE 95

Column Subset

A DataFrame acts as a dictionary of Series objects:

>>> by_cols['loc']
0    Dallas
1    Dublin
2      Nome
3     Tunis

Ask for a list of columns and you get a new DataFrame:

>>> by_cols[['loc', 'hi']]
      loc  hi
0  Dallas  74
1  Dublin  11
2    Nome  31
3   Tunis  21

SLIDE 98

Column Update

>>> def normalize_temp(r):
...     # in_usa and f2c are helpers defined elsewhere (not shown)
...     if in_usa(r['loc']):
...         r['lo'] = f2c(r['lo'])
...         r['hi'] = f2c(r['hi'])
...     return r
...
>>> by_cols.apply(normalize_temp, axis=1)
      loc  day  month  year     hi     lo
0  Dallas   28      3  2019  23.33  13.89
1  Dublin    1      4  2019  11.00   5.00
2    Nome   31      3  2019  -0.56  -3.33
3   Tunis   31      3  2019  21.00  12.00

SLIDE 102

Row Filter

>>> by_cols['loc'] == 'Dublin'
0    False
1     True
2    False
3    False
>>> by_cols[by_cols['loc'] == 'Dublin']
      loc  day  month  year  hi  lo
1  Dublin    1      4  2019  11   5

SLIDE 109

Row Filter

Lifting user-defined functions?

>>> in_usa(by_cols['loc'])
ValueError: The truth value of a Series is ambiguous.
>>> [in_usa(l) for l in by_cols['loc']]
[True, False, True, False]
>>> by_cols[[in_usa(l) for l in by_cols['loc']]]
      loc  day  month  year  hi  lo
0  Dallas   28      3  2019  74  57
2    Nome   31      3  2019  31  26

SLIDE 113

Row Partition

No lifting over masks:

>>> us__non_us = by_cols.groupby([in_usa(l) for l in
...                                by_cols['loc']])
>>> us__non_us.get_group(True)
      loc  day  month  year  hi  lo
0  Dallas   28      3  2019  74  57
2    Nome   31      3  2019  31  26
>>> us__non_us.get_group(False)
      loc  day  month  year  hi  lo
1  Dublin    1      4  2019  11   5
3   Tunis   31      3  2019  21  12

SLIDE 118

Ergonomics

  • Wide range of functionality
  • Support for row-wise and column-wise operations
  • Many ad hoc structures and methods
  • Lifting rules do not generalize

"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." (Alan Perlis)

SLIDE 119

Synthesis Design

SLIDE 121

Synthesis Design Goals

  • Create/project/update with anonymous functions
  • Eventual possibility of static typing

SLIDE 124

Records

Constructor function: (record fname₁ ... fnameₙ)

We can still support "literal" syntax:

{(fname₁ arg₁) ... (fnameₙ argₙ)}

which stands for

((record fname₁ ... fnameₙ) arg₁ ... argₙ)

SLIDE 127

Lenses

A lens is a function for focusing on a piece of a data structure: (lens fname)

Composition focuses on nested pieces: (compose (lens fname₁) ... (lens fnameₙ))

Three operations on lenses: view, set, over

SLIDE 134

Lens Operations

((view L) R)         ↦  field L from within R
(((set L) V) R)      ↦  R with field L changed to V
(((over L) F) R)     ↦  R with F applied to field L

Syntactic sugar:

#_(fname ...)  =  (view (compose (lens fname) ...))
#=(fname ...)  =  (set (compose (lens fname) ...))
#^(fname ...)  =  (over (compose (lens fname) ...))

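The three lens operations can be sketched in Python under these assumptions:
  • A lens is a getter/setter pair over dict-based records. The talk does not say how Remora represents lenses internally, so this is an assumption.
  • `compose` focuses outermost field first, matching the path order of #_(fname ...).
  • `set_` has a trailing underscore so it does not shadow Python's built-in `set`.
  • Every update copies the record instead of mutating it.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    get: Callable[[Any], Any]          # record -> focused value
    put: Callable[[Any, Any], Any]     # (record, new value) -> new record


def lens(fname):
    """Lens on one field of a dict-based record; put copies, never mutates."""
    return Lens(get=lambda r: r[fname],
                put=lambda r, v: {**r, fname: v})


def compose(*lenses):
    """Focus through several lenses, outermost field first."""
    def compose2(outer, inner):
        return Lens(get=lambda r: inner.get(outer.get(r)),
                    put=lambda r, v: outer.put(r, inner.put(outer.get(r), v)))
    return reduce(compose2, lenses)


def view(l):                 # ((view L) R)
    return lambda r: l.get(r)


def set_(l):                 # (((set L) V) R)
    return lambda v: lambda r: l.put(r, v)


def over(l):                 # (((over L) F) R)
    return lambda f: lambda r: l.put(r, f(l.get(r)))


# The nested `date` field is hypothetical, added only to exercise compose.
dallas = {'loc': 'Dallas', 'date': {'day': 28, 'month': 3, 'year': 2019},
          'hi': 74, 'lo': 57}
year = compose(lens('date'), lens('year'))    # focuses on the path (date year)

print(view(year)(dallas))                     # 2019
print(set_(lens('lo'))(0)(dallas)['lo'])      # 0
print(over(compose(lens('date'), lens('day')))(lambda d: d + 1)(dallas)['date'])
# {'day': 29, 'month': 3, 'year': 2019}
print(dallas['lo'], dallas['date']['day'])    # 57 28 (original unchanged)
```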

SLIDE 139

Record Creation

> (define dallas-temp {(loc "Dallas") (day 28) (month 3)
                       (year 2019) (hi 74) (lo 57)})
> (#_(year) dallas-temp)
2019
> ([#_(hi) #_(lo)] dallas-temp)
[74 57]

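Applying an array of functions is ordinary lifting. The function position has frame [2] and the record argument has frame [], so the record is replicated once per function. This minimal sketch makes these assumptions:
  • A record is a Python dict.
  • `view_field` is a hypothetical stand-in for #_(fname).
  • Only the case of a single argument cell is handled.

```python
def view_field(fname):
    """Stand-in for #_(fname): a function projecting one field of a record."""
    return lambda record: record[fname]


def apply_lifted(fns, arg):
    """Apply a (possibly nested) list of functions to one argument cell.

    The function position carries the frame; the single argument has frame []
    and is therefore replicated, once per function.
    """
    if isinstance(fns, list):
        return [apply_lifted(f, arg) for f in fns]
    return fns(arg)


dallas_temp = {'loc': 'Dallas', 'day': 28, 'month': 3,
               'year': 2019, 'hi': 74, 'lo': 57}

print(apply_lifted(view_field('year'), dallas_temp))                     # 2019
print(apply_lifted([view_field('hi'), view_field('lo')], dallas_temp))   # [74, 57]
```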

SLIDE 141

Table Creation: Two Different Ways

> (define temp-readings ; by rows
    [dallas-temp
     {(loc "Dublin") (day 1) (month 4) (year 2019) (hi 11) (lo 5)}
     {(loc "Nome") (day 31) (month 3) (year 2019) (hi 31) (lo 26)}
     {(loc "Tunis") (day 31) (month 3) (year 2019) (hi 21) (lo 12)}])

> (define temp-readings ; by columns
    {(loc ["Dallas" "Dublin" "Nome" "Tunis"])
     (day [28 1 31 31])
     (month [3 4 3 3])
     (year 2019)
     (hi [74 11 31 21])
     (lo [57 5 26 12])})

SLIDE 143

Table Creation

The column-wise literal

{(loc ["Dallas" "Dublin" "Nome" "Tunis"]) (day [28 1 31 31]) (month [3 4 3 3])
 (year 2019) (hi [74 11 31 21]) (lo [57 5 26 12])}

is an application of the record constructor:

((record loc day month year hi lo)
 ["Dallas" "Dublin" "Nome" "Tunis"] [28 1 31 31] [3 4 3 3] 2019 [74 11 31 21] [57 5 26 12])

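Building the table by columns and by rows yields the same array of records if the constructor expects a scalar (rank-0) value for each field. Under that assumption, the column vectors are lifted over and the scalar year is replicated. In this sketch:
  • `record` is a hypothetical Python analogue that returns dicts.
  • Python lists act as frame dimensions.
  • Any other value, including a string, is an atom.

```python
def record(*fnames):
    """Hypothetical analogue of (record fname ...): returns a constructor.

    Each field argument is assumed to have expected rank 0. Lists act as
    frame dimensions and are lifted over (lengths must agree level by
    level); atoms such as 2019 or "Dallas" are replicated as needed.
    """
    def construct(*args):
        lengths = {len(a) for a in args if isinstance(a, list)}
        if not lengths:                      # all atoms: build one record
            return dict(zip(fnames, args))
        if len(lengths) != 1:
            raise ValueError("frames do not agree")
        n = lengths.pop()
        return [construct(*[a[i] if isinstance(a, list) else a for a in args])
                for i in range(n)]
    return construct


make = record('loc', 'day', 'month', 'year', 'hi', 'lo')

by_columns = make(['Dallas', 'Dublin', 'Nome', 'Tunis'], [28, 1, 31, 31],
                  [3, 4, 3, 3], 2019, [74, 11, 31, 21], [57, 5, 26, 12])

by_rows = [make('Dallas', 28, 3, 2019, 74, 57),
           make('Dublin', 1, 4, 2019, 11, 5),
           make('Nome', 31, 3, 2019, 31, 26),
           make('Tunis', 31, 3, 2019, 21, 12)]

print(by_columns == by_rows)   # True
print(by_columns[1])
# {'loc': 'Dublin', 'day': 1, 'month': 4, 'year': 2019, 'hi': 11, 'lo': 5}
```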

SLIDE 147

Column Extraction

> (#_(loc) temp-readings)
["Dallas" "Dublin" "Nome" "Tunis"]
> (define hi-only {(loc (#_(loc) temp-readings))
                   (day (#_(day) temp-readings))
                   (month (#_(month) temp-readings))
                   (hi (#_(hi) temp-readings))})
> hi-only
[{(loc "Dallas") (day 28) (month 3) (hi 74)}
 {(loc "Dublin") (day 1) (month 4) (hi 11)}
 {(loc "Nome") (day 31) (month 3) (hi 31)}
 {(loc "Tunis") (day 31) (month 3) (hi 21)}]

SLIDE 150

Column Update

> (define (normalize-temps (w 0))
    (define fix-temp (select (in-usa? (#_(loc) w)) f->c id))
    ((#^(lo) fix-temp) ((#^(hi) fix-temp) w)))
> (normalize-temps temp-readings)
[{(loc "Dallas") (day 28) (month 3) (year 2019) (hi 23.33) (lo 13.89)}
 {(loc "Dublin") (day 1) (month 4) (year 2019) (hi 11) (lo 5)}
 {(loc "Nome") (day 31) (month 3) (year 2019) (hi -0.56) (lo -3.33)}
 {(loc "Tunis") (day 31) (month 3) (year 2019) (hi 21) (lo 12)}]

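Because normalize-temps expects a rank-0 argument, applying it to the whole table lifts it over every record. In this Python sketch:
  • A list comprehension stands in for that lifting.
  • `select` picks a conversion function per record.
  • `over` copies a record with one field updated.
  • `in_usa` is a hypothetical fixed lookup standing in for in-usa?.

```python
def f_to_c(f):
    """Convert Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9


def in_usa(loc):
    """Hypothetical stand-in for in-usa?: a fixed set of US locations."""
    return loc in {'Dallas', 'Nome'}


def select(cond, if_true, if_false):
    """Rank-0 select: pick one of two values according to a boolean."""
    return if_true if cond else if_false


def over(fname, fn, record):
    """Like ((#^(fname) fn) record): a copy of record with fn applied to fname."""
    return {**record, fname: fn(record[fname])}


def normalize_temps(w):
    """Body of normalize-temps for a single record (expected rank 0)."""
    fix_temp = select(in_usa(w['loc']), f_to_c, lambda t: t)   # f->c or id
    return over('lo', fix_temp, over('hi', fix_temp, w))


temp_readings = [
    {'loc': 'Dallas', 'day': 28, 'month': 3, 'year': 2019, 'hi': 74, 'lo': 57},
    {'loc': 'Dublin', 'day': 1, 'month': 4, 'year': 2019, 'hi': 11, 'lo': 5},
    {'loc': 'Nome', 'day': 31, 'month': 3, 'year': 2019, 'hi': 31, 'lo': 26},
    {'loc': 'Tunis', 'day': 31, 'month': 3, 'year': 2019, 'hi': 21, 'lo': 12},
]

# Lifting over the 4-element frame of records, modeled with a comprehension.
normalized = [normalize_temps(w) for w in temp_readings]
for w in normalized:
    print(w['loc'], round(w['hi'], 2), round(w['lo'], 2))
# Dallas 23.33 13.89
# Dublin 11 5
# Nome -0.56 -3.33
# Tunis 21 12
```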

SLIDE 154

Row Filter

> (define usa-mask ((compose in-usa? #_(loc)) hi-only))
> usa-mask
[#t #f #t #f]
> (filter usa-mask hi-only)
[{(loc "Dallas") (day 28) (month 3) (hi 74)}
 {(loc "Nome") (day 31) (month 3) (hi 31)}]

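The mask comes from composing a field projection with a predicate and lifting the result over the records; filter then keeps the rows whose mask entry is true. In this sketch:
  • `compose` is ordinary right-to-left composition.
  • `view_loc` stands in for #_(loc).
  • `in_usa` is a hypothetical lookup.
  • `filter_rows` is named so it does not shadow Python's built-in `filter`.

```python
def compose(*fns):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        for fn in reversed(fns):
            x = fn(x)
        return x
    return composed


def view_loc(record):
    """Stand-in for #_(loc)."""
    return record['loc']


def in_usa(loc):
    """Hypothetical stand-in for in-usa?."""
    return loc in {'Dallas', 'Nome'}


def filter_rows(mask, rows):
    """Keep the rows whose mask entry is true (a sketch of filter)."""
    if len(mask) != len(rows):
        raise ValueError("mask and rows must have the same length")
    return [row for keep, row in zip(mask, rows) if keep]


hi_only = [{'loc': 'Dallas', 'day': 28, 'month': 3, 'hi': 74},
           {'loc': 'Dublin', 'day': 1, 'month': 4, 'hi': 11},
           {'loc': 'Nome', 'day': 31, 'month': 3, 'hi': 31},
           {'loc': 'Tunis', 'day': 31, 'month': 3, 'hi': 21}]

# ((compose in-usa? #_(loc)) hi-only), lifted over the records
usa_mask = [compose(in_usa, view_loc)(r) for r in hi_only]
print(usa_mask)                                            # [True, False, True, False]
print([r['loc'] for r in filter_rows(usa_mask, hi_only)])  # ['Dallas', 'Nome']
```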

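SLIDE 166

Filter → Partition

  • Row filter: filter with one mask gives one array of rows, [ ].
  • Row partition: filter with multiple masks would give [[ ] [ ] [ ]], one array of rows per mask. Each mask can keep a different number of rows: ragged data!

The lift-safe filter* boxes its result instead:
  • Row filter: filter* with one mask gives (box [ ])
  • Row partition: filter* with multiple masks gives [(box [ ]) (box [ ]) (box [ ])]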

SLIDE 168

Row Partition

> (filter* [usa-mask (not usa-mask)] hi-only)
[(box [{(loc "Dallas") (day 28) (month 3) (hi 74)}
       {(loc "Nome") (day 31) (month 3) (hi 31)}])
 (box [{(loc "Dublin") (day 1) (month 4) (hi 11)}
       {(loc "Tunis") (day 31) (month 3) (hi 21)}])]

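Partitioning is filter* lifted over a vector of masks: each mask yields one box, so groups of different sizes can sit side by side in a regular frame. In this sketch:
  • `Box`, `filter_star`, and `partition` are hypothetical Python analogues.
  • The month-based grouping at the end is an extra illustration, not taken from the talk.

```python
class Box:
    """An atom wrapping an array of any length (sketch of a boxed array)."""

    def __init__(self, contents):
        self.contents = contents

    def __repr__(self):
        return f"(box {self.contents})"


def filter_star(mask, rows):
    """Lift-safe filter*: the kept rows come back inside one box."""
    return Box([row for keep, row in zip(mask, rows) if keep])


def partition(masks, rows):
    """filter* lifted over a vector of masks: one box per mask."""
    return [filter_star(mask, rows) for mask in masks]


hi_only = [{'loc': 'Dallas', 'day': 28, 'month': 3, 'hi': 74},
           {'loc': 'Dublin', 'day': 1, 'month': 4, 'hi': 11},
           {'loc': 'Nome', 'day': 31, 'month': 3, 'hi': 31},
           {'loc': 'Tunis', 'day': 31, 'month': 3, 'hi': 21}]

usa_mask = [True, False, True, False]
not_usa = [not keep for keep in usa_mask]     # (not usa-mask), lifted

parts = partition([usa_mask, not_usa], hi_only)
print([[r['loc'] for r in box.contents] for box in parts])
# [['Dallas', 'Nome'], ['Dublin', 'Tunis']]

# Uneven groups are fine too, because each group sits in its own box.
by_month = partition([[r['month'] == 3 for r in hi_only],
                      [r['month'] == 4 for r in hi_only]], hi_only)
print([len(box.contents) for box in by_month])   # [3, 1]
```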

SLIDE 171

How to design records?

(to work with rank polymorphism)

A small collection of mutually composable constructs