Teaching old type systems new tricks with type providers
Tomas Petricek, University of Kent and The Alan Turing Institute
http://tomasp.net | tomas@tomasp.net | @tomaspetricek


slide-1
SLIDE 1

Teaching old type systems new tricks with type providers

Tomas Petricek

University of Kent and The Alan Turing Institute | http://tomasp.net | tomas@tomasp.net | @tomaspetricek

slide-2
SLIDE 2

DATA SCIENCE

slide-3
SLIDE 3
slide-4
SLIDE 4

DEMO: Open, reproducible data visualizations

slide-5
SLIDE 5

Tooling for data science

The gap between spreadsheets and programming

slide-6
SLIDE 6

Tooling for data science

Making programming languages a bit easier

slide-7
SLIDE 7

Tooling for data science

Learning from spreadsheet interaction model

slide-8
SLIDE 8

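Reading data

Unsafe dynamic access in a typed language

    using System;
    using System.Xml.Linq;

    var url = "http://dvd.netflix.com/Top100RSS";
    var rss = XDocument.Load(url);
    var channel = rss.Element("rss").Element("channel");
    foreach (var item in channel.Elements("item"))
    {
        // An RSS item has no "text" element, so Element returns null here.
        Console.WriteLine(item.Element("text").Value);
    }

Not found! The compiler accepts the code, but Element("text") returns null and reading .Value fails at run time.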

slide-9
SLIDE 9

Reading data

Unsafe dynamic access in a typed language

    using System;
    using System.Xml.Linq;

    var url = "http://dvd.netflix.com/Top100RSS";
    var rss = XDocument.Load(url);
    var channel = rss.Element("rss").Element("channel");
    foreach (var item in channel.Elements("item"))
    {
        // "title" is the correct element name, so the lookup succeeds.
        Console.WriteLine(item.Element("title").Value);
    }
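The failure mode on these two slides can be reproduced without a network connection. The sketch below is in Python rather than C#, and it parses a small inline feed invented for illustration instead of fetching the real one. The helper name `item_values` is hypothetical. It shows that nothing checks the element name until the code actually runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed standing in for the real RSS document.
SAMPLE_RSS = """<rss><channel>
  <item><title>First film</title></item>
  <item><title>Second film</title></item>
</channel></rss>"""


def item_values(xml_text, element_name):
    """Return the text of `element_name` for every <item>, or None where it is missing."""
    channel = ET.fromstring(xml_text).find("channel")  # the root element is <rss>
    values = []
    for item in channel.findall("item"):
        element = item.find(element_name)  # None when the name does not match the data
        values.append(element.text if element is not None else None)
    return values


titles = item_values(SAMPLE_RSS, "title")  # the name matches the data
texts = item_values(SAMPLE_RSS, "text")    # wrong name: every lookup comes back empty
```

Both calls are accepted by the language; only the second silently yields missing values, which is the gap that type providers aim to close.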

slide-10
SLIDE 10

Reading data

Accessing data from external data sources
  • Languages do not understand data
  • There is rarely an explicit schema
  • Manually define types to capture it
  • Easier in dynamic languages

slide-11
SLIDE 11

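Aggregating data

Athletes by number of gold medals from Rio 2016

    import pandas as pd

    # Load the data set; the file name is an unchecked string.
    olympics = pd.read_csv("olympics.csv")

    # Filter to Rio 2016, sum gold medals per athlete, and keep the top 8.
    (olympics[olympics["Games"] == "Rio (2016)"]
        .groupby("Athlete")
        .agg({"Gold": sum})
        .sort_values(by="Gold", ascending=False)
        .head(8))

Callouts on the slide: Unknown file; Column name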

slide-12
SLIDE 12

Aggregating data

Language and data source features you need to know
  • Python dictionaries: {"key": value}
  • Generalised indexers: .[ condition ]
  • Operation names: sort_values
  • Data column names: "Athlete"
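To make the point about column names concrete, here is a minimal standard-library sketch of the same aggregation. The CSV rows and athlete names are made up for illustration (the real olympics.csv is not reproduced here), and `gold_by_athlete` is a hypothetical helper, not part of pandas.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only.
SAMPLE_CSV = """Games,Athlete,Gold
Rio (2016),Athlete A,2
Rio (2016),Athlete B,1
Rio (2016),Athlete A,1
London (2012),Athlete B,3
"""


def gold_by_athlete(csv_text, games, top=8):
    """Sum the Gold column per athlete for one Games, highest total first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Games"] == games:  # the column name is just a string
            totals[row["Athlete"]] += int(row["Gold"])
    return totals.most_common(top)


ranking = gold_by_athlete(SAMPLE_CSV, "Rio (2016)")
```

A misspelt column name such as `row["Gmaes"]` raises `KeyError` only when the row is read, so the programmer still has to know the column names in advance.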

slide-13
SLIDE 13

TYPE PROVIDERS

slide-14
SLIDE 14

∅ ⊢ e : τ

slide-15
SLIDE 15

π( ) ⊢ e : τ

slide-16
SLIDE 16

DEMO: Reading data from an RSS feed

slide-17
SLIDE 17

F# Data library

Type providers for structured data
  • Structural shape inference
  • Language integration via type providers
  • Relative type safety

slide-18
SLIDE 18

{ title : string, author : {age : int} }
{ author : {age : float} }
{ title : option<string>, author : {age : float} }

slide-19
SLIDE 19

{ coordinates : {lng:num, lat:num} }
string
{ coordinates : {lng:num, lat:num} } + string

slide-20
SLIDE 20

Shape inference

Pragmatic design choices for usability
  • Prefers records for tooling
  • Predictable and stable
  • Open world assumption about sums
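The two shape examples on the preceding slides can be sketched as a small merge function. This is a simplified illustration, not the F# Data algorithm: it leaves out collections and null handling, and the encoding is an assumption made here (primitive shapes are strings, records are dicts, `("option", s)` marks an optional field, and `("any", [...])` stands for a sum of alternatives).

```python
def optional(shape):
    """Wrap a shape as optional unless it already is."""
    if isinstance(shape, tuple) and shape[0] == "option":
        return shape
    return ("option", shape)


def common_shape(a, b):
    """Return a shape that both `a` and `b` fit, preferring records over sums."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = {}
        for name in list(a) + [n for n in b if n not in a]:
            if name in a and name in b:
                merged[name] = common_shape(a[name], b[name])
            else:
                # A field missing from one sample becomes optional.
                merged[name] = optional(a.get(name, b.get(name)))
        return merged
    if (a, b) in (("int", "float"), ("float", "int")):
        return "float"  # widen integers to floats
    # Anything else falls back to a sum of the alternatives.
    return ("any", [a, b])


records = common_shape(
    {"title": "string", "author": {"age": "int"}},
    {"author": {"age": "float"}},
)
mixed = common_shape({"coordinates": {"lng": "num", "lat": "num"}}, "string")
```

Merging two records keeps a single record with an optional `title`, while merging a record with a string produces a sum, mirroring the two slides.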

slide-21
SLIDE 21

DEMO: Aggregating Olympic medalists

slide-22
SLIDE 22

Dot-driven development

Encoding complex logic via simple member access
  • Type providers for member generation
  • Laziness for scaling to large hierarchies
  • Fancy types for the masses!

slide-23
SLIDE 23

Row types and phantom types

Row types to track names and types of fields

$$\frac{\Gamma \vdash e : [f_1 : \tau_1, \ldots, f_n : \tau_n]}{\Gamma \vdash e.\text{drop } f_i : [f_1 : \tau_1, \ldots, f_{i-1} : \tau_{i-1}, f_{i+1} : \tau_{i+1}, \ldots, f_n : \tau_n]}$$

Embed row types in provided nominal types

$$\frac{\Gamma \vdash e : C_1}{\Gamma \vdash e.\text{drop } f_i : C_2}$$

where

$$\text{fields}(C_1) = \{f_1 : \tau_1, \ldots, f_n : \tau_n\}$$

$$\text{fields}(C_2) = \{f_1 : \tau_1, \ldots, f_{i-1} : \tau_{i-1}, f_{i+1} : \tau_{i+1}, \ldots, f_n : \tau_n\}$$
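The two rules above can be read as a small computation on rows. The sketch below is an illustration under assumptions made here: a row is a dict from field names to type names, the column names are borrowed from the Olympic example, and `class_for` is a hypothetical helper that hands out a fresh nominal class name per distinct row.

```python
def drop(row, field):
    """Row-type rule for e.drop f_i: the result row is the input row without f_i."""
    if field not in row:
        raise TypeError(f"row has no field {field!r}")
    return {name: ty for name, ty in row.items() if name != field}


# Nominal embedding: each distinct row is represented by a generated class name.
_classes = {}


def class_for(row):
    """Return the class name standing for `row`, generating one on first use."""
    key = tuple(sorted(row.items()))
    if key not in _classes:
        _classes[key] = f"C{len(_classes) + 1}"
    return _classes[key]


row = {"Athlete": "string", "Games": "string", "Gold": "int"}
c1 = class_for(row)                 # fields(C1) is the full row
c2 = class_for(drop(row, "Games"))  # fields(C2) is the row without Games
```

The host language only ever sees the names `C1` and `C2`; the row structure lives in the provider, which is how a nominal type system can carry row-typed information.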

slide-24
SLIDE 24

Fancy types for the masses!

Powerful idea that works in other contexts
  • Row types and phantom types
  • Session types for communication
  • Add your own fancy type here!

slide-25
SLIDE 25

BEHIND THE SCENES

slide-26
SLIDE 26

Relative type safety

Well typed programs do not go wrong. (As long as the world is well-behaved.)

slide-27
SLIDE 27

F# Data and safety

Given representative samples $d_1, \ldots, d_n$ and an input value $d$ such that

$$S(d) \sqsubset S(d_1, \ldots, d_n)$$

Any program written using a type provider reduces to a value

$$e_{\text{user}}[x \leftarrow \text{new } C(d)] \rightsquigarrow^{*} v$$
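The side condition on the input can be sketched as a run-time check. This is a simplified reading of the relation between S(d) and the shape inferred from the samples, not the formal definition: collections are omitted, and the encoding (strings for primitives, dicts for records, `("option", s)` for optional fields) and the function names `shape_of` and `fits` are assumptions made here.

```python
def shape_of(value):
    """S(d): the shape of a JSON-like value (collections are left out of this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int, since bool is an int subtype
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    raise ValueError(f"unsupported value: {value!r}")


def is_option(shape):
    return isinstance(shape, tuple) and shape[0] == "option"


def fits(shape, inferred):
    """True if an input of `shape` is safe for code written against `inferred`."""
    if shape == inferred:
        return True
    if shape == "int" and inferred == "float":
        return True
    if is_option(inferred):
        return shape == "null" or fits(shape, inferred[1])
    if isinstance(shape, dict) and isinstance(inferred, dict):
        for name, expected in inferred.items():
            if name in shape:
                if not fits(shape[name], expected):
                    return False
            elif not is_option(expected):
                return False  # a required field is missing from the input
        return True
    return False


inferred = {"title": ("option", "string"), "author": {"age": "float"}}
good = fits(shape_of({"author": {"age": 30}}), inferred)
bad = fits(shape_of({"author": {"age": "thirty"}}), inferred)
```

Inputs that pass the check are the ones covered by the safety statement; inputs that fail it are exactly the cases where the world was not well-behaved.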

slide-28
SLIDE 28

DEMO: Handling schema change and errors

slide-29
SLIDE 29

F# Data and schema change

Provided type can change only in limited ways

C[e] → C[e.M]
C[e] → C[match e with …]
C[e] → C[int(e)]

slide-30
SLIDE 30

Structure of a type provider

Context $L$ maps names to definitions and nested contexts

$$L(C) = \text{type } C(x : \tau) = \overline{m},\ L'$$

Pivot provider takes schema and provides a class with context

$$\text{pivot}(F) = C, L$$

slide-31
SLIDE 31

DEMO: Fancy types in action

slide-32
SLIDE 32

Pivot type provider

Generate classes that drop individual columns

slide-33
SLIDE 33

JSON type provider

Generate class corresponding to a record shape
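As a rough analogue of generating a class from a record shape, the sketch below builds a Python class with one read-only property per field. It is an illustration only: Python creates the class at run time, whereas a type provider supplies it to the compiler, and `provide_class` plus the dict-based shape encoding are hypothetical names chosen here.

```python
def provide_class(name, shape):
    """Build a class with one read-only property per field of a record shape."""

    def make_property(field, field_shape):
        def getter(self):
            value = self._data[field]
            if isinstance(field_shape, dict):
                # Nested records get their own generated class.
                return provide_class(field.capitalize(), field_shape)(value)
            return value

        return property(getter)

    def init(self, data):
        self._data = data  # the underlying JSON-like dict

    members = {field: make_property(field, s) for field, s in shape.items()}
    members["__init__"] = init
    return type(name, (), members)


Root = provide_class("Root", {"title": "string", "author": {"age": "int"}})
doc = Root({"title": "T", "author": {"age": 42}})
age = doc.author.age
```

Only the fields named in the shape become members, so a name such as `doc.text` is rejected as a missing attribute instead of a missing key deep inside the data.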

slide-34
SLIDE 34

SUMMARY

slide-35
SLIDE 35

Future work

Making programming with data easier
  • Learning from spreadsheets
  • Understanding programmer interactions
  • Handling joins and data cleaning
  • Read, analyse and visualize!

slide-36
SLIDE 36

DEMO: Learning from spreadsheets

slide-37
SLIDE 37

Thank you!

Teaching old type systems new tricks with type providers

  • Dot-driven: Towards minimal calculus of interactions
  • Fancy types: Encoding row types via type providers
  • Relative safety: Necessity when working with data

Tomas Petricek | tomas@tomasp.net | @tomaspetricek
tomasp.net/academic | thegamma.net | fslab.org | gamma.turing.ac.uk

slide-38
SLIDE 38

References

Don Syme, Keith Battocchi, Kenji Takeda, Donna Malayeri and Tomas Petricek. Themes in Information-Rich Functional Programming for Internet-Scale Data Sources. In Proceedings of DDFP 2013.

Tomas Petricek, Gustavo Guerra and Don Syme. Types from data: Making structured data first-class citizens in F#. In Proceedings of PLDI 2016.

Tomas Petricek. Data exploration through dot-driven development. In Proceedings of ECOOP 2017.