Diagnosing Type Errors with Class




slide-1
SLIDE 1

Diagnosing Type Errors with Class

PLDI 2015 Distinguished Paper Award

Danfeng Zhang Andrew C. Myers

Cornell University

Dimitrios Vytiniotis Simon Peyton-Jones

MSR Cambridge

slide-2
SLIDE 2

“It is a truism that most bugs are detected only at a great distance from their source.”

Mitchell Wand, Finding the source of type errors, POPL’86

Error localization is difficult for ML type systems

Even worse in sophisticated type systems

The Glasgow Haskell Compiler (GHC)

  • Type classes
  • Type families
  • GADTs
  • Type signatures

slide-3
SLIDE 3

GHC: Bool is not a numerical type

Actual mistake: ‘==’ should be ‘−’

Error messages are sometimes confusing

Inference Engine
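A minimal sketch of the program behind this message, assuming it is the running `fact` example used in the rest of the deck. The ill-typed line is kept as a comment so the file compiles; the type signature on the fixed version is an addition for this sketch, not something taken from the slides.

```haskell
-- Ill-typed version (kept as a comment so this file compiles):
--
--   fact n = if n == 0 then 1 else n * fact (n == 1)
--
-- `n == 1` has type Bool, and it is passed where `n` (used with `*`, hence a
-- Num type) is expected. GHC therefore has to solve `Num Bool` and blames
-- Bool, although the programmer's mistake is the operator.

-- Fixed version: '==' replaced by '-' in the recursive call.
-- The signature is an assumption added for this sketch.
fact :: Integer -> Integer
fact n = if n == 0 then 1 else n * fact (n - 1)
```

The reported location (Bool versus Num) and the actual fix (one operator) sit in different subexpressions, which is the gap that error localization tries to close.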

slide-4
SLIDE 4

SHErrLoc: Static Holistic Error Locator

Most likely error cause

A general, expressive and accurate error localization method, which handles the highly expressive type system of GHC

slide-5
SLIDE 5

General Error Localization [Zhang&Myers’14]

[Diagram: Programs (Jif, OCaml, others) → Constraints → Constraint analysis → Cause]

Based on Bayesian interpretation

Cannot diagnose Haskell errors

General Diagnosis Heuristics

The error cause is likely to be

  • Simple
  • Able to explain all errors
  • Not used often on correct paths

slide-6
SLIDE 6

Key Contributions

[Diagram: Haskell program → Constraints → Constraint analysis → Cause]

  • A highly expressive constraint language
  • A decidable and efficient constraint analysis algorithm
  • Bayesian reasoning: a Bayesian model that accounts for the richer graph representation

fact n = if n == 0 then 1 else n * fact (n == 1)

slide-7
SLIDE 7

Roadmap

[Diagram: Haskell program → Constraints]

  • A highly expressive constraint language

fact n = if n == 0 then 1 else n * fact (n == 1)

slide-8
SLIDE 8

Type Checking as Constraint Solving

  • ML type system

– Constraint elements: types
– Constraints: type equalities

Constructors: Int, Bool, List

Variables: α, β, γ

Syntax of Constraints

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$ (constraint element)

$I ::= E_1 = E_2$

$C ::= \bigwedge_i I_i$

slide-9
SLIDE 9

Type Classes

Instances of a type class, called Num

Intuitively, a set of types

slide-10
SLIDE 10

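Modeling Type Class Constraints

A type class is a set of its instances

Syntax of Constraints

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$

$I ::= E_1 = E_2 \mid \mathit{cla}\ E_1, \dots, E_n$

$C ::= \bigwedge_i I_i$

Our constraint language

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$

$I ::= E_1 \le E_2$

$C ::= \bigwedge_i I_i$

$\mathbf{Num}\ \alpha := \alpha \le \mathbf{Num}$

$E_1 = E_2 := E_1 \le E_2 \wedge E_2 \le E_1$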
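The two encodings on this slide (class membership as an inequality, equality as a pair of inequalities) can be sketched as data types. This is an illustrative model only: the names `Elem`, `Ineq`, `classMember` and `equality` are hypothetical and do not come from SHErrLoc.

```haskell
-- Constraint elements: type variables or constructors applied to elements.
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- The only atomic constraint: a partial-order inequality E1 <= E2.
data Ineq = Elem :<=: Elem
  deriving (Eq, Show)

-- Class membership `cls e` becomes `e <= cls`
-- (the class is treated as the set of its instances).
classMember :: String -> Elem -> [Ineq]
classMember cls e = [e :<=: Con cls []]

-- Equality `e1 = e2` becomes the pair `e1 <= e2` and `e2 <= e1`.
equality :: Elem -> Elem -> [Ineq]
equality e1 e2 = [e1 :<=: e2, e2 :<=: e1]
```

For example, `classMember "Num" (Var "a")` yields the single inequality `Var "a" :<=: Con "Num" []`, so one constraint form covers both equalities and class constraints.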

slide-11
SLIDE 11

Modeling Type Class Constraints

A type class is a set of its instances

Syntax of Constraints

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$

$I ::= E_1 = E_2 \mid \mathit{cla}\ E_1, \dots, E_n$

$C ::= \bigwedge_i I_i$

Our constraint language

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$

$I ::= E_1 \le E_2$

$C ::= \bigwedge_i I_i$

Concise; inequalities directly map to edges in a graph

slide-12
SLIDE 12

Types are Checked Under Hypotheses

  • Type signatures and GADTs introduce hypotheses

Haskell Program

double :: Num a => a -> a
double n = n * 2

Constraints

a ≤ Num ⊢ a ≤ Num (hypothesis ⊢ constraint)

Hypothesis: a is an instance of Num

Constraints are checked under hypotheses
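As a hedged illustration of both sources of hypotheses named above, the sketch below pairs the slide's `double` with a small GADT. The `Expr` type and `eval` are hypothetical additions for this example and are not from the talk.

```haskell
{-# LANGUAGE GADTs #-}

-- The signature supplies the hypothesis `Num a`, so the use of `*`
-- (which requires `Num a`) is checked under that hypothesis.
double :: Num a => a -> a
double n = n * 2

-- Dropping the hypothesis makes the same body ill-typed:
--   double' :: a -> a
--   double' n = n * 2      -- rejected: `Num a` cannot be derived

-- A GADT pattern match introduces a local hypothesis about `t`.
data Expr t where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool

eval :: Expr t -> t
eval (IntE n)  = n   -- in this branch, t ~ Int is available
eval (BoolE b) = b   -- in this branch, t ~ Bool is available
```

In both cases the right-hand side is only well typed because of a fact that is assumed locally, which is why the constraint language needs a hypothesis side.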

slide-13
SLIDE 13

Types are Checked Under Axioms

  • Instance declarations may introduce (global) axioms

Haskell Program

instance Eq a => Eq [a] where {...}

For all a, a being an instance of Eq implies that list of a is an instance of Eq

Int ≤ Eq ∧ (∀a. a ≤ Eq ⇒ [a] ≤ Eq) ⊢ [Int] ≤ Eq

Hypothesis: Int ≤ Eq

Axiom: ∀a. a ≤ Eq ⇒ [a] ≤ Eq

Constraint example with axioms: Int ≤ Eq ⇒ [Int] ≤ Eq

slide-14
SLIDE 14

Modeling Hypothesis and Axioms

  • Constraints ($C$): inequalities under quantified axioms
  • Quantified axioms ($Q$): implication rules
  • Hypotheses (e.g., Int ≤ Num): degenerate axioms

Syntax of SHErrLoc Constraints

$E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$

$Q ::= \forall a.\ \bigwedge_i I_i \Rightarrow I$

$I ::= E_1 \le E_2$

$C ::= \bigwedge_j \left( \bigwedge_i Q_i \vdash I_j \right)$
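The syntax above can be modelled roughly as follows. This is a sketch under stated assumptions: every name here (`Axiom`, `Constraint`, `hypothesis`, `instantiate`, `eqList`) is hypothetical, lists are written with a `List` constructor, and the full language in the paper has more features than this fragment.

```haskell
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

data Ineq = Elem :<=: Elem
  deriving (Eq, Show)

-- Q ::= forall vars. premises => conclusion
data Axiom = Axiom
  { boundVars  :: [String]
  , premises   :: [Ineq]
  , conclusion :: Ineq
  } deriving (Eq, Show)

-- One conjunct of C: a goal inequality checked under a list of axioms.
data Constraint = [Axiom] :|-: Ineq
  deriving (Eq, Show)

-- A hypothesis is a degenerate axiom: no bound variables, no premises.
hypothesis :: Ineq -> Axiom
hypothesis = Axiom [] []

-- Replace bound variables by the elements paired with them.
subst :: [(String, Elem)] -> Elem -> Elem
subst s (Var v)    = maybe (Var v) id (lookup v s)
subst s (Con c es) = Con c (map (subst s) es)

-- Instantiate an axiom: (instantiated premises, instantiated conclusion).
instantiate :: Axiom -> [Elem] -> ([Ineq], Ineq)
instantiate (Axiom vs ps c) args = (map inst ps, inst c)
  where
    s = zip vs args
    inst (l :<=: r) = subst s l :<=: subst s r

-- instance Eq a => Eq [a]   read as   forall a. a <= Eq => [a] <= Eq
eqList :: Axiom
eqList =
  Axiom ["a"] [Var "a" :<=: Con "Eq" []] (Con "List" [Var "a"] :<=: Con "Eq" [])
```

Calling `instantiate eqList [Con "Int" []]` produces the premise `Int <= Eq` and the conclusion `List Int <= Eq`, which is the implication rule applied at one concrete type.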

slide-15
SLIDE 15

The Full Constraint Language

  • Also supports

– Functions on constraint elements
– Nested universally and existentially quantified variables

  • Is expressive enough to model

– type classes, type families, GADTs, type signatures

(refer to the paper for more details)

slide-16
SLIDE 16

Roadmap

[Diagram: Haskell program → Constraints → Constraint analysis → Cause]

  • A highly expressive constraint language
  • A decidable and efficient constraint analysis algorithm

fact n = if n == 0 then 1 else n * fact (n == 1)

slide-17
SLIDE 17

Constraint Graph in a Nutshell

  • Graph construction (simple case)

– Node: constraint element
– Directed edge: partial ordering

slide-18
SLIDE 18

Constraint Analysis in a Nutshell

fact n = if n == 0 then 1 else n * fact (n == 1)

[Constraint graph: nodes n == 1, n, Num]

Bool is not an instance of Num
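The idea of finding an error as an unsatisfiable path between two constants can be sketched with plain reachability. This is a deliberately simplified model, not the SHErrLoc analysis: nodes are strings, `known` lists the class facts assumed to hold, and the edges in `factEdges` are an assumed reading of the graph on this slide.

```haskell
type Node = String
type Edge = (Node, Node)   -- (x, y) means x <= y

-- All nodes reachable from `start` by following edges (including `start`).
reachable :: [Edge] -> Node -> [Node]
reachable es start = go [start] []
  where
    go [] seen = reverse seen
    go (x : xs) seen
      | x `elem` seen = go xs seen
      | otherwise     = go (xs ++ [y | (x', y) <- es, x' == x]) (x : seen)

-- Pairs of distinct constants connected by a path but not related by any
-- known fact: each such pair is an unsatisfiable path, i.e. a type error.
unsatPaths :: [Node] -> [Edge] -> [Edge] -> [Edge]
unsatPaths consts known es =
  [ (c, d)
  | c <- consts
  , d <- reachable es c
  , d `elem` consts
  , c /= d
  , (c, d) `notElem` known
  ]

-- Assumed edges for the `fact` example: the argument `n == 1` has type Bool
-- and flows into `n`, whose type must be an instance of Num.
factEdges :: [Edge]
factEdges = [("Bool", "n"), ("n", "Num")]
```

With constants `["Bool", "Int", "Num"]` and the single known fact `("Int", "Num")`, `unsatPaths` reports the pair `("Bool", "Num")` for `factEdges`, matching the message on the slide.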

slide-19
SLIDE 19

Limitations of Previous Algorithms

[Barrett et al.’00, Melski&Reps’00,Zhang&Myers’10]

Int ≤ C ⊢ α = Bool ∧ [α] ≤ C

[Constraint graph: nodes C (a type class), [α], α, Bool]

Satisfiable: α = Int

Satisfiable: α = Bool

Previous algorithms under-saturate the graph

Previous algorithms only add edges ([Bool] is not in the graph!)

slide-20
SLIDE 20

New Algorithm

Key idea: add new edges and nodes during saturation

Key challenge: naive algorithms either fail to terminate, or under-saturate the graph

Int ≤ C ⊢ α = Bool ∧ [α] ≤ C

[Constraint graph: nodes C, [α], α, Bool, and the new node [Bool]]

New algorithm adds new nodes

slide-21
SLIDE 21

New Algorithm in a Nutshell

Black node: node before saturation

White node: added during saturation

Nodes added based on patterns

  • 1. one edge with two black nodes
  • 2. a black/white node

Recursion check: if white node, not added based on the edge in pattern

Lemma: the algorithm always terminates

[Constraint graph: nodes C, [α], α, Bool, and the white node [Bool]]
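A toy version of node creation, to make the black/white distinction concrete. This is not the paper's algorithm: it performs a single round, only rewrites original (black) nodes, and uses hypothetical names (`rewrite`, `whiteNodes`) and a `List` constructor for list types.

```haskell
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

type Edge = (Elem, Elem)   -- (x, y) means x <= y

-- Replace every occurrence of `from` inside an element by `to`.
rewrite :: Elem -> Elem -> Elem -> Elem
rewrite from to e
  | e == from = to
  | otherwise = case e of
      Con c args -> Con c (map (rewrite from to) args)
      Var _      -> e

-- One round of node creation. Only original ("black") nodes are rewritten,
-- so new ("white") nodes never trigger further nodes; this is a crude
-- stand-in for the recursion check, chosen to make termination obvious.
-- Each result pairs a black node with the white node derived from it.
whiteNodes :: [Elem] -> [Edge] -> [(Elem, Elem)]
whiteNodes black edges =
  [ (n, n')
  | n <- black
  , (l, r) <- edges
  , n /= l
  , let n' = rewrite l r n
  , n' /= n
  , n' `notElem` black
  ]
```

On the slide's example, with black nodes C, [α], α, Bool and edges α ≤ Bool, Bool ≤ α, [α] ≤ C, the only new node is [Bool], derived from [α]. An edge-only saturation would never create it, so the path from [Bool] to C would go unchecked.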

slide-22
SLIDE 22

Constraint Analysis

  • The analysis also handles

– Functions on constraint elements
– Hypotheses
– Quantified axioms

  • Performance

– Empirically: quadratic in graph size

(refer to the paper for more details)

slide-23
SLIDE 23

Roadmap

[Diagram: Haskell program → Constraints → Constraint analysis → Cause]

  • A highly expressive constraint language
  • A decidable and efficient constraint analysis algorithm
  • Bayesian reasoning: a Bayesian model that accounts for the richer graph representation

fact n = if n == 0 then 1 else n * fact (n == 1)

slide-24
SLIDE 24

Likelihood Estimation [Zhang&Myers’14]

A ranking metric based on Bayesian reasoning

$$P_1^{|E|} \left( \frac{P_2}{1 - P_2} \right)^{k_E}$$

  • $E$: an explanation, a set of locations
  • $k_E$: # sat paths using locations in $E$
  • $P_1$, $P_2$ are tunable parameters

  • Simplifying assumption

– The satisfiability of paths is independent

White nodes break this assumption
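The ranking metric can be evaluated directly. The sketch below is illustrative only: the candidate labels, the counts and the parameter values in the usage note are made up for the example, and the real tool searches candidates with A* rather than sorting a list.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- score = P1^|E| * (P2 / (1 - P2))^kE
score :: Double -> Double -> Int -> Int -> Double
score p1 p2 sizeE kE = p1 ^ sizeE * (p2 / (1 - p2)) ^ kE

-- A candidate explanation: (label, |E|, kE).
type Candidate = (String, Int, Int)

-- Labels ordered from most to least likely under the metric.
rank :: Double -> Double -> [Candidate] -> [String]
rank p1 p2 cands =
  [ name | (name, _, _) <- sortBy (comparing (Down . scoreOf)) cands ]
  where
    scoreOf (_, sizeE, kE) = score p1 p2 sizeE kE
```

With the arbitrary choice P1 = 0.1 and P2 = 0.3, the candidates `("n", 1, 2)`, `("two sites", 2, 0)` and `("n == 1", 1, 0)` rank as `n == 1`, then `n`, then `two sites`. Whenever P2 < 0.5 the ratio P2 / (1 − P2) is below 1, so every satisfiable path through an explanation lowers its score, and every extra location multiplies it by P1.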

slide-25
SLIDE 25

Redundant Paths (definition in paper)

  • Observation: some paths using white nodes provide neither positive nor negative evidence

[Constraint graph: nodes C, [α], α, Bool, [Bool]]

Satisfiability depends on edges between α and Bool

Lemma: the satisfiability of any redundant path depends on non-redundant paths

slide-26
SLIDE 26

New Ranking Metric

  • Intuitively,

$$P_1^{|E|} \left( \frac{P_2}{1 - P_2} \right)^{k_E}$$

where $k_E$ = # sat non-redundant paths that use constraints in $E$

  • Top candidates returned by an efficient A* algorithm [Zhang&Myers’14]

General Diagnosis Heuristics

The error cause is likely to be

  • Simple
  • Able to explain all errors
  • Not used often on correct non-redundant paths

slide-27
SLIDE 27

Evaluation

  • Implementation

– From Haskell programs to constraints
– SHErrLoc

[Diagram: Modified GHC → GHC constraints → Constraint Translator → SHErrLoc constraints → SHErrLoc (constraint graph, error diagnosis) → Reports]

Modified GHC: little effort, 50 atop 20K+ LOC

Constraint Translator: ~400 LOC

SHErrLoc: ~7500 LOC

slide-28
SLIDE 28

Evaluation Setup

  • Benchmarks

– CE Benchmark: analyzed 77 Haskell programs collected from papers about type-error diagnosis, used in [Chen&Erwig’14]
– Helium benchmark: analyzed 228 programs with type-checking errors, logged by the Helium tool [Hage’14]

  • Ground truth

– CE Benchmark: already well-marked
– Helium benchmark: user’s actual fix

  • Correctness

– A tool counts as correct only when it returns the programmer’s actual mistake

slide-29
SLIDE 29

Accuracy on the CE Benchmark

[Bar charts, scale 0% to 100%: comparison with GHC; comparison with the Helium tool [Heeren et al.’03]]

Legend:

  • SHErrLoc finds the correct error, other tool misses the error
  • Both find the correct error
  • Both miss the correct error
  • Other tool finds the correct error, SHErrLoc misses the error

SHErrLoc uses no Haskell-specific heuristics!

slide-30
SLIDE 30

Accuracy on the Helium Benchmark

[Bar charts, scale 0% to 100%: comparison with GHC; comparison with the Helium tool]

Legend:

  • SHErrLoc finds the correct error, other tool misses the error
  • Both find the correct error
  • Both miss the correct error
  • Other tool finds the correct error, SHErrLoc misses the error

slide-31
SLIDE 31

Related Work

  • General error localization [Zhang&Myers’14]

– Cannot handle the type system of GHC
– Simpler constraints and constraint analysis algorithm

  • Program analyses as constraint solving [e.g., Aiken’99, Foster et al.’06]

– No support for hypotheses and axioms

  • Diagnosing Haskell errors [e.g., Heeren et al.’03, Hage&Heeren’07, Chen&Erwig’14]

– Haskell-specific heuristics
– Unable to handle all of the sophisticated features of GHC

slide-32
SLIDE 32

SHErrLoc

General, expressive and accurate error localization

– Applies to the highly expressive type system of GHC (type classes, type families, GADTs, type signatures)
– A novel graph-based constraint analysis algorithm
– Bayesian reasoning => more accurate error locations than with existing tools