Diagnosing Type Errors with Class
Danfeng Zhang, Dimitrios Vytiniotis, Simon Peyton-Jones, Andrew C. Myers
Cornell University, MSR Cambridge
PLDI 2015 Distinguished Paper Award
Error localization is difficult for ML type systems
“It is a truism that most bugs are detected only at a great distance from their source.”
Mitchell Wand, Finding the source of type errors, POPL’86
Error localization is even worse in sophisticated type systems, such as that of the Glasgow Haskell Compiler (GHC), which supports:
- Type classes
- Type families
- GADTs
- Type signatures
Error messages are sometimes confusing
- GHC's inference engine reports: Bool is not a numerical type
- Actual mistake: ‘==’ should be ‘−’
SHErrLoc: Static Holistic Error Locator
A general, expressive, and accurate error localization method that handles the highly expressive type system of GHC and reports the most likely error cause
General Error Localization [Zhang&Myers’14]
Programs (Jif, OCaml, others) → Constraints → Analysis → Cause
- Based on a Bayesian interpretation
- Cannot diagnose Haskell errors
General diagnosis heuristics: the error cause is likely to be
- Simple
- Able to explain all errors
- Not used often on correct paths
Key Contributions
Haskell program → Constraints → Analysis → Cause
- A highly expressive constraint language
- A decidable and efficient constraint analysis algorithm
- Bayesian reasoning: a Bayesian model that accounts for the richer graph representation
Running example: fac n = if n == 0 then 1 else n * fac (n == 1)
Roadmap
Haskell program → Constraints: a highly expressive constraint language
Running example: fac n = if n == 0 then 1 else n * fac (n == 1)
Type Checking as Constraint Solving
- ML type system
– Constraint elements: types
– Constraints: type equalities
Constraint elements are built from constructors (Int, Bool, List) and variables (α, β, γ).
Syntax of constraints:
$E ::= \alpha \mid \mathit{con}(E_1, \ldots, E_n)$ (constraint element)
$I ::= E_1 = E_2$
$C ::= \bigwedge_i I_i$
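To make the ML baseline concrete, here is a minimal Python sketch (not SHErrLoc's code) that solves a conjunction of type equalities with standard unification. The names `Var`, `Con`, `resolve`, `occurs`, and `unify` are hypothetical; a constructor clash such as Int = Bool makes the conjunction unsatisfiable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A type variable such as alpha."""
    name: str


@dataclass(frozen=True)
class Con:
    """A constructor applied to arguments, e.g. Con("List", (Con("Int"),))."""
    name: str
    args: Tuple["Elem", ...] = ()


Elem = Union[Var, Con]
Subst = Dict[Var, Elem]


def resolve(e: Elem, subst: Subst) -> Elem:
    # Follow variable bindings until reaching an unbound variable or a constructor.
    while isinstance(e, Var) and e in subst:
        e = subst[e]
    return e


def occurs(v: Var, e: Elem, subst: Subst) -> bool:
    # Does variable v occur inside e (after applying the substitution)?
    e = resolve(e, subst)
    if isinstance(e, Var):
        return e == v
    return any(occurs(v, a, subst) for a in e.args)


def unify(equalities: List[Tuple[Elem, Elem]]) -> Optional[Subst]:
    """Solve a conjunction of equalities E1 = E2; return None if unsatisfiable."""
    subst: Subst = {}
    work = list(equalities)
    while work:
        left, right = (resolve(x, subst) for x in work.pop())
        if left == right:
            continue
        if isinstance(right, Var):
            left, right = right, left  # put the variable (if any) on the left
        if isinstance(left, Var):
            if occurs(left, right, subst):
                return None  # e.g. alpha = List(alpha) has no finite solution
            subst[left] = right
        elif left.name == right.name and len(left.args) == len(right.args):
            work.extend(zip(left.args, right.args))  # decompose con(...) = con(...)
        else:
            return None  # constructor clash, e.g. Int = Bool
    return subst
```

For instance, `unify([(Var("a"), Con("Int")), (Var("a"), Con("Bool"))])` returns `None`, because the two equalities force Int = Bool.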
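Type Classes
Instances of a type class (e.g., Num): intuitively, a type class is a set of types.
Syntax of constraints, extended with type-class constraints:
$E ::= \alpha \mid \mathit{con}(E_1, \ldots, E_n)$
$I ::= E_1 = E_2 \mid \mathit{class}(E_1, \ldots, E_n)$
$C ::= \bigwedge_i I_i$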
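Modeling Type Class Constraints
Our constraint language: a type class is a set of its instances, so both class constraints and equalities are expressed as inequalities:
$E ::= \alpha \mid \mathit{con}(E_1, \ldots, E_n)$
$I ::= E_1 \le E_2$
$C ::= \bigwedge_i I_i$
$\mathrm{Num}\ \alpha \;:=\; \alpha \le \mathrm{Num}$
$E_1 = E_2 \;:=\; E_1 \le E_2 \wedge E_2 \le E_1$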
Concise: inequalities map directly to edges in a graph.
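The two encodings (Num α := α ≤ Num and E1 = E2 := E1 ≤ E2 ∧ E2 ≤ E1) can be sketched in Python as a desugaring into ≤ pairs, each of which becomes one graph edge. This assumes constraint elements are plain strings such as "a", "Bool", or "Num"; the helper names `class_constraint`, `equality`, and `to_graph` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Leq = Tuple[str, str]  # (E1, E2) stands for the inequality E1 <= E2


def class_constraint(cls: str, elem: str) -> List[Leq]:
    # cls elem  :=  elem <= cls   (a type class is the set of its instances)
    return [(elem, cls)]


def equality(e1: str, e2: str) -> List[Leq]:
    # E1 = E2  :=  E1 <= E2 and E2 <= E1
    return [(e1, e2), (e2, e1)]


def to_graph(constraints: List[Leq]) -> Dict[str, Set[str]]:
    # Each inequality E1 <= E2 becomes one directed edge E1 -> E2.
    graph: Dict[str, Set[str]] = {}
    for lo, hi in constraints:
        graph.setdefault(lo, set()).add(hi)
        graph.setdefault(hi, set())
    return graph
```

For example, Num a together with a = Bool yields the edges a → Num, a → Bool, and Bool → a.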
Types are Checked Under Hypotheses
- Type signatures and GADTs introduce hypotheses
Haskell program:
double :: Num a => a -> a
double n = n * 2
Hypothesis: a is an instance of Num. Constraint:
$a \le \mathrm{Num} \vdash a \le \mathrm{Num}$
Constraints are checked under hypotheses.
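Types are Checked Under Axioms
- Instance declarations may introduce (global) axioms
Haskell program:
instance Eq a => Eq [a] where {...}
Axiom: for all a, if a is an instance of Eq, then [a] (list of a) is an instance of Eq. Constraint example with a hypothesis and an axiom:
$\mathrm{Int} \le \mathrm{Eq} \wedge (\forall a.\ a \le \mathrm{Eq} \Rightarrow [a] \le \mathrm{Eq}) \vdash [\mathrm{Int}] \le \mathrm{Eq}$
Instantiating the axiom with a = Int gives $\mathrm{Int} \le \mathrm{Eq} \Rightarrow [\mathrm{Int}] \le \mathrm{Eq}$.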
Modeling Hypotheses and Axioms
- Constraints ($C$): inequalities under quantified axioms
- Quantified axioms ($Q$): implication rules
- Hypotheses (e.g., Int ≤ Num): degenerate axioms
Syntax of SHErrLoc constraints:
$E ::= \alpha \mid \mathit{con}(E_1, \ldots, E_n)$
$I ::= E_1 \le E_2$
$Q ::= \forall a.\ \bigwedge_i I_i \Rightarrow I$
$C ::= \bigwedge_j \big(\bigwedge_i Q_i \vdash I_j\big)$
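As a rough illustration of checking an inequality under hypotheses and quantified axioms, the following Python sketch uses backward chaining over class constraints of the form T ≤ Cls. It is a simplification, not the paper's algorithm: types are nested tuples, strings stand for quantified variables, rigid signature variables are 0-ary constructors, and `match`, `instantiate`, and `entails` are hypothetical names. Hypotheses are encoded as degenerate axioms with no premises.

```python
from typing import Dict, List, Optional, Tuple, Union

# A type is a quantified variable (a str, e.g. "a") or a tuple
# (constructor, arg1, ..., argn), e.g. ("List", ("Int",)). Rigid type variables
# from signatures, like the a in `double :: Num a => a -> a`, are modelled as
# 0-ary constructors such as ("a",).
Type = Union[str, tuple]
Atom = Tuple[Type, str]          # (T, Cls) stands for T <= Cls
Axiom = Tuple[List[Atom], Atom]  # (premises, conclusion); variables implicitly forall-bound


def match(pat: Type, t: Type, binding: Dict[str, Type]) -> Optional[Dict[str, Type]]:
    """Match a pattern against a ground type, extending binding (None on failure)."""
    if isinstance(pat, str):
        if pat in binding:
            return binding if binding[pat] == t else None
        return {**binding, pat: t}
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for p, s in zip(pat[1:], t[1:]):
        binding = match(p, s, binding)
        if binding is None:
            return None
    return binding


def instantiate(pat: Type, binding: Dict[str, Type]) -> Type:
    if isinstance(pat, str):
        return binding[pat]  # assumes premise variables also occur in the conclusion
    return (pat[0],) + tuple(instantiate(p, binding) for p in pat[1:])


def entails(axioms: List[Axiom], goal: Atom, depth: int = 50) -> bool:
    """Backward chaining: is goal derivable from the axioms and hypotheses?"""
    if depth == 0:
        return False
    ty, cls = goal
    for premises, (pat, c) in axioms:
        if c != cls:
            continue
        b = match(pat, ty, {})
        if b is not None and all(
            entails(axioms, (instantiate(p, b), pc), depth - 1) for p, pc in premises
        ):
            return True
    return False
```

With the hypothesis Int ≤ Eq, encoded as `([], (("Int",), "Eq"))`, and the axiom from `instance Eq a => Eq [a]`, encoded as `([("a", "Eq")], (("List", "a"), "Eq"))`, the sketch derives [Int] ≤ Eq and [[Int]] ≤ Eq but not [Bool] ≤ Eq.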
The Full Constraint Language
- Also supports
– Functions on constraint elements
– Nested universally and existentially quantified variables
- Is expressive enough to model
– Type classes, type families, GADTs, and type signatures
(Refer to the paper for more details.)
Roadmap
Haskell program → Constraints → Analysis → Cause
- A highly expressive constraint language
- A decidable and efficient constraint analysis algorithm
Running example: fac n = if n == 0 then 1 else n * fac (n == 1)
Constraint Graph in a Nutshell
- Graph construction (simple case)
– Node: constraint element
– Directed edge: partial ordering
Constraint Analysis in a Nutshell
In fac n = if n == 0 then 1 else n * fac (n == 1), the graph contains a path from n == 1 (of type Bool) through n to Num. The path requires Bool ≤ Num, but Bool is not an instance of Num.
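A simplified Python sketch of this idea: search the graph for a path from a constant type to a class it is not an instance of. The node names (such as `alpha_n` for the type of n), the instance table, and the functions `find_path` and `unsat_paths` are hypothetical, and the sketch omits the inferred edges that SHErrLoc's saturation adds.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def find_path(graph: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search along <= edges; returns the node sequence or None."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def unsat_paths(graph: Dict[str, Set[str]], constants: Iterable[str],
                instances: Dict[str, Set[str]]) -> List[List[str]]:
    """Paths T -> ... -> Cls where the constant T is not a known instance of Cls."""
    errors = []
    for t in constants:
        for cls, members in instances.items():
            if t not in members:
                path = find_path(graph, t, cls)
                if path is not None:
                    errors.append(path)
    return errors
```

For fac, the edges Bool → alpha_n and alpha_n → Bool (from passing n == 1 to fac) plus alpha_n → Num and alpha_n → Eq (from n * ... and n == 0), with an instance table in which Bool belongs to Eq but not to Num, produce the single report Bool → alpha_n → Num.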
Limitations of Previous Algorithms
[Barrett et al.’00, Melski&Reps’00, Zhang&Myers’10]
Example, where C is a type class:
$\mathrm{Int} \le C \vdash \alpha = \mathrm{Bool} \wedge [\alpha] \le C$
- Satisfiable: α = Int
- Satisfiable: α = Bool
Graph nodes: C, [α], α, Bool (Bool ≰ C)
Previous algorithms under-saturate the graph: they only add edges, and [Bool] is not in the graph!
New Algorithm
Key idea: add new edges and new nodes during saturation.
Key challenge: naive algorithms either fail to terminate or under-saturate the graph.
On $\mathrm{Int} \le C \vdash \alpha = \mathrm{Bool} \wedge [\alpha] \le C$, the new algorithm adds the node [Bool] to the graph with nodes C, [α], α, and Bool.
New Algorithm in a Nutshell
- Black node: a node present before saturation
- White node: a node added during saturation
New nodes are added based on patterns consisting of
1. one edge between two black nodes, and
2. a black or white node.
Recursion check: for a white node, no new node is added based on the edge in the pattern.
Lemma: the algorithm always terminates.
Example: C, [α], α, and Bool are black; [Bool] is white.
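The following Python sketch contrasts edge-only saturation with a version that also adds white nodes, on the example Int ≤ C ⊢ α = Bool ∧ [α] ≤ C. It is a hypothetical simplification: white nodes are created only by substituting along an equality var = val between black nodes, the recursion check merely refuses to re-expand a white node with the equality that created it, and `max_rounds` caps the loop because this sketch does not reproduce the paper's termination guarantee. All function names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Terms: a str is a type variable (e.g. "a"); a tuple is (constructor, *args),
# e.g. ("Bool",), or ("List", "a") for [a].
Term = Union[str, tuple]
Edge = Tuple[Term, Term]  # (lo, hi) stands for lo <= hi


def mentions(t: Term, var: str) -> bool:
    if isinstance(t, str):
        return t == var
    return any(mentions(a, var) for a in t[1:])


def substitute(t: Term, var: str, val: Term) -> Term:
    if isinstance(t, str):
        return val if t == var else t
    return (t[0],) + tuple(substitute(a, var, val) for a in t[1:])


def close(edges: Set[Edge]) -> Set[Edge]:
    """Edge-only saturation: transitive closure over the existing nodes."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(edges):
            for c, d in list(edges):
                if b == c and a != d and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges


def saturate(black: Set[Term], edges: Set[Edge], max_rounds: int = 10):
    """Also adds white nodes n[var := val] for each equality var = val between
    black nodes and each node n that mentions var."""
    nodes, edges = set(black), close(edges)
    origin: Dict[Term, Edge] = {}  # white node -> equality it was created from
    eqs = [(v, t) for v, t in edges
           if isinstance(v, str) and (t, v) in edges and v in black and t in black]
    for _ in range(max_rounds):  # round cap: a stand-in, not the paper's termination argument
        before = (len(nodes), len(edges))
        for var, val in eqs:
            for n in list(nodes):
                # Simplified recursion check: never re-expand a white node
                # via the equality that created it.
                if n == var or origin.get(n) == (var, val) or not mentions(n, var):
                    continue
                m = substitute(n, var, val)
                if m not in nodes:
                    nodes.add(m)
                    origin[m] = (var, val)
                edges |= {(n, m), (m, n)}  # var = val implies n = n[var := val]
        edges = close(edges)
        if (len(nodes), len(edges)) == before:
            break
    return nodes, edges


def is_ground(t: Term) -> bool:
    return not isinstance(t, str) and all(is_ground(a) for a in t[1:])


def violations(edges: Set[Edge], classes: Set[Term], hypotheses: Set[Edge]) -> Set[Edge]:
    """Ground inequalities T <= Cls that no hypothesis justifies."""
    return {(t, c) for t, c in edges
            if c in classes and is_ground(t) and (t, c) not in hypotheses}
```

On the example, `close` alone finds no violation, while `saturate` adds the white node [Bool] and derives [Bool] ≤ C, which the hypothesis Int ≤ C does not justify.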
Constraint Analysis
- The analysis also handles
– Functions on constraint elements
– Hypotheses
– Quantified axioms
- Performance
– Empirically: quadratic in graph size
(Refer to the paper for more details.)
Roadmap
Haskell program → Constraints → Analysis → Cause
- A highly expressive constraint language
- A decidable and efficient constraint analysis algorithm
- Bayesian reasoning: a Bayesian model that accounts for the richer graph representation
Running example: fac n = if n == 0 then 1 else n * fac (n == 1)
Likelihood Estimation [Zhang&Myers’14]
A ranking metric based on Bayesian reasoning, for an explanation $E$ (a set of locations):
$$P_1^{|E|}\left(\frac{P_2}{1-P_2}\right)^{k_E}$$
where $k_E$ is the number of satisfiable paths using locations in $E$, and $P_1$, $P_2$ are tunable parameters.
- Simplifying assumption
– The satisfiability of different paths is independent
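Assuming the metric reads P1^|E| (P2/(1 − P2))^k_E (reconstructed from garbled slide text), here is a small Python sketch that scores an explanation in log space. The function name and the representation of paths as sets of location labels are hypothetical.

```python
import math
from typing import FrozenSet, Iterable


def log_likelihood(explanation: FrozenSet[str], sat_paths: Iterable[FrozenSet[str]],
                   p1: float, p2: float) -> float:
    """log of P1^|E| * (P2 / (1 - P2))^k_E, where k_E counts the satisfiable
    paths that use at least one location in E (paths treated as independent)."""
    k_e = sum(1 for path in sat_paths if explanation & path)
    return len(explanation) * math.log(p1) + k_e * math.log(p2 / (1 - p2))
```

Working in log space avoids underflow; with P1 < 1 and P2 < 0.5, both factors are below 1, so smaller explanations that appear on fewer satisfiable paths score higher.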
Redundant Paths (definition in paper)
White nodes break this independence assumption.
- Observation: some paths using white nodes provide neither positive nor negative evidence
– Example (nodes C, [α], α, Bool, [Bool]): the satisfiability of such a path depends on the edges between α and Bool
Lemma: the satisfiability of any redundant path depends on non-redundant paths.
New Ranking Metric
$$P_1^{|E|}\left(\frac{P_2}{1-P_2}\right)^{k_E}$$
where $k_E$ is now the number of satisfiable non-redundant paths that use constraints in $E$.
- Intuitively, the general diagnosis heuristics become: the error cause is likely to be
– Simple
– Able to explain all errors
– Not used often on correct non-redundant paths
- Top candidates are returned by an efficient A* algorithm [Zhang&Myers’14]
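The search can be illustrated with a brute-force Python stand-in for the A* algorithm: it enumerates small location sets, keeps those that hit every unsatisfiable path, and ranks them with the same reconstructed metric, counting only non-redundant satisfiable paths. Deciding which paths are redundant is left to the caller through the hypothetical `redundant` index set; location labels such as "n * _" are made up for illustration.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple


def best_explanations(unsat_paths: List[Set[str]], sat_paths: List[Set[str]],
                      redundant: Set[int], p1: float, p2: float,
                      max_size: int = 3) -> List[Tuple[str, ...]]:
    """Exhaustively rank candidate explanations (a brute-force stand-in for A*).

    A candidate must hit every unsatisfiable path ("explains all errors");
    k_E counts only satisfiable paths whose index is not in `redundant`.
    """
    locations = sorted(set().union(*unsat_paths))
    useful = [p for i, p in enumerate(sat_paths) if i not in redundant]
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(locations, size):
            e = set(combo)
            if all(e & path for path in unsat_paths):
                k_e = sum(1 for path in useful if e & path)
                score = size * math.log(p1) + k_e * math.log(p2 / (1 - p2))
                scored.append((score, combo))
    scored.sort(key=lambda sc: sc[0], reverse=True)  # most likely first
    return [combo for _, combo in scored]
```

For instance, with one unsatisfiable path {n == 1, n * _} and satisfiable paths {n * _, n == 0} and {n == 1, n == 0}, the two single-location candidates tie; marking the second satisfiable path (index 1) as redundant makes ("n == 1",) the top candidate.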
Evaluation
- Implementation
– From Haskell programs to constraints, with little effort: a modified GHC (about 50 LOC atop 20K+ LOC) produces GHC constraints, and a constraint translator (~400 LOC) turns them into SHErrLoc constraints
– SHErrLoc (~7500 LOC): builds the constraint graph and performs error diagnosis to produce reports
Evaluation Setup
- Benchmarks
– CE benchmark: 77 Haskell programs collected from papers about type-error diagnosis, used in [Chen&Erwig’14]
– Helium benchmark: 228 programs with type-checking errors, logged by the Helium tool [Hage’14]
- Ground truth
– CE benchmark: already well marked
– Helium benchmark: the user’s actual fix
- Correctness
– A result counts as correct only when the tool reports the programmer’s actual mistake
Accuracy on the CE Benchmark
[Figure: percentage of programs (0% to 100%) in each outcome, comparing SHErrLoc with GHC and with the Helium tool [Heeren et al.’03]. Outcomes: SHErrLoc finds the correct error and the other tool misses it; both find the correct error; the other tool finds the correct error and SHErrLoc misses it; both miss the correct error.]
SHErrLoc uses no Haskell-specific heuristics!
Accuracy on the Helium Benchmark
[Figure: percentage of programs (0% to 100%) in each outcome, comparing SHErrLoc with GHC and with the Helium tool. Outcomes: SHErrLoc finds the correct error and the other tool misses it; both find the correct error; the other tool finds the correct error and SHErrLoc misses it; both miss the correct error.]
Related Work
- General error localization [Zhang&Myers’14]
– Cannot handle the type system of GHC
– Simpler constraints and constraint analysis algorithm
- Program analyses as constraint solving [e.g., Aiken’99, Foster et al.’06]
– No support for hypotheses and axioms
- Diagnosing Haskell errors [e.g., Heeren et al.’03, Hage&Heeren’07, Chen&Erwig’14]
– Haskell-specific heuristics
– Unable to handle all of the sophisticated features of GHC
SHErrLoc
General, expressive, and accurate error localization
– Applies to the highly expressive type system of GHC
– A novel graph-based constraint analysis algorithm
– Bayesian reasoning yields more accurate error locations than existing tools