
Diagnosing Type Errors with Class



  1. Diagnosing Type Errors with Class
     Danfeng Zhang, Dimitrios Vytiniotis, Simon Peyton-Jones, Andrew C. Myers
     Cornell University, MSR Cambridge
     PLDI 2015 Distinguished Paper Award

  2. Error localization is difficult for ML type systems
     “It is a truism that most bugs are detected only at a great distance from their source.” Mitchell Wand, Finding the source of type errors, POPL’86
     Even worse in sophisticated type systems, such as that of the Glasgow Haskell Compiler (GHC):
     • Type classes
     • Type families
     • GADTs
     • Type signatures

  3. Inference Engine
     • Actual mistake: ‘==’ should be ‘−’
     • GHC: Bool is not a numerical type
     • Error messages are sometimes confusing
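
The running example behind this slide can be written out as below. This is a minimal sketch: the ill-typed line is copied from the deck and kept as a comment so the file compiles, and the use of the single name `fact` in the corrected version is an assumption (the deck writes the recursive call as `fac`).

```haskell
-- Ill-typed program as shown in the deck (commented out so this file loads):
--   fact n = if n == 0 then 1 else n * fac (n == 1)
-- The argument (n == 1) has type Bool, but it is multiplied with n,
-- so the checker ends up asking for Bool to be a numeric type.

-- Fix named on the slide: '==' should be '-'.
-- Assumption: the recursive call is meant to refer to the same function.
fact :: Integer -> Integer
fact n = if n == 0 then 1 else n * fact (n - 1)
```

Loading this in GHCi and evaluating `fact 5` exercises the corrected definition; uncommenting the original line reproduces a type error that mentions `Bool` rather than the mistyped operator.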

  4. SHErrLoc: Static Holistic Error Locator
     • Output: the most likely error cause
     • A general, expressive, and accurate error localization method that handles the highly expressive type system of GHC

  5. General Error Localization [Zhang&Myers’14]
     • Pipeline: Programs (OCaml, Jif, others) → Constraints → Constraints Analysis → Cause
     • Cannot diagnose Haskell errors
     • General diagnosis heuristics, based on a Bayesian interpretation. The error cause is likely to be
       – Simple
       – Able to explain all errors
       – Not used often on correct paths

  6. Key Contributions
     • Pipeline: Haskell Program → Constraints → Constraints Analysis → Cause
     • Example program: fact n = if n == 0 then 1 else n * fac (n == 1)
     • A highly expressive constraint language
     • A decidable and efficient constraint analysis algorithm
     • A Bayesian model that accounts for the richer graph representation (Bayesian reasoning)

  7. Roadmap
     • Haskell Program → Constraints
     • Example program: fact n = if n == 0 then 1 else n * fac (n == 1)
     • This part: a highly expressive constraint language

  8. Type Checking as Constraint Solving
     • ML type system
       – Constraint elements: types (constructors: Int, Bool, List; variables: $\alpha$, $\beta$, $\gamma$)
       – Constraints: type equalities
     • Syntax of constraints
       – Element: $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
       – Constraint: $I ::= E_1 = E_2$ and $C ::= \bigwedge_i I_i$
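
To make the equality-only constraint language concrete, here is a minimal sketch of elements, equality constraints, and a textbook unification-based solver. The names `Elem`, `Equality`, `:=:`, and `solve` are illustrative assumptions, not identifiers from SHErrLoc or GHC, and the solver is ordinary first-order unification rather than anything specific to the paper.

```haskell
import qualified Data.Map as Map

-- Constraint elements: a variable, or a constructor applied to elements.
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- An equality constraint E1 = E2 (hypothetical infix constructor).
data Equality = Elem :=: Elem
  deriving (Eq, Show)

type Subst = Map.Map String Elem

-- Apply a substitution, following chains of variable bindings.
apply :: Subst -> Elem -> Elem
apply s (Var v)    = maybe (Var v) (apply s) (Map.lookup v s)
apply s (Con c es) = Con c (map (apply s) es)

-- Occurs check: does variable v appear inside the element?
occurs :: String -> Elem -> Bool
occurs v (Var w)    = v == w
occurs v (Con _ es) = any (occurs v) es

-- Solve a conjunction of equalities; Nothing means unsatisfiable.
solve :: [Equality] -> Maybe Subst
solve = go Map.empty
  where
    go s [] = Just s
    go s ((a :=: b) : rest) =
      case (apply s a, apply s b) of
        (Var v, t) -> bind v t
        (t, Var v) -> bind v t
        (Con c xs, Con d ys)
          | c == d && length xs == length ys ->
              go s (zipWith (:=:) xs ys ++ rest)
          | otherwise -> Nothing
      where
        bind v t
          | t == Var v = go s rest
          | occurs v t = Nothing
          | otherwise  = go (Map.insert v t s) rest
```

In this encoding, a constraint such as `Con "Bool" [] :=: Con "Int" []` is rejected by `solve`, which is the ML-style analogue of the Bool-versus-number clash in the running example.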

  9. Type Classes
     • Example: the instances of a type class called Num
     • Intuitively, a type class is a set of types

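  10. Modeling Type Class Constraints
      • Syntax of constraints, extended with class constraints
        – $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
        – $I ::= E_1 = E_2 \mid \mathit{cla}\ E_1, \dots, E_n$
        – $C ::= \bigwedge_i I_i$
      • Our constraint language (a type class is a set of its instances)
        – $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
        – $I ::= E_1 \le E_2$
        – $C ::= \bigwedge_i I_i$
      • Encoding
        – $\mathbf{Num}\ \alpha := \alpha \le \mathbf{Num}$
        – $E_1 = E_2 := E_1 \le E_2 \wedge E_2 \le E_1$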

  11. Modeling Type Class Constraints
      • Syntax of constraints, extended with class constraints
        – $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
        – $I ::= E_1 = E_2 \mid \mathit{cla}\ E_1, \dots, E_n$
        – $C ::= \bigwedge_i I_i$
      • Our constraint language (a type class is a set of its instances)
        – $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
        – $I ::= E_1 \le E_2$
        – $C ::= \bigwedge_i I_i$
      • Concise; inequalities directly map to edges in a graph
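
The encoding of class constraints and equalities into inequalities can be sketched as two small functions. The types and the names `Leq`, `:<=:`, `classConstraint`, and `equality` are assumptions chosen for illustration; only the two encoding rules themselves come from the slides.

```haskell
-- Constraint elements, as in the equality-only language.
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- A single inequality E1 <= E2 (hypothetical infix constructor).
data Leq = Elem :<=: Elem
  deriving (Eq, Show)

-- Class constraint "cls e" becomes  e <= cls,
-- treating the class as the set of its instances.
classConstraint :: String -> Elem -> [Leq]
classConstraint cls e = [e :<=: Con cls []]

-- Equality E1 = E2 becomes  E1 <= E2  and  E2 <= E1.
equality :: Elem -> Elem -> [Leq]
equality a b = [a :<=: b, b :<=: a]
```

Because every constraint is now a list of `:<=:` pairs, each pair can be read directly as a directed edge, which is the point the slide makes about graphs.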

  12. Types are Checked Under Hypotheses
      • Type signatures and GADTs introduce hypotheses
      • Haskell program:
          double :: Num a => a -> a
          double n = n * 2
      • Constraints: a ≤ Num ⊢ a ≤ Num
        – Left of ⊢ is the hypothesis: a is an instance of Num
        – Right of ⊢ is the constraint
      • Constraints are checked under hypotheses

  13. Types are Checked Under Axioms
      • Instance declarations may introduce (global) axioms
      • Haskell program: instance Eq a => Eq [a] where {...}
        – For all a, a is an instance of Eq implies list of a is an instance of Eq
      • Constraint example with axioms: Int ≤ Eq ∧ (∀a. a ≤ Eq ⇒ [a] ≤ Eq) ⊢ [Int] ≤ Eq
        – Hypothesis: Int ≤ Eq
        – Axiom: ∀a. a ≤ Eq ⇒ [a] ≤ Eq
        – Instantiated axiom: Int ≤ Eq ⇒ [Int] ≤ Eq
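
How a hypothesis and the list axiom combine can be illustrated with a tiny entailment check. This is a sketch under stated assumptions: the type `Ty`, the function `entailsEq`, and the restriction to the single class Eq and the single axiom "a ≤ Eq implies [a] ≤ Eq" are all simplifications introduced here, not the analysis described in the paper.

```haskell
-- A small universe of ground types (illustrative only).
data Ty = TInt | TBool | TList Ty
  deriving (Eq, Show)

-- Ground hypotheses of the form  t <= Eq.
type Hyps = [Ty]

-- Decide whether  hyps /\ (forall a. a <= Eq => [a] <= Eq)  |-  t <= Eq.
entailsEq :: Hyps -> Ty -> Bool
entailsEq hyps t
  | t `elem` hyps = True                      -- goal is a hypothesis
entailsEq hyps (TList a) = entailsEq hyps a   -- use the axiom backwards
entailsEq _ _ = False                         -- nothing else applies
```

With the hypothesis `[TInt]`, a goal such as `TList (TList TInt)` is discharged by applying the axiom twice, while `TList TBool` is not, since nothing makes Bool an instance of Eq in this toy setting.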

  14. Modeling Hypotheses and Axioms
      • Syntax of SHErrLoc constraints
        – $E ::= \alpha \mid \mathit{con}\ E_1, \dots, E_n$
        – $I ::= E_1 \le E_2$
        – $Q ::= \forall a.\ \bigwedge_i I_i \Rightarrow I$
        – $C ::= \bigwedge_j \left( \bigwedge_i Q_i \vdash I_j \right)$
      • Constraints ($C$): inequalities under quantified axioms
      • Quantified axioms ($Q$): implication rules
      • Hypotheses (e.g., Int ≤ Num): degenerate axioms

  15. The Full Constraint Language
      • Also supports
        – Functions on constraint elements
        – Nested universally and existentially quantified variables
      • Is expressive enough to model
        – Type classes, type families, GADTs, type signatures
      (refer to the paper for more details)

  16. Roadmap
      • Haskell Program → Constraints → Constraints Analysis → Cause
      • Example program: fact n = if n == 0 then 1 else n * fac (n == 1)
      • Done: a highly expressive constraint language
      • This part: a decidable and efficient constraint analysis algorithm

  17. Constraint Graph in a Nutshell
      • Graph construction (simple case)
        – Node: constraint element
        – Directed edge: partial ordering

  18. Constraint Analysis in a Nutshell
      • Program: fact n = if n == 0 then 1 else n * fac (n == 1)
      • Graph nodes shown in the diagram: Num, n, 0, n == 1
      • Result: Bool is not an instance of Num
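
The simple case of the graph view (nodes are constraint elements, a directed edge means ≤) can be sketched with plain reachability. The edge list `factEdges` is a hypothetical reading of the diagram for the running example, and `reachable` and `derivable` are names invented for this sketch; the real analysis handles much more than transitive closure.

```haskell
import qualified Data.Set as Set

type Node = String
type Edge = (Node, Node)   -- (a, b) encodes  a <= b

-- All nodes reachable from a start node (including the node itself).
reachable :: [Edge] -> Node -> Set.Set Node
reachable edges start = go (Set.singleton start) [start]
  where
    go seen [] = seen
    go seen (n : queue) =
      let next  = [b | (a, b) <- edges, a == n, not (Set.member b seen)]
          seen' = foldr Set.insert seen next
      in go seen' (queue ++ next)

-- a <= b follows from the edges if b is reachable from a.
derivable :: [Edge] -> Node -> Node -> Bool
derivable edges a b = Set.member b (reachable edges a)

-- Hypothetical edges for the example: the Bool-typed argument (n == 1)
-- flows into n, and n must be an instance of Num.
factEdges :: [Edge]
factEdges = [("Bool", "n"), ("n", "Num")]
```

Here `derivable factEdges "Bool" "Num"` holds, so the graph contains a path that demands Bool ≤ Num. Since Bool is not an instance of Num, that path is unsatisfiable, which is the situation the slide summarizes.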

  19. Limitations of Previous Algorithms [Barrett et al. ’00, Melski&Reps’00, Zhang&Myers’10]
      • Example: Int ≤ C ⊢ α = Bool ∧ [α] ≤ C, where C is a type class
      • Previous algorithms under-saturate the graph: they only add edges ([Bool] is not in the graph!)
      • Diagram: nodes C, [α], α, Bool, annotated “Satisfiable: α = Int”, “Satisfiable: α = Bool”, and a “≰” label next to Bool and C

  20. New Algorithm
      • Example: Int ≤ C ⊢ α = Bool ∧ [α] ≤ C
      • Key idea: add new edges and nodes during saturation
      • The new algorithm adds new nodes (diagram: nodes C, [α], [Bool], α, Bool)
      • Key challenge: naive algorithms either fail to terminate or under-saturate the graph

  21. New Algorithm in a Nutshell
      • Black node: node before saturation
      • White node: node added during saturation
      • Nodes are added based on patterns consisting of
        1. one edge with two black nodes
        2. a black/white node
      • Recursion check: if white node, not added based on the edge in pattern
      • Lemma: the algorithm always terminates
      • Diagram: nodes C, [α], [Bool], α, Bool
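
The idea of creating nodes during saturation can be illustrated with a single, deliberately naive expansion round. This is not the paper's algorithm: it has no black/white distinction and no recursion check, it assumes every constructor is covariant, and iterating it to a fixpoint may fail to terminate on some inputs, which is the challenge the slides point out. All names (`Elem`, `subst`, `expandOnce`) are assumptions made for this sketch.

```haskell
import Data.List (nub)

data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

type Edge = (Elem, Elem)   -- (a, b) encodes  a <= b

-- Replace every occurrence of element x inside an element by y.
subst :: Elem -> Elem -> Elem -> Elem
subst x y e
  | e == x = y
subst x y (Con c es) = Con c (map (subst x y) es)
subst _ _ e = e

-- One naive round: for each edge x <= y and each existing node n that
-- strictly contains x, create n[x := y] and add the edge n <= n[x := y].
-- Assumption: all constructors are covariant.
expandOnce :: [Edge] -> [Edge]
expandOnce edges = nub (edges ++ new)
  where
    nodes = nub (concat [[a, b] | (a, b) <- edges])
    new   = [ (n, n')
            | (x, y) <- edges
            , n <- nodes
            , n /= x
            , let n' = subst x y n
            , n' /= n ]
```

Starting from the edges α ≤ Bool and [α] ≤ C, one round introduces the node [Bool] together with the edge [α] ≤ [Bool], so the graph now mentions an element that an edge-only saturation would never contain.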

  22. Constraint Analysis
      • The analysis also handles
        – Functions on constraint elements
        – Hypotheses
        – Quantified axioms
        (refer to the paper for more details)
      • Performance
        – Empirically: quadratic in graph size

  23. Roadmap
      • Haskell Program → Constraints → Constraints Analysis → Cause
      • Example program: fact n = if n == 0 then 1 else n * fac (n == 1)
      • Done: a highly expressive constraint language
      • Done: a decidable and efficient constraint analysis algorithm
      • This part: a Bayesian model that accounts for the richer graph representation (Bayesian reasoning)

  24. Likelihood Estimation [Zhang&Myers’14]
      • Explanation $E$: a set of locations
      • A ranking metric based on Bayesian reasoning: $P_1^{|E|} \left( \frac{P_2}{1 - P_2} \right)^{k_E}$, where $k_E$ = # sat paths using locations in $E$ ($P_1$, $P_2$ are tunable parameters)
      • Simplifying assumption
        – Satisfiability of paths is independent
        – White nodes break this assumption

  25. Redundant Paths (definition in paper)
      • Diagram (nodes C, [α], [Bool], α, Bool): satisfiability depends on edges between α and Bool
      • Observation: some paths using white nodes provide neither positive nor negative evidence
      • Lemma: the satisfiability of any redundant path depends on non-redundant paths

  26. New Ranking Metric
      • $P_1^{|E|} \left( \frac{P_2}{1 - P_2} \right)^{k_E}$, where $k_E$ = # non-redundant sat paths that use constraints in $E$
      • Intuitively (general diagnosis heuristics), the error cause is likely to be
        – Simple
        – Able to explain all errors
        – Not used often on correct non-redundant paths
      • Top candidates returned by an efficient A* algorithm [Zhang&Myers’14]
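
The ranking metric can be evaluated directly once the size of an explanation and its path count are known. The sketch below assumes the count of non-redundant satisfiable paths is supplied as an input, and it assumes parameter values with P1 < 1 and P2 < 0.5 so that larger explanations and heavier use on satisfiable paths both lower the score. The record `Candidate` and the functions `score` and `rank` are hypothetical names; the "explains all errors" requirement and the A* search are not modeled.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- A candidate explanation: its locations and the number of
-- (non-redundant) satisfiable paths that use them.
data Candidate = Candidate
  { locations :: [String]
  , satPaths  :: Int
  } deriving (Eq, Show)

-- score = P1^|E| * (P2 / (1 - P2))^k_E
score :: Double -> Double -> Candidate -> Double
score p1 p2 c =
  p1 ^ length (locations c) * (p2 / (1 - p2)) ^ satPaths c

-- Most likely candidates first.
rank :: Double -> Double -> [Candidate] -> [Candidate]
rank p1 p2 = sortBy (comparing (Down . score p1 p2))
```

With illustrative parameters such as `p1 = 0.5` and `p2 = 0.25`, a single-location candidate that appears on no satisfiable path outranks a single-location candidate that appears on three of them, which matches the "not used often on correct paths" heuristic.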

  27. Evaluation
      • Implementation
        – From Haskell programs to constraints
        – SHErrLoc
      • Pipeline: GHC → Constraints → Constraint Translator → SHErrLoc Constraints → SHErrLoc (Constraint Graph, Diagnosis) → Error Reports
      • Annotations on the diagram: “little effort atop GHC”, “Modified 50”, “20K+ LOC”, “~400 LOC”, “~7500 LOC”

  28. Evaluation Setup
      • Benchmarks
        – CE benchmark: analyzed 77 Haskell programs collected from papers about type-error diagnosis, used in [Chen&Erwig’14]
        – Helium benchmark: analyzed 228 programs with type-checking errors, logged by the Helium tool [Hage’14]
      • Ground truth
        – CE benchmark: already well-marked
        – Helium benchmark: user’s actual fix
      • Correctness
        – Only when the programmer mistake is returned by tools

  29. Accuracy on the CE Benchmark
      • Chart (0% to 100%): comparison with GHC, and comparison with the Helium tool [Heeren et al.’03]
      • Legend
        – Other tool misses the error, SHErrLoc finds the correct error
        – Both find the correct error
        – Both miss the correct error
        – SHErrLoc misses the error, other tool finds the correct error
      • SHErrLoc uses no Haskell-specific heuristics!

  30. Accuracy on the Helium Benchmark
      • Chart (0% to 100%): comparison with the Helium tool, and comparison with GHC
      • Legend
        – Other tool misses the error, SHErrLoc finds the correct error
        – Both find the correct error
        – Both miss the correct error
        – Other tool finds the correct error, SHErrLoc misses the error

  31. Related Work
      • General error localization [Zhang&Myers’14]
        – Cannot handle the type system of GHC
        – Simpler constraints and constraint analysis algorithm
      • Program analyses as constraint solving [e.g., Aiken’99, Foster et al. ’06]
        – No support for hypotheses and axioms
      • Diagnosing Haskell errors [e.g., Heeren et al.’03, Hage&Heeren’07, Chen&Erwig’14]
        – Haskell-specific heuristics
        – Unable to handle all of the sophisticated features of GHC
