Deep Generation of Coq Lemma Names Using Elaborated Terms

Pengyu Nie (1), Karl Palmskog (2), Junyi Jessy Li (1), and Milos Gligoric (1)
(1) The University of Texas at Austin
(2) KTH Royal Institute of Technology
IJCAR 2020

Motivation: Verification Projects Growing in Size

Proof assistants are increasingly used to formalize results in advanced mathematics and develop large trustworthy software systems.

Project      Domain        Assistant      LOC
CompCert     compiler      Coq            120k+
MathComp     math          Coq            85k+
Verdi Raft   k/v store     Coq            50k+
seL4         kernel        Isabelle/HOL   200k+
BilbyFS      file system   Isabelle/HOL   14k+

Verification projects face challenges similar to those in large software projects: maintenance and enforcement of coding conventions.

How to name lemmas?

Motivation: Hard-coded Naming Conventions

CONTRIBUTIONS.md in MathComp, 50+ entries

Motivation: Many Inconsistencies in Large Projects

Motivation: Manually Checking and Enforcing

Our Contributions

- Roosterize: toolchain for learning and suggesting lemma names
  - Code review process
  - Interactive development
  - Batch mode
- Novel generation models based on multi-input encoder-decoder neural networks leveraging elaborated terms
- A corpus of 164k LOC of high-quality Coq code
- An extensive evaluation on our corpus via automated metrics
- A qualitative case study on a project outside the corpus

Running Example: A Lemma from reglang Project

A lemma from a project on the theory of regular languages: most general (mg) classifiers can be cast to equivalent (eq) languages.

    Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) :
      L1 =i L2 -> nerode L2 N1.
    Proof.
    move=> eq_L u v. split=> [/nerodeP eq_in w|eq_in].
    - by rewrite -!eq_L.
    - apply/nerodeP=> w. by rewrite !eq_L.
    Qed.

The lemma has three parts:

- Lemma name: mg_eq_proof
- Lemma statement: L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1
- Proof script: everything from Proof. to Qed.

Roosterize Toolchain

Lemma statement:

    Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1.

1. Parsing: produces the syntax tree
2. Elaboration: produces the kernel tree (elaborated terms)
3. Tree chopping
4. Multi-input encoder-decoder neural network: produces the lemma name

Suggested: mg eq nerode
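The slides name the tree chopping step but do not spell out its rules. The Python sketch below assumes one plausible reading, namely that subtrees below a fixed depth are cut so the network sees shorter inputs. The functions `chop` and `flatten`, the `"..."` placeholder, and the depth limit of 3 are all hypothetical, and the input is a simplified fragment of the kernel tree for the running example written as nested lists.

```python
def chop(tree, max_depth):
    """Keep nodes up to max_depth; replace deeper subtrees by a placeholder.

    Assumption: a plain depth limit. The actual heuristics used by the
    toolchain are not given in the slides.
    """
    if not isinstance(tree, list):
        return tree            # atoms are always kept
    if max_depth == 0:
        return "..."           # subtree is too deep: cut it
    return [chop(child, max_depth - 1) for child in tree]


def flatten(tree):
    """Linearize a nested-list tree into a token sequence with parentheses."""
    if not isinstance(tree, list):
        return [tree]
    tokens = ["("]
    for child in tree:
        tokens.extend(flatten(child))
    tokens.append(")")
    return tokens


# Simplified fragment of the kernel tree for `nerode L2 N1`.
kernel = ["App",
          ["Ref",
           ["DirPath", [["Id", "myhill_nerode"], ["Id", "RegLang"]]],
           ["Id", "nerode"]],
          ["Var", ["Id", "L2"]],
          ["Var", ["Id", "N1"]]]

chopped = chop(kernel, 3)
print(chopped)
print(" ".join(flatten(chopped)))
```

With this depth limit the module path under `DirPath` is cut while the identifiers `nerode`, `L2`, and `N1` survive, which is the kind of trade-off a chopping step has to make.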

Model Input: Lemma Statement

    Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1.

S-expression:

    (Sentence((IDENT Lemma)(IDENT mg_eq_proof)(IDENT L1)(IDENT L2)
      (KEYWORD"(")(IDENT N1)(KEYWORD :)(IDENT mgClassifier)
      (IDENT L1)(KEYWORD")")(KEYWORD :)(IDENT L1)(KEYWORD =i)(IDENT L2)
      (KEYWORD ->)(IDENT nerode)(IDENT L2)(IDENT N1)(KEYWORD .)))

- In lexing phase
- Surface syntax level information
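To make the token-level input concrete, here is a rough Python sketch of a tokenizer that reproduces the S-expression for the running example. It is an illustration only: `KEYWORDS`, `TOKEN_RE`, `tokenize`, and `to_sexp` are hypothetical names, the keyword set covers just the symbols in this one statement, and in practice the token stream would come from Coq's own lexer.

```python
import re

# Hypothetical keyword set: only the symbols that occur in the running
# example. Coq's real lexer knows many more keywords and notations.
KEYWORDS = {"(", ")", ":", "=i", "->", "."}

# Multi-character symbols come first so "=i" and "->" are not split.
# A "." counts as a token only when it ends the sentence.
TOKEN_RE = re.compile(r"=i|->|[():]|\.(?=\s|$)|[A-Za-z_][A-Za-z0-9_']*")


def tokenize(statement):
    """Return (kind, text) pairs for a lemma statement."""
    tokens = []
    for text in TOKEN_RE.findall(statement):
        kind = "KEYWORD" if text in KEYWORDS else "IDENT"
        tokens.append((kind, text))
    return tokens


def to_sexp(tokens):
    """Serialize tokens in the style shown on the slide."""
    parts = []
    for kind, text in tokens:
        if text in ("(", ")"):
            parts.append(f'({kind}"{text}")')   # parentheses are quoted
        else:
            parts.append(f"({kind} {text})")
    return "(Sentence(" + "".join(parts) + "))"


stmt = ("Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : "
        "L1 =i L2 -> nerode L2 N1.")
tokens = tokenize(stmt)
print(to_sexp(tokens))
```

Note that at this level `=i` and `->` are opaque keywords; nothing in the token stream says what they mean.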

Model Input: Syntax Tree

    Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1.

    (VernacExpr()(VernacStartTheoremProof Lemma (Id mg_eq_proof)
     (((CLocalAssum(Name(Id L1))(CLocalAssum(Name(Id L2)))
        (CLocalAssum(Name(Id N1))(CApp(CRef(Ser_Qualid(DirPath())(Id mgClassifier)))
          (CRef(Ser_Qualid(DirPath())(Id L1))))))
       (CNotation(InConstrEntrySomeLevel"_ -> _")
        (CNotation(InConstrEntrySomeLevel"_ =i _")
         (CRef(Ser_Qualid(DirPath())(Id L1)))(CRef(Ser_Qualid(DirPath())(Id L2))))
        (CApp(CRef(Ser_Qualid(DirPath())(Id nerode)))
         (CRef(Ser_Qualid(DirPath())(Id L2)))(CRef(Ser_Qualid(DirPath())(Id N1))))))))

- In parsing phase
- Surface syntax level information
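Trees serialized like this have to be read back into a nested structure before they can be processed further. The following Python sketch shows a minimal S-expression reader; `parse_sexp` is a hypothetical helper, it keeps quoted strings as single atoms, and it does not handle escaped quotes inside strings. The input is the `_ =i _` subtree from the syntax tree above.

```python
def parse_sexp(text):
    """Parse one S-expression into nested Python lists of string atoms."""
    pos = 0
    n = len(text)

    def skip_ws():
        nonlocal pos
        while pos < n and text[pos].isspace():
            pos += 1

    def parse():
        nonlocal pos
        skip_ws()
        if pos >= n:
            raise ValueError("unexpected end of input")
        ch = text[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= n:
                    raise ValueError("missing ')'")
                if text[pos] == ")":
                    pos += 1
                    return items
                items.append(parse())
        if ch == ")":
            raise ValueError("unexpected ')'")
        if ch == '"':
            # Quoted string: kept as one atom, quotes included.
            end = text.index('"', pos + 1)
            atom = text[pos:end + 1]
            pos = end + 1
            return atom
        # Bare atom: runs until whitespace, a parenthesis, or a quote.
        start = pos
        while pos < n and not text[pos].isspace() and text[pos] not in '()"':
            pos += 1
        return text[start:pos]

    tree = parse()
    skip_ws()
    if pos != n:
        raise ValueError("trailing input after the expression")
    return tree


example = ('(CNotation(InConstrEntrySomeLevel"_ =i _")'
           '(CRef(Ser_Qualid(DirPath())(Id L1)))'
           '(CRef(Ser_Qualid(DirPath())(Id L2))))')
tree = parse_sexp(example)
print(tree)
```

The parsed tree makes the surface-level nature of this input visible: the operator is recorded only as the notation string `"_ =i _"`, not as the definition it stands for.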

Model Input: Kernel Tree

    Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1.

    (Prod (Name (Id char)) ...
     (Prod (Name (Id L1)) ...
      (Prod (Name (Id L2)) ...
       (Prod (Name (Id N1)) ...
        (Prod Anonymous
         (App (Ref (DirPath ((Id ssrbool) (Id ssr) (Id Coq))) (Id eq_mem)) ...
          (Var (Id L1)) ... (Var (Id L2)))
         (App (Ref (DirPath ((Id myhill_nerode) (Id RegLang))) (Id nerode)) ...
          (Var (Id L2)) ... (Var (Id N1)))))))))

- In elaboration phase
- Semantic level information
- Add implicit terms (here, the binder char, which does not appear in the written statement)
- Translate operators to their kernel names (here, =i becomes eq_mem)

Lemma Naming as a Transduction Task

Encoder-decoder neural network: specifically designed for transduction tasks (e.g., machine translation, summarization, question answering)

[Figure: the encoder reads the input sequence i_1, i_2, ..., i_m (lemma statement, syntax tree, kernel tree) and passes its state to the decoder, which starts from <BOS> and emits the output sequence o_1, o_2, ..., o_n (the lemma name) followed by <EOS>.]

<BOS>: beginning of sequence
<EOS>: end of sequence
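The role of the begin and end markers is easiest to see in the decoding loop. The Python sketch below shows greedy decoding only: the trained network is replaced by `toy_step`, a scripted stand-in that simply replays a fixed list of subtokens, so nothing here reflects the actual model. The names `greedy_decode`, `toy_step`, and `max_len`, and the choice to join subtokens with underscores, are assumptions made for illustration.

```python
BOS, EOS = "<BOS>", "<EOS>"


def greedy_decode(step, state, max_len=10):
    """Emit one token at a time, always taking the best-scored candidate.

    `step(state, prefix)` must return a dict mapping candidate tokens
    (including EOS) to scores. Decoding stops at EOS or after max_len tokens.
    """
    output = [BOS]
    while len(output) <= max_len:
        scores = step(state, output)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        output.append(best)
    return output[1:]  # drop the BOS marker


def toy_step(state, prefix):
    """Scripted stand-in for the decoder network (not a trained model).

    It prefers the next unused subtoken in `state`, then EOS once all
    subtokens have been emitted.
    """
    produced = len(prefix) - 1          # tokens emitted so far, without BOS
    scores = {EOS: 0.5}
    for i, tok in enumerate(state):
        scores[tok] = 1.0 if i == produced else 0.0
    return scores


# In a real system the state would be computed by the encoders from the
# lemma statement, syntax tree, and kernel tree.
subtokens = greedy_decode(toy_step, ["mg", "eq", "nerode"])
name = "_".join(subtokens)  # assumption: subtokens are joined by underscores
print(subtokens, name)
```

Swapping `toy_step` for a function backed by a trained decoder would leave the loop itself unchanged; alternatives such as beam search differ only in how many prefixes are kept at each step.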
