
From Natural Language Specifications to Program Input Parsers



  1. From Natural Language Specifications to Program Input Parsers. Tao Lei, Fan Long, Regina Barzilay, Martin Rinard. CSAIL, MIT

  2. Translating Natural Language to Input Parser
     Input Specification (defines the format of input data):
       - The input starts with a line containing two integers n and r.
       - This is followed by n lines, each containing two integers xi, yi, giving the coordinates of the polygon vertices.
     Input Parser (part of a program that reads and stores data):
         int n, r, x[], y[];
         Scanner scanner = new Scanner(new File("input.txt"));
         n = scanner.nextInt();
         r = scanner.nextInt();
         x = new int[n];
         y = new int[n];
         for (int i = 0; i < n; i++) {
             x[i] = scanner.nextInt();
             y[i] = scanner.nextInt();
         }
     Two Input Examples (lines separated by "/"):
       - 4 10 / -8 2 / 8 14 / 0 14 / 0 6
       - 3 6 / 0 4 / 0 0 / 5 1

  3. Translating Natural Language to Input Parser (same input specification, input parser, and input examples as slide 2). Goal: generate the input parser by reading the natural language specification.
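For comparison, the parser on this slide can be sketched in Python. This is only an illustration of what such an input parser does, not output of the authors' system; the function name `parse_polygon` and the sample text are assumptions made for the example.

```python
def parse_polygon(text):
    """Read n and r, then n vertex lines, mirroring the Java parser on the slide."""
    tokens = iter(text.split())      # whitespace-delimited, like Scanner.nextInt()
    n = int(next(tokens))
    r = int(next(tokens))
    xs, ys = [], []
    for _ in range(n):
        xs.append(int(next(tokens)))
        ys.append(int(next(tokens)))
    return n, r, xs, ys


# Illustrative input in the format the specification describes (assumed sample).
sample = "3 6\n0 4\n0 0\n5 1\n"
n, r, xs, ys = parse_polygon(sample)
```

Here `xs` and `ys` play the role of the `x[]` and `y[]` arrays in the Java version.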

  4. Motivation
     • Reading and processing data is a common task
     • Writing input parsers is mechanical, tedious and time-consuming
     Examples of data formats:
       - MST dependency data format (four lines per sentence):
           John ate an apple
           NN VB DT NN
           SUBJ ROOT MOD OBJ
           2 0 4 2
           The dog barks
           DT NN VB
           MOD SUBJ ROOT
           2 3 0
       - POS tagger data format (one token and tag per line):
           This DT
           is VBZ
           a DT
           short JJ
           sentence NN
           . .
           So RB
           is VBZ
           this DT
           …
       - CoNLL dependency data format (one token per row):
           1 Cathy Cathy N N … 2 su
           2 zag zie V V … 0 ROOT
           3 hen hen Pron Pron … 2 obj1
           4 wild wild Adj Adj … 5 mod
           5 zwaaien zwaai N N … 2 vc
           6 . . Punc Punc … 5 punct

  5. Motivation
     • Reading and processing data is a common task
     • Writing input parsers is mechanical, tedious and time-consuming
     Input Specification: "The input is one integer followed by a list of strings."
     Input Example:
         10
         abc xyz uvw
         efg …
     Input Specification + Input Example -> Parser Generator (our model) -> Input Parser (in C++, Java, …)
     The parser generator allows natural language as the interface to specify input.

  6. Motivation (same bullets and diagram as slide 5). Advantage: reducing programming effort and the chance of making coding mistakes.

  7. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     Input Specification:
       The input consists of multiple sentences.
       • The first line of each sentence is the list of words in the sentence;
       • The second line of each sentence contains the POS tokens;
       • The third line are dependency labels;
       • The last line are integers representing the positions of each word's parent.
     Input Parser:
         sentence = []
         with open("input.txt") as fin:
             line = fin.readline().strip()
             while line:
                 if line != "":
                     word = line.split()
                     pos = fin.readline().split()
                     label = fin.readline().split()
                     parent = fin.readline().split()
                     parent = [int(x) for x in parent]
                     sentence.append((word, pos, label, parent))
                 line = fin.readline().strip()

  8. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     Input Example:
         John ate an apple
         NN VB DT NN
         SUBJ ROOT MOD OBJ
         2 0 4 2
         The dog barks
         DT NN VB
         MOD SUBJ ROOT
         2 3 0
         …

  9. How to Translate NL to Input Parser? (build of slide 8) The whole input example is labeled "Input".

  10. How to Translate NL to Input Parser? (build of slide 8) The four-line blocks are labeled "Sentences".

  11. How to Translate NL to Input Parser? (build of slide 8) The first line of a block is labeled "Words".

  12. How to Translate NL to Input Parser? (build of slide 8) The second line of a block is labeled "POS Tokens".

  13. How to Translate NL to Input Parser? (build of slide 8) The third line of a block is labeled "Labels".

  14. How to Translate NL to Input Parser? (build of slide 8) The fourth line of a block is labeled "Position Integers".

  15. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     • Specification tree of nested input formats
     Specification Tree (for the input example of slide 8):
         Input
           Sentences
             Words
             POS Tokens
             Labels
             Position Integers
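One way to picture the specification tree from this slide is as a small nested data structure. The sketch below is an assumption about a convenient in-memory form, not the authors' representation; the class name `SpecNode` and its `render` method are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecNode:
    """A node of a specification tree: a format name plus nested sub-formats."""
    name: str
    children: List["SpecNode"] = field(default_factory=list)

    def render(self, depth=0):
        """Return the tree as indented lines, one per node."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# The tree shown on the slide: Input contains Sentences, which contain four line types.
tree = SpecNode("Input", [
    SpecNode("Sentences", [
        SpecNode("Words"),
        SpecNode("POS Tokens"),
        SpecNode("Labels"),
        SpecNode("Position Integers"),
    ]),
])
outline = tree.render()
```

Joining `outline` with newlines prints the same nesting as the slide's tree.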

  16. How to Translate NL to Input Parser?
     • Need an abstraction that connects NL and input parser
     • Specification tree of nested input formats
     Input Specification -> Specification Tree -> Input Parser
     The input parser is deterministically generated from the specification tree.

  17. How to Translate NL to Input Parser? (same bullets and pipeline as slide 16). Focus: translating input specifications into specification trees.
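The deterministic step from specification tree to parser can be illustrated with a tiny interpreter that walks a tree and reads the input accordingly. This is a sketch under stated assumptions, not the authors' generator: the tuple encoding of the tree, the name `parse_with_tree`, the per-line converters, and the blank line between sentence blocks are all hypothetical choices made for the example.

```python
def parse_with_tree(tree, text):
    """Read `text` by walking a tiny specification tree.

    tree: ("repeat", [child, ...]) where each child is ("line", name, convert).
    Assumes repeated records are separated by a blank line.
    """
    kind, children = tree
    if kind != "repeat":
        raise ValueError("this sketch only handles a repeated record at the root")
    blocks = [block.splitlines() for block in text.strip().split("\n\n")]
    records = []
    for block in blocks:
        if len(block) != len(children):
            raise ValueError("block does not match the specification tree")
        record = {}
        for (_, name, convert), line in zip(children, block):
            record[name] = [convert(tok) for tok in line.split()]
        records.append(record)
    return records


# Sentences -> Words, POS Tokens, Labels, Position Integers (encoded as assumed above).
spec = ("repeat", [
    ("line", "words", str),
    ("line", "pos", str),
    ("line", "labels", str),
    ("line", "parents", int),
])
text = (
    "John ate an apple\nNN VB DT NN\nSUBJ ROOT MOD OBJ\n2 0 4 2\n"
    "\n"
    "The dog barks\nDT NN VB\nMOD SUBJ ROOT\n2 3 0\n"
)
records = parse_with_tree(spec, text)
```

Because the tree fully determines how each line is read, the same tree always yields the same parser behavior, which is why the talk can focus on predicting the tree.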

  18. How to Translate NL to Specification Tree?
     Input Specification -> Specification Tree
     The specification tree is a dependency tree over noun phrases in the NL specification.
     Input Specification:
       The input consists of multiple sentences.
       • The first line of each parse is the list of words in the sentence;
       • The second line of each parse contains the POS tokens;
       • The third line are dependency labels;
       • The last line are integers representing the positions of each word's parent.
     Specification Tree:
         Input
           Sentences
             Words
             POS Tokens
             Labels
             Position Integers
     Task: translation as an NLP problem

  19. Learning Scenario
     Input:
       • N input specifications, $\mathbf{w} = (w_1, \ldots, w_N)$. Example specification: "The input consists of a single test case. A test case consists of two lines. The first line contains an integer n indicating the number of molecule types. The second line contains n eight-character strings, each describing a single type of molecule, separated by single spaces. Each string consists of four two-character connector labels"
       • Some input examples for each specification. Example:
           3
           A+00A+A+ 00B+D+A- B-C+00C+
       • No human annotation
     Specification trees $\mathbf{t} = (t_1, \ldots, t_N)$, with $\mathbf{t} \sim P(\mathbf{t} \mid \mathbf{w})$, and the corresponding input parsers

  20. Learning Scenario (same setup as slide 19). Idea: learning from feedback, that is, testing the input parser on the input examples.
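The feedback signal mentioned on this slide, testing a parser on the input examples, can be sketched as follows. This shows only the test step, not the learning procedure; the two candidate parsers and the helper `reads_all` are hypothetical, and the example input is the molecule example from the slide.

```python
def reads_all(parser, examples):
    """Return True if `parser` accepts every example without raising."""
    for text in examples:
        try:
            parser(text)
        except (ValueError, IndexError):
            return False
    return True


def parse_count_then_strings(text):
    # Candidate A: an integer n on the first line, then n strings on the second.
    lines = text.strip().splitlines()
    n = int(lines[0])
    items = lines[1].split()
    if len(items) != n:
        raise ValueError("expected n strings on the second line")
    return n, items


def parse_all_integers(text):
    # Candidate B: every token in the input is an integer.
    return [int(tok) for tok in text.split()]


examples = ["3\nA+00A+A+ 00B+D+A- B-C+00C+\n"]
feedback = {
    parser.__name__: reads_all(parser, examples)
    for parser in (parse_count_then_strings, parse_all_integers)
}
```

Candidate B fails on the connector strings, so the examples rule it out without any human annotation.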

  21. Key Intuitions
     • A correct tree should read all input examples successfully
     • Necessary but NOT sufficient condition
     False-positive parsers:
       Input Example: 5 -8 8 0 0 … -8
       Possible Interpretations: a list of integers? a list of integer pairs? a list of strings?
     Many input parsers can read the same input
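The false-positive problem can be made concrete: several different readers accept the same input, so passing the examples does not identify the correct one. The three readers and the sample text below are assumptions made for illustration.

```python
def as_integers(text):
    # Interpretation 1: a flat list of integers.
    return [int(tok) for tok in text.split()]


def as_integer_pairs(text):
    # Interpretation 2: a list of integer pairs.
    nums = [int(tok) for tok in text.split()]
    if len(nums) % 2:
        raise ValueError("odd number of integers")
    return list(zip(nums[0::2], nums[1::2]))


def as_strings(text):
    # Interpretation 3: a list of strings.
    return text.split()


sample = "5 -8\n8 0\n"  # hypothetical input, in the spirit of the slide's example
accepted = []
for reader in (as_integers, as_integer_pairs, as_strings):
    try:
        reader(sample)
        accepted.append(reader.__name__)
    except ValueError:
        pass
```

All three readers succeed on `sample`, which is why reading the examples is necessary but not sufficient for a tree to be correct.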
