Better Glue for Pipelines: CSE504 Project Proposal, Luheng He

  1. Better Glue for Pipelines: CSE504 Project Proposal, Luheng He

  2. Motivation: Pipelined Software for NLP/ML Tasks
     Pipeline stages are a mix of (mostly) task-independent, off-the-shelf tools, task-dependent code, and glue code.
     Typical subtasks for NLP and ML:
       1. Input reader (NLP and ML)
       2. Segmentation/tokenization (NLP); pre-processing/data filtering (ML)
       3. POS tagging/parsing/named-entity recognition
       4. Feature extraction for the target task
       5. Parameter fitting (learning)
       6. Evaluation/cross-validation
       7. Model ensemble
       8. Output/analysis/visualization

  3. Can we automatically generate glue code?
     What's wrong with glue code:
       ● Takes time to write, which slows down research progress
       ● Boring and repetitive
       ● Error-prone
       ● …
     Automatically generating glue code:
       ● Focus on NLP/ML pipelines for now
       ● Focus on the case where the output data of upstream software A must be transformed into the input of downstream task B

  4. Code (data structure, API):
       class ParsedSentence {
         int[] tokenIds;
         int[] depParents;
         // ...
       }
     Sample input/output:
       1  the   2  DT  ...
       2  cat   3  NN  ...
       3  sits  0  VB  ...
     Specification/comments:
       /* output format = word_id \t word \t parent_id \t label */
       /* input format = parent_id,child_id,label_id */
     Formal representation and invariants for the data:
       tokenIds: List[Int], parseTreeArcs: List[(Int, Int)], …
       ∀ t ∈ tokenIds: 0 ≤ t ≤ numWords
       ∀ (x, y) ∈ parseTreeArcs: 0 ≤ x, y ≤ |tokenIds|
       ...
       ● Glue code that transforms output data from software A into the input data of software B
       ● Tests based on the invariants
       ● Specifications that explain the input/output format
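The following is a minimal hand-written sketch of the kind of glue code and invariant test that the proposal aims to generate automatically. It targets the sample formats above and is written in Java to match the slide's data-structure snippet. Several things are assumptions rather than details from the slides:

- The names `GlueSketch`, `convert`, `arcsOf`, and `arcsWithinBounds` are hypothetical.
- Each line from software A has exactly the four tab-separated fields of the stated output format.
- The input is a single sentence.
- Label ids are assigned in order of first appearance. The slides do not say where `label_id` comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical hand-written glue between software A and software B.
 * A writes one token per line: word_id \t word \t parent_id \t label
 * B reads one arc per line:    parent_id,child_id,label_id
 */
public class GlueSketch {

    /**
     * Converts A's lines into B's lines. Assumption: neither format defines label ids,
     * so each new label string gets the next free id (0, 1, 2, ...), recorded in labelIds.
     */
    public static List<String> convert(List<String> aLines, Map<String, Integer> labelIds) {
        List<String> bLines = new ArrayList<>();
        for (String line : aLines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: a single sentence; blank lines are skipped
            }
            String[] f = line.split("\t");
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 tab-separated fields: " + line);
            }
            int childId = Integer.parseInt(f[0].trim());
            int parentId = Integer.parseInt(f[2].trim());
            String label = f[3].trim();
            Integer labelId = labelIds.get(label);
            if (labelId == null) {
                labelId = labelIds.size(); // next unused id
                labelIds.put(label, labelId);
            }
            bLines.add(parentId + "," + childId + "," + labelId);
        }
        return bLines;
    }

    /** Reads B's lines back into (parent, child) arcs so the invariants can be tested. */
    public static List<int[]> arcsOf(List<String> bLines) {
        List<int[]> arcs = new ArrayList<>();
        for (String line : bLines) {
            String[] f = line.split(",");
            arcs.add(new int[] {Integer.parseInt(f[0]), Integer.parseInt(f[1])});
        }
        return arcs;
    }

    /** Invariant from the slide: for every arc (x, y), 0 <= x, y <= |tokenIds|. */
    public static boolean arcsWithinBounds(List<int[]> arcs, int numTokens) {
        for (int[] arc : arcs) {
            for (int v : arc) {
                if (v < 0 || v > numTokens) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample rows adapted from the slide ("1 the 2 DT", ...), tab-separated.
        List<String> aLines = Arrays.asList("1\tthe\t2\tDT", "2\tcat\t3\tNN", "3\tsits\t0\tVB");
        Map<String, Integer> labelIds = new LinkedHashMap<>();
        List<String> bLines = convert(aLines, labelIds);
        bLines.forEach(System.out::println); // prints 2,1,0 then 3,2,1 then 0,3,2
        System.out.println(arcsWithinBounds(arcsOf(bLines), aLines.size())); // prints true
    }
}
```

The caller passes in the label-to-id map so that one mapping can be shared across many sentences. This is a design choice of the sketch, not something the slides specify.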
