SLIDE 1 UnnaturalNets
Work by Joshua Campbell, Eddie Antonio Santos, J. Nelson Amaral, and Abram Hindle
Joshua is on the postdoc market! Sept 2018
SLIDE 2
SLIDE 3 Introduction
- Exploring bimodal program analysis via a naturalness lens
  – Syntax Error Detection
  – Syntax Error Correction
  – Crash Report Clustering
- Information Retrieval + Naturalness
SLIDE 4 Maybe don’t answer that
SLIDE 5 If you can hear a record needle scratch, that’s the language model in your head expressing skepticism around this answer
SLIDE 6 Southwest Airlines? Seems a little weird. I could think of something more appropriate
SLIDE 7 P(teapot) = a little. P(tired) = a little. P(southwest) = a little. Well, that’s perplexing
SLIDE 8
SLIDE 9 probable probable
SLIDE 10 probable probable unknown
SLIDE 11 I know good code pretty well
SLIDE 12 But bad code? I don’t see much of it. In fact, it is surprising
SLIDE 13 N-Gram Language Model on Good Source Code
SURPRISE? Cross-Entropy, Perplexity
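The surprise measures on this slide can be made concrete. Below is a toy sketch of cross-entropy and perplexity under a bigram language model; the corpus, the add-one smoothing, and the token sequences are illustrative inventions, not UnnaturalCode's actual model:

```python
import math
from collections import Counter

def bigram_model(corpus_tokens):
    """Bigram probabilities with add-one smoothing over a token corpus."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams)
    def prob(prev, tok):
        return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab)
    return prob

def cross_entropy(prob, tokens):
    """Average surprise, in bits per token."""
    bits = [-math.log2(prob(p, t)) for p, t in zip(tokens, tokens[1:])]
    return sum(bits) / len(bits)

# "Good source code" corpus (invented for illustration):
corpus = "for ( i = 0 ; i < n ; i ++ )".split() * 50
prob = bigram_model(corpus)

h_nat = cross_entropy(prob, "for ( i = 0 ;".split())
h_weird = cross_entropy(prob, "for ( ; = 0 i".split())
perplexity = 2 ** h_nat
# The scrambled sequence is more surprising: h_weird > h_nat.
```

A sequence the model has seen before gets low cross-entropy; a scrambled one triggers the "needle scratch": high cross-entropy, high perplexity.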
SLIDE 14 Optimistic programs: working because we knew they would. Check by running. Statically Typed World: types known, compile-time checked. Dynamically Typed World
SLIDE 15 Statically Typed World: types known, compile-time checked. Available oracle: “This is a valid program”
SLIDE 16 Java can answer “This is a valid program”. Test: javac X.java
SLIDE 17 Optimistic programs: duck typing, types change at runtime, message passing. Runtime checking. NO ORACLE
SLIDE 18 No oracle. Will run programs with syntax errors. Functions won’t throw exceptions unless run. Resort to testing or running. Misspellings need to be run to be caught
SLIDE 19 Example
for (int i = 0; i < scorers.length; i++) {
  if (scorers[i].nextDoc() == NO_MORE_DOCS)
    // If even one of the sub-scorers does not have
    // scorer should not attempt to do any more work
    lastDoc = NO_MORE_DOCS;
    return;
  }
}

Does it work?
SLIDE 20 Example Output: Java
Check near == NO_MORE_DOCS) lastDoc = NO_MORE_DOCS; return With entropy 4.552985
Check near () == NO_MORE_DOCS ) lastDoc = NO_MORE_DOCS With entropy 4.498802
Check near NO_MORE_DOCS) lastDoc = NO_MORE_DOCS; return; With entropy 4.244520
Check near ) lastDoc = NO_MORE_DOCS; return; } With entropy 4.183379
Check near ) == NO_MORE_DOCS) lastDoc = NO_MORE_DOCS; With entropy 3.858807
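The "Check near ... With entropy ..." lines above come from scoring windows of the token stream and reporting the most surprising ones. A minimal sketch of that windowed ranking follows; the window width, the toy entropy function, and the KNOWN bigram set are placeholders for a real trained n-gram model:

```python
def rank_windows(tokens, entropy_fn, width=5):
    """Score every width-token window; most surprising first."""
    scored = [(entropy_fn(tokens[i:i + width]), i, tokens[i:i + width])
              for i in range(len(tokens) - width + 1)]
    return sorted(scored, reverse=True)

# Toy "model": the set of bigrams seen in good code (a real system
# would use n-gram probabilities learned from a corpus instead).
KNOWN = {("for", "("), ("(", "i"), ("i", "="), ("=", "0"), ("0", ";"),
         (";", "i"), ("i", "<"), ("<", "n"), ("n", ";"),
         ("i", "++"), ("++", ")")}

def entropy_fn(window):
    # Surprise = number of bigrams the model has never seen.
    return sum((a, b) not in KNOWN for a, b in zip(window, window[1:]))

tokens = "for ( i = 0 ; ; i < n ; i ++ )".split()  # note the doubled ";"
top = rank_windows(tokens, entropy_fn)[0]
# The highest-entropy window covers the stray ";".
```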
SLIDE 21 Example Output: Java
- Detects Random Token Replacement
Check near ; StorableField[] fields = d.getFields(fieldName); for With entropy 2.175943
Check near XXXXXXXX== 0) { continue; } float idf = similarity. With entropy 2.119329
...
SLIDE 22 Example Output: Java
- Detects Random Token Insertion
Check near ; StorableField[] fields = d.getFields(fieldName); for With entropy 2.175943
Check near BytesRef XXXXXXXXtext; while((text = termsEnum.next()) With entropy 2.134373
...
SLIDE 23 Example Output: Python
- Let’s get Pythonic!
- Easy to set up / Ready to use
SLIDE 24 Scoring our Performance
Java Performance Example
SLIDE 25 Validation: Java Self
Trained on Lucene 4.0.0
SLIDE 26 Validation: Next
Trained on Lucene 4.0.0
SLIDE 27 Validation: Only New
Trained on Lucene 4.0.0
SLIDE 28 Validation: Only Changed
Trained on Lucene 4.0.0
SLIDE 29 Validation: Other Project
Trained on Lucene 4.0.0
SLIDE 30 Scoring our Performance
- Python only returns at most 1 syntax error!
- For fair comparison, so does UnnaturalCode
Metric: % located
SLIDE 31 Validation: Python Self
          UC Python   UC Java
Delete    .74 Acc     .87 MRR
Insert    .83 Acc     .99 MRR
Replace   .77 Acc     .98 MRR
UC Python within 5 lines: >93%
SLIDE 32 Comparison: Python VS UC
          Python   UC    Python+UC
Delete    64%      65%   79%
Insert    64%      77%   86%
Replace   63%      74%   86%
Similar performance by themselves... but together, 9-23% more win!
SLIDE 33 Syntax Errors
We conclude that when you train a language model of working source code, “syntax errors just aren’t natural.”
SLIDE 34 Syntax and Sensibility
Using LSTMs to detect and fix syntax errors
Eddie Antonio Santos
SLIDE 36
Given
A file with one syntax error
Can we
- Find its exact location
- Suggest the fix
SLIDE 37 What is one syntax error?
Edit distance = 1 from a syntactically-valid file
- Addition
- Deletion
- Substitution
- Transposition?
SLIDE 38
Can we…?
- Suggest a fix
- Without handwritten rules?
SLIDE 40 Overview
10 panic 20 goto 10
Collect Train Predict Input Suggest
SLIDE 43 How?
next(for(i = 0; i < length;) = ???
SLIDE 45 What is an LSTM?
“It’s a recurrent neural network with a unit that remembers a value for an indefinite amount of time”
SLIDE 46 What the heck is an LSTM?
Neural Network
SLIDE 47 What the heck is an LSTM?
Neural Network
Neural Network x y
SLIDE 48 What the heck is an LSTM?
Recurrent Neural Network
Recurrent Neural Network x y
SLIDE 49 What the heck is an LSTM?
Long Short-term Memory
Recurrent Neural Network x y
Memory
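The boxes on slides 46-49 can be written out as one step of a single-unit LSTM. This is a from-scratch toy (the weights below are hand-picked, not trained) whose only purpose is to show the gated memory cell the slide calls "Memory":

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM.
    W maps gate name -> (input weight, recurrent weight, bias)."""
    def gate(name, squash):
        wx, wh, b = W[name]
        return squash(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)   # how much old memory to keep
    i = gate("input", sigmoid)    # how much new candidate to write
    g = gate("cand", math.tanh)   # candidate value to write
    o = gate("output", sigmoid)   # how much memory to reveal
    c = f * c_prev + i * g        # the "Memory" box on the slide
    h = o * math.tanh(c)          # output, fed back in (the recurrence)
    return h, c

# Hand-picked toy weights: forget gate ~1 and input gate ~0, so the cell
# "remembers a value for an indefinite amount of time", as the quote says.
W = {"forget": (0.0, 0.0, 10.0),
     "input": (0.0, 0.0, -10.0),
     "cand": (1.0, 0.0, 0.0),
     "output": (0.0, 0.0, 10.0)}

h, c = 0.0, 1.0
for x in [0.3, -0.8, 0.5]:
    h, c = lstm_step(x, h, c, W)
# c stays close to 1.0 regardless of the inputs.
```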
SLIDE 50 Approach
Collect: ~10K top repos
Parse: ~500K parsed sources, 1.5B tokens
Train: used 2%
SLIDE 51 Javascript Open/Closed Tokens
SLIDE 52 Java Open/Closed Tokens
SLIDE 53 Preparing Inputs for LSTM
SLIDE 54 LSTM Configuration
SLIDE 55 the algorithm
SLIDE 56 LSTM: Backwards and Forwards
SLIDE 57 Suggestion
A tale of two models
forward backward
SLIDE 58 Suggestion
For each token in the file:
- Measure the disagreement between the two models
- Collect the top-k examples of highest disagreement
SLIDE 59 Suggestion
if (name)) { this.addClass(‘highlight’); return; }
Disagreement:
“I think it should be a {!”
“You fool! Can’t you see that it should be a )?”
SLIDE 60 Suggestion
For the top-k disagreements:
- Create a series of fixes
- Print the fix if it makes the file syntactically valid
- Assume a token addition: delete the token at the point of disagreement
- Assume a token deletion: add a token as suggested by each model
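The procedure on slides 57-60 can be sketched end to end. Everything below is a toy stand-in: constant predictions replace the two trained LSTMs, "disagreement" is a boolean rather than a comparison of probability distributions, and a bracket-balance check replaces the real parser:

```python
def is_valid(tokens):
    """Toy validity oracle: brackets must balance (stand-in for a parser)."""
    stack, match = [], {")": "(", "}": "{"}
    for t in tokens:
        if t in ("(", "{"):
            stack.append(t)
        elif t in (")", "}"):
            if not stack or stack.pop() != match[t]:
                return False
    return not stack

def suggest_fixes(tokens, forward, backward, valid, k=3):
    """Rank positions by model disagreement, then try one-token edits there."""
    scores = []
    for i in range(len(tokens)):
        f, b = forward(tokens[:i]), backward(tokens[i + 1:])
        # Toy disagreement: the models predict different tokens and neither
        # matches what is actually there.
        scores.append((f != b and tokens[i] not in (f, b), i, f, b))
    fixes = []
    for disagree, i, f, b in sorted(scores, reverse=True)[:k]:
        if not disagree:
            continue
        # Assume an extra token was added: try deleting it.
        candidates = [tokens[:i] + tokens[i + 1:]]
        # Assume a token was deleted: try inserting each model's suggestion.
        candidates += [tokens[:i] + [t] + tokens[i:] for t in (f, b)]
        fixes += [c for c in candidates if valid(c)]
    return fixes

broken = "if ( name ) ) { return ; }".split()   # the slide's extra ")"
# Constant toy predictions stand in for the forward/backward LSTMs:
fixes = suggest_fixes(broken, lambda pre: "{", lambda suf: "}", is_valid)
```

Only the deletion of the stray ")" survives the validity check, which is the shape of the real pipeline: disagreement proposes, the parser disposes.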
SLIDE 61 Train Validate
10-fold cross-validation
SLIDE 62 Evaluation
Mutation testing
- 1. Take valid token stream from validation set
- 2. Apply one random edit operation
- 3. Ensure result is syntactically incorrect
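The three steps above can be sketched as follows; the bracket-balance check is my stand-in for a real parser, and the vocabulary is just the tokens of the seed file:

```python
import random

def syntactically_valid(tokens):
    """Toy stand-in for a real parser: do brackets balance?"""
    stack, match = [], {")": "(", "}": "{"}
    for t in tokens:
        if t in ("(", "{"):
            stack.append(t)
        elif t in (")", "}"):
            if not stack or stack.pop() != match[t]:
                return False
    return not stack

def mutate(tokens, vocab, rng):
    """Steps 2-3: apply one random edit, retrying until the result is invalid."""
    while True:
        out, i = list(tokens), rng.randrange(len(tokens))
        op = rng.choice(["addition", "deletion", "substitution"])
        if op == "addition":
            out.insert(i, rng.choice(vocab))
        elif op == "deletion":
            del out[i]
        else:
            out[i] = rng.choice(vocab)
        if not syntactically_valid(out):  # step 3: keep only broken results
            return out, op

valid = "if ( id ) { return ; }".split()  # step 1: a valid token stream
broken, op = mutate(valid, sorted(set(valid)), random.Random(0))
```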
SLIDE 63 Evaluation
Mutation testing example: if ( id ) return ;
SLIDE 64 Evaluation
Addition: )
SLIDE 65 Evaluation
Deletion
SLIDE 66 Evaluation
Substitution: id ++ (shown: if ( ) return ;)
SLIDE 68 LSTM Syntax Error Location MRR
SLIDE 69 Javascript Fixes
SLIDE 70 MRRs of N-Gram & LSTM on Java
SLIDE 71 Sensibility Conclusions
- LSTMs do work for modelling good code and finding syntax errors
- Not being able to represent the range of identifiers is limiting.
  – Large-vocabulary networks are infeasible in real development scenarios.
- LSTMs are not as flexible as NL models such as n-gram models
- Can use the same models for location as you can for fixes.
SLIDE 72 The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication
Joshua Charles Campbell Eddie Antonio Santos Abram “Dragon Man” Hindle
Department of Computing Science University of Alberta
March 16, 2016
SLIDE 73 The Story of Ada
Popular Software Product
SLIDE 74 The Story of Ada
Automatically Collected Crash Reports
SLIDE 75 The Story of Ada
PartyCrasher
Crash database: 100,000s of crash reports per day
SLIDE 76 Example: Mozilla
- more than 2 million crash reports per week!
- Manual bucketing @ 1 crash/minute:
- 913 Full-time employees!
SLIDE 77 What We Want
[Figure: crashes labeled whoops, bug, annoyance, random crash]
Goal: Group the crashes together in buckets by what caused them
SLIDE 78 Realism!
[Figure: crash timeline: whoops, bug, annoyance, random crash, whoops, regression]
SLIDE 79 How good is a solution?
- How do we measure correctness?
- BCubed precision and recall!
- Why not just normal precision and recall?
  – The solutions just put crashes together in buckets
  – They don’t say what bugs exist (or even how many bugs exist)
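BCubed scores each crash report by comparing its bucket (the proposed cluster) against its true bug, then averages over reports. A minimal sketch with invented toy data:

```python
def bcubed(buckets, truth):
    """buckets/truth: dicts mapping crash id -> bucket label / true bug id.
    Returns (BCubed precision, BCubed recall), averaged over crashes."""
    precision, recall = [], []
    for c in buckets:
        same_bucket = [d for d in buckets if buckets[d] == buckets[c]]
        same_bug = [d for d in buckets if truth[d] == truth[c]]
        correct = [d for d in same_bucket if truth[d] == truth[c]]
        precision.append(len(correct) / len(same_bucket))
        recall.append(len(correct) / len(same_bug))
    return sum(precision) / len(precision), sum(recall) / len(recall)

truth = {"c1": "bugA", "c2": "bugA", "c3": "bugB", "c4": "bugB"}
one_big_bucket = {c: "b1" for c in truth}  # everything together: recall 1.0
singletons = {c: c for c in truth}         # everything apart: precision 1.0
```

Unlike plain precision/recall over class labels, this needs no mapping from buckets to bug names, which is exactly why it fits the "we only have buckets" setting above.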
SLIDE 80 High BCubed Precision
[Figure: the crash timeline bucketed with high BCubed precision]
SLIDE 81 High BCubed Recall
[Figure: the crash timeline bucketed with high BCubed recall]
SLIDE 82 Balanced BCubed P/R
[Figure: the crash timeline bucketed with balanced BCubed precision/recall]
SLIDE 83 But does it scale?
- We want it now! (n log n total time, or log n time per crash)
- Classical clustering algorithms are O(n²) total time
- 2 million/week
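The shape of an online bucketer is a single pass that compares each new crash only against the past. The sketch below uses plain token-set overlap (Jaccard) as a stand-in similarity and a linear scan where a real system would query an inverted index to hit the time bounds above; the threshold is arbitrary:

```python
def jaccard(a, b):
    """Stand-in similarity: token-set overlap (the real systems use TF-IDF)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def bucket_online(crashes, similarity, threshold=0.3):
    """One pass: each crash joins the bucket of its most similar past crash,
    or opens a new bucket. Online: never revisits earlier decisions."""
    assignment, seen = {}, []
    for cid, text in crashes:
        best, best_sim = None, threshold
        for prev_id, prev_text in seen:  # a real system queries an index here
            sim = similarity(text, prev_text)
            if sim > best_sim:
                best, best_sim = prev_id, sim
        assignment[cid] = assignment[best] if best else cid
        seen.append((cid, text))
    return assignment

crashes = [
    ("c1", "SIGSEGV in cairo_transform from libcairo"),
    ("c2", "SIGSEGV in cairo_transform from libcairo evince"),
    ("c3", "assertion failure in gtk_widget_show"),
]
buckets = bucket_online(crashes, jaccard)
# c2 lands in c1's bucket; c3 opens a new one.
```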
SLIDE 84 Online
[Figure: the crash timeline split into Past and Future]
SLIDE 85 Don’t want to hire devs
- Doesn’t require developers to categorize crashes
SLIDE 86 Non-stationary
[Figure: a new crash arrives in the future: new bucket? increased crash rate?]
SLIDE 87 In Practice: Mozilla
- “Signature Generation”
- Fast!
- Accurate?
SLIDE 88 In Practice: Others
- Mozilla, Microsoft (WER), Apple, Google...
- Typically involve LOTS of hand-written rules
SLIDE 89 In Literature
- A bunch of methods that are O(n²) time complexity (or worse)
  – take at least time proportional to n to sort one crash
SLIDE 90 In Literature
- Lerch, et al.
- Not designed for crash report deduplication!
- Uses the Lucene search engine to find similar documents (bugs)
SLIDE 91 Lucene search
Based on a standard textbook IR technique called TF-IDF, plus some adjustments:
↑↑↑ words in this document (crash) ↑↑↑
↓↓↓ words in every document (crash) ↓↓↓
- the, be, to, of, and, a, in ...
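One textbook TF-IDF variant, matching the arrows above: term frequency pushes a word's score up, document frequency pushes it down, so stop words like "in" score zero. The crash corpus here is invented for illustration:

```python
import math
from collections import Counter

def tfidf(term, doc, corpus):
    """Up-weight terms frequent in this document,
    down-weight terms that appear in every document."""
    tf = Counter(doc)[term]
    df = sum(term in d for d in corpus)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    "sigsegv in cairo_transform in libcairo".split(),
    "sigsegv in gtk_widget_show in libgtk".split(),
    "abort in malloc in libc".split(),
]
doc = corpus[0]
# "in" occurs in every crash, so idf = log(1) = 0 and it scores nothing;
# "cairo_transform" is specific to this crash and scores positive.
```

Lucene's actual scoring adds length normalization and other adjustments on top of this basic idea.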
SLIDE 92 In Literature
- Lerch, et al.
- Let’s try that, but instead of trying to group bugs together, let’s group crashes!
SLIDE 93 Let’s Add Context
evince crashed with SIGSEGV in cairo_transform()
This happens immediately when trying to mark text with the mouse.
ProblemType: Crash
Architecture: amd64
DistroRelease: Ubuntu 7.10
ExecutablePath: /usr/bin/evince
Package: evince 0.9.0-1ubuntu4
PackageArchitecture: amd64
ProcCmdline: evince ./expenses-uds-sevilla.pdf
Signal: 11
SourcePackage: evince
Uname: Linux donald 2.6.20-15-generic #2 SMP
SLIDE 94 In Literature
- Lerch, et al.
- Requires breaking up things (bugs, crashes) into “words”
SLIDE 95 Tokenization: Lerch
evince crashed with SIGSEGV in cairo_transform()
#0 0x00002b34461e4dd1 in cairo_transform () from /usr/lib/libcairo.so.2
#1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1
#2 0x00002b344ae2cefc in TextSelectionPainter::TextSelectionPainter () from /usr/lib/libpoppler.so.1
SLIDE 96 Tokenization: Space
[Same backtrace as above, with whitespace-separated tokens highlighted]
SLIDE 97 Tokenization: CamelCase
[Same backtrace as above, with camelCase-split tokens highlighted]
SLIDE 98 Tokenization
Lerch: 0x00002b344498a150 cairooutputdev setdefaultctm from libpoppler glib
Space: #1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1
Camel: 1 x 00002 b 344498 a 150 in Cairo Output Dev set Default CTM from usr lib libpoppler so 1 glib
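A sketch of the camelCase tokenization compared above; the paper's exact splitting rules may differ (this one splits on non-alphanumerics, then on case humps and letter/digit boundaries):

```python
import re

def camel_tokenize(text):
    """Split on non-alphanumerics, then split each word on camelCase
    humps and letter/digit boundaries, as in the Camel row."""
    out = []
    for word in re.split(r"[^A-Za-z0-9]+", text):
        out += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|[0-9]+", word)
    return out

parts = camel_tokenize("CairoOutputDev::setDefaultCTM")
```

The first alternative keeps acronym runs like CTM whole; the second peels off Camel humps; the last two catch lowercase runs and digit runs such as the pieces of a hex address.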
SLIDE 99 Results
SLIDE 100 1Frame 2Frame 3Frame Best Recall SpaceC Lerch Best Precision 1Addr 1File 1Mod Best F1 CamelC
37 of 44
SLIDE 101 C am e l C C a m e l C C a m e l C C a m e l C
1Frame 2Frame 3Frame SpaceC Lerch 1Addr 1File 1Mod CamelC
P R
39 of 44
SLIDE 102 What’s the big deal?
- Okay, so our IR, tf-idf-based technique did the best. What’s the big deal?
SLIDE 103 What’s the big deal?
- It totally disregards what’s on the top of the stack and what’s on the bottom of the stack!
SLIDE 104 What’s the big deal?
- Including contextual information (OS, CPU, version, etc.) can improve precision and recall.
- There is an easily adjustable tradeoff between precision and recall.
SLIDE 105 What’s the big deal?
- Most other papers don’t try tf-idf/IR-based techniques, but they turned out to be the best in this paper.
- tf-idf/IR-based techniques meet Ada’s requirements!
  – Accurate, fast, online, unsupervised, & non-stationary
SLIDE 106 What’s the big deal?
- Information-Retrieval-based technique
- Disregards stack order
- Context matters
- Accurate, fast, online, unsupervised, & non-stationary
- First paper using the Ubuntu dataset: https://archive.org/details/bugkets-2016-01-30
SLIDE 108 Conclusions
- Language Models to Find Errors
- Language Models to Fix Errors
- Posing Source Code Artifacts as NL to Cluster Errors for IR