Tracking the Flow of Ideas through the Programming Languages Literature



slide-1
SLIDE 1

Tracking the Flow of Ideas through the Programming Languages Literature

Michael Greenberg, Kathleen Fisher, and David Walker

slide-2
SLIDE 2

2

How can we understand the PL literature?

Alexandre Duret-Lutz

slide-3
SLIDE 3

Was this a typical year for ICFP?

3

Is there more related work I should cite?
Is my work a better fit for PLDI or POPL?
How has OOPSLA changed over the years?
Who should review this paper?
Who should I invite to this PC?

slide-4
SLIDE 4

4

Types
Optimization
Verification
Synthesis
Abstract Interpretation

slide-5
SLIDE 5

What is a ‘topic’ in a document?

5

Word     Count
type     120
system   83
check    34
static   21
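
A word-count table like the one above can be produced by treating a document as a bag of words. The sketch below is a minimal illustration, assuming a simple lowercase alphabetic tokenizer; the function name `bag_of_words` and the sample sentence are hypothetical and not taken from the talk.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep runs of letters as tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, for illustration only.
abstract = "A static type system: we type check each type statically."
counts = bag_of_words(abstract)

# Print the most frequent words first, as in the slide's table.
for word, n in counts.most_common(3):
    print(word, n)
```

Word order is discarded on purpose: only the counts are kept.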

slide-6
SLIDE 6

Topics are distributions of words

6

“Parsing” topic

Word        Log likelihood
grammar     -3.905040
language    -4.206531
structure   -4.308618
parser      -4.513348
…           …
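
One way to read such a table: each topic assigns every word a log probability, and the top words are the ones with the highest values. The sketch below assumes the slide's numbers are natural-log probabilities (hence negative); the helper names `top_words` and `probability` are hypothetical.

```python
import math

# "Parsing" topic from the slide; values are read here as natural-log
# probabilities, which is an assumption.
parsing_topic = {
    "grammar": -3.905040,
    "language": -4.206531,
    "structure": -4.308618,
    "parser": -4.513348,
}


def top_words(topic, n):
    """Return the n words with the highest log probability."""
    return sorted(topic, key=topic.get, reverse=True)[:n]


def probability(topic, word):
    """Convert a word's log probability back into a probability."""
    return math.exp(topic[word])


print(top_words(parsing_topic, 2))
print(round(probability(parsing_topic, "grammar"), 4))
```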

slide-7
SLIDE 7

Documents are a mix of topics

7

type systems

Word     Count
type     120
system   83
check    34
static   21

object-orientation

Word       Count
object     88
class      13
instance   12
method     7

operational semantics

Word        Count
semantics   90
step        45
reduce      38
evaluate    19

.6 .28 .22

slide-8
SLIDE 8

Documents are a mix of topics

8

<.6, .28, .22>

object-orientation
type systems
operational semantics
slide-9
SLIDE 9

9

Takikawa, Strickland, Dimoulas, Tobin-Hochstadt, and Felleisen. Gradual typing for first-class classes. OOPSLA 2012.

Generative LDA topic model
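
The generative story behind LDA can be sketched in a few lines: draw a topic mixture for the document from a Dirichlet prior, then for each word position draw a topic from that mixture and a word from that topic. This is a toy illustration under stated assumptions, not the model used in the talk; the two topics, their word probabilities, and the prior `alpha` are all made up.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document: a topic mixture and a list of words."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[z])
        weights = [topics[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


# Hypothetical topics (word -> probability), for illustration only.
topics = [
    {"type": 0.5, "system": 0.3, "check": 0.2},
    {"object": 0.6, "class": 0.25, "method": 0.15},
]
rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=20, rng=rng)
print(theta)
print(words)
```

Inference runs this story backwards: given only the words, it estimates the topics and each document's mixture.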

slide-10
SLIDE 10

Inference with LDA

10

N bags of words, k ➞ LDA-C* ➞ k topics, N vectors (v1 … vN) in k-dimensional space

*http://www.cs.princeton.edu/~blei/lda-c/
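
To feed bags of words to LDA-C, each document is written as one line of sparse counts. The sketch below assumes the input format described in the LDA-C README, `[M] [term_1]:[count] ...`, where M is the number of unique terms and terms are integer indices. Assigning indices from 0 in sorted order is an assumption, and the helper names are hypothetical.

```python
from collections import Counter


def build_vocabulary(bags):
    """Map every word seen in any document to an integer index."""
    words = sorted(set().union(*bags))
    return {word: index for index, word in enumerate(words)}


def to_ldac_line(bag, vocab):
    """Serialize one bag of words as '[M] idx:count idx:count ...'."""
    items = sorted((vocab[word], count) for word, count in bag.items())
    fields = [str(len(items))] + [f"{index}:{count}" for index, count in items]
    return " ".join(fields)


# Hypothetical two-document corpus, for illustration only.
bags = [Counter({"type": 3, "check": 1}), Counter({"heap": 2, "type": 1})]
vocab = build_vocabulary(bags)
lines = [to_ldac_line(bag, vocab) for bag in bags]
print("\n".join(lines))
```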

slide-11
SLIDE 11

11

corpus (N docs) ➞ parse ➞ N bags of words, combined vocabulary
N bags of words, k ➞ LDA-C ➞ k topics, N vectors (v1 … vN)
k topics, N vectors ➞ post ➞ k top words, k top papers ➞ k topic names (by hand)
N vectors ➞ aggregate vectors by year, by conference
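
The aggregation step at the end of the pipeline can be sketched as grouping the per-paper topic vectors by conference and year. Summing the vectors within each group is an assumption here (averaging would be another reasonable choice); the function name `aggregate` and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate(papers, k):
    """Sum k-dimensional topic vectors per (conference, year) group."""
    totals = defaultdict(lambda: [0.0] * k)
    for conference, year, vector in papers:
        group = totals[(conference, year)]
        for i, weight in enumerate(vector):
            group[i] += weight
    return dict(totals)


# Hypothetical papers as (conference, year, topic vector), k = 2.
papers = [
    ("POPL", 2010, [0.5, 0.5]),
    ("POPL", 2010, [0.25, 0.75]),
    ("PLDI", 2010, [1.0, 0.0]),
]
by_group = aggregate(papers, k=2)
print(by_group)
```

Each group's vector gives one point per topic on a weight-over-time plot.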

slide-12
SLIDE 12

Parsing

  • Parsing drops standard stopwords
  • Added some extra ones with TF-IDF

  • Stemmed words using nltk*
  • Removes plurals, etc.

12

Stopwords: a, about, above, after, again, against, …
Stemming: calculi ➞ calculus, goes ➞ go

*http://www.nltk.org/
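
The slide does not say how TF-IDF was used to pick the extra stopwords. One plausible reading, sketched below, uses only the inverse document frequency (the IDF half of TF-IDF): words that appear in nearly every document have an IDF near zero and carry little topical signal. The threshold, the helper names, and the sample documents are assumptions; stemming is left out to keep the sketch free of external dependencies.

```python
import math

# Standard stopwords as listed on the slide (the slide's list is truncated).
STOPWORDS = {"a", "about", "above", "after", "again", "against"}


def idf(word, docs):
    """Inverse document frequency: log(N / number of docs containing word)."""
    containing = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / containing)


def extra_stopwords(docs, threshold):
    """Words whose IDF is at or below the threshold, i.e. near-ubiquitous."""
    vocab = set().union(*docs)
    return {word for word in vocab if idf(word, docs) <= threshold}


def clean(tokens, stopwords):
    """Drop stopwords, keeping the remaining tokens in order."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical tokenized abstracts, for illustration only.
docs = [
    ["we", "present", "a", "type", "system"],
    ["we", "present", "a", "garbage", "collector"],
    ["we", "present", "a", "parser"],
]
extra = extra_stopwords(docs, threshold=0.0)
print(clean(docs[0], STOPWORDS | extra))
```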

slide-13
SLIDE 13

Our corpora

  • Abstracts: ICFP, OOPSLA, PLDI, POPL
      • 4,355 documents
      • Imperfect data in the ACM Digital Library
  • Fulltext: PLDI, POPL
      • 2,257 documents
      • Imperfect PDF-to-text conversion

13

slide-14
SLIDE 14

Let’s name a topic!

Top words: object, heap, region, memory, pointer, collector, garbage, collection, allocation, reference

14

Top papers:

  • Space overhead bounds for dynamic memory management with partial compaction
  • Schism: fragmentation-tolerant real-time garbage collection
  • Portable, unobtrusive garbage collection for multiprocessor systems
  • Limitations of partial compaction: towards practical bounds
  • Correctness-preserving derivation of concurrent garbage collection algorithms
  • The ramifications of sharing in data structures
  • A general framework for certifying garbage collectors and their mutators
  • Beltway: getting around garbage collection gridlock
  • On bounding time and space for multiprocessor garbage collection
  • Garbage collection without paging

Garbage collection!
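
The list of top papers for a topic can be obtained by ranking documents by their weight in that topic's dimension. The sketch below is a minimal illustration; the function name `top_papers`, the placeholder titles, and the weights are hypothetical.

```python
def top_papers(vectors, topic, n):
    """Return the n titles whose topic vectors weigh the given topic most."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: item[1][topic],
        reverse=True,
    )
    return [title for title, _ in ranked[:n]]


# Hypothetical titles and 2-dimensional topic vectors, for illustration only.
vectors = {
    "paper A": [0.1, 0.9],
    "paper B": [0.7, 0.3],
    "paper C": [0.4, 0.6],
}
print(top_papers(vectors, topic=1, n=2))
```

A person then reads the top words and top titles together and picks a name for the topic by hand.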

slide-15
SLIDE 15

15

Topic names for k=20, abstracts

  • Compiler optimization
  • Array Processing
  • Verification
  • Program Logics
  • Resource management
  • Garbage Collection
  • Test generation
  • Parallelism
  • Parsing
  • Components and APIs
  • Object-Oriented Programming
  • Language Design
  • Low-level compiler optimizations
  • Program Analysis
  • Analysis of Concurrent Programs
  • Models and Modeling
  • Semantics of concurrent programs
  • Type Systems
  • Applications
  • Object-oriented software development

slide-16
SLIDE 16

16

slide-17
SLIDE 17

17

[Figure: weight of each of the 20 abstract topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: ICFP, OOPSLA, PLDI, POPL.]

How has OOPSLA changed over the years? Did changing the CfP change things? What about becoming part of SPLASH!?

slide-18
SLIDE 18

OOPSLA Call for Papers

18

2006, 2007, 2010

  • foundations of object and related technologies
  • paradigms beyond the traditional concept of object-oriented programming
  • all aspects of programming languages and software engineering, broadly construed

slide-19
SLIDE 19

19

[Figure annotations: CfP, SPLASH!]

slide-20
SLIDE 20

20

[Figure: weight of each of the 20 abstract topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: ICFP, OOPSLA, PLDI, POPL.]

What trends are visible in program verification across the decades?

slide-21
SLIDE 21

21

Program Logics

[Figure: topic weight by year, 1980 to 2010; one line per conference: ICFP, OOPSLA, PLDI, POPL.]

slide-22
SLIDE 22

22

[Figure: weight of each of the 20 abstract topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: ICFP, OOPSLA, PLDI, POPL.]

How has PLDI changed over time? Per the “Future of PLDI” session in Edinburgh, what is the state of the community?

slide-23
SLIDE 23

23

Low-level compiler optimizations

[Figure: topic weight by year, 1980 to 2010; one line per conference: ICFP, OOPSLA, PLDI, POPL.]

slide-24
SLIDE 24

24

Topic names for k=20, full text

  • Data-driven optimization
  • Abstract interpretation
  • Object-orientation
  • Code generation
  • Data-structure correctness
  • Languages and control
  • Security and bugfinding
  • Processes and message passing
  • Garbage collection
  • Parallelization
  • Program transformation
  • Dynamic analysis
  • Low-level systems
  • Design
  • Program analysis
  • Proofs and models
  • Register allocation
  • Types
  • Concurrency
  • Parsing

slide-25
SLIDE 25

25

[Figure: weight of each of the 20 fulltext topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: PLDI, POPL.]

How has PLDI changed over time? Let’s compare PLDI and POPL, using our fulltext corpus.

slide-26
SLIDE 26

26

slide-27
SLIDE 27

27

[Figure: weight of each of the 20 fulltext topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: PLDI, POPL.]

Are there topics that used to be well represented in POPL?

slide-28
SLIDE 28

28

slide-29
SLIDE 29

29

[Figure: weight of each of the 20 fulltext topics by year, 1980 to 2010, one panel per topic. x-axis: Year; y-axis: Weight; one line per conference: PLDI, POPL.]

What topics are in POPL but not really in PLDI?

slide-30
SLIDE 30

30

slide-31
SLIDE 31

Comparing documents

Are papers with close topic vectors related?
Measure distance using symmetrized KL divergence, which gives less weight to dimensions with small magnitude.
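
A minimal sketch of the distance follows, assuming the symmetrized form is the sum KL(p‖q) + KL(q‖p) (some definitions halve this sum) and that topic vectors are normalized to probability distributions first. The small smoothing constant `eps`, which avoids division by zero, and the function names are assumptions.

```python
import math


def smooth(v, eps=1e-12):
    """Add a tiny constant to every entry and renormalize to sum to 1."""
    total = sum(v) + eps * len(v)
    return [(x + eps) / total for x in v]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetrized_kl(p, q):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    p, q = smooth(p), smooth(q)
    return kl(p, q) + kl(q, p)


# Hypothetical topic vectors, for illustration only.
v1 = [0.5, 0.5]
v2 = [0.9, 0.1]
print(symmetrized_kl(v1, v2))
```

Each dimension contributes (p_i - q_i) * log(p_i / q_i), so dimensions where both vectors are close to zero add little to the total.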

31

[Diagram: vectors v1 and v2 separated by distance d.]

slide-32
SLIDE 32

32

[Figure: distance for each paper (CDRS, PCC, SEMC, TAL). x-axis: Paper; y-axis: Distance; one series per paper set: Citations, Random 1, Random 2, Random 3, Random 4, Random 5.]

slide-33
SLIDE 33

http://tmpl.weaselhat.com

33

slide-34
SLIDE 34

Ideas and plans

Beginning of a new project
What do you think we should do?
Models for researchers

34

v1 … vN

slide-35
SLIDE 35

Limitations/problems

  • ACM DL is missing data
  • No programmatic access
  • Unclear choices about models
      • Abstracts or fulltext? k=20? k=30? k=200?
      • Which documents should ‘seed’ LDA?

35

slide-36
SLIDE 36

36

[Figure: one panel per fulltext topic, comparing Fulltext paper rank (out of top 50) with Abstract paper rank (out of top 50).]

slide-37
SLIDE 37

(More) Questions?

37