Emergent, Crowd-scale Programming Practice in the IDE



slide-1
SLIDE 1

Emergent, Crowd-scale Programming Practice in the IDE

Ethan Fast, Daniel Steffee, Lucy Wang, Michael Bernstein, Joel Brandt Stanford HCI, Adobe Research

slide-2
SLIDE 2

Emergent behaviors, or the ways people adapt to a system, can be just as informative as a system’s design.

slide-3
SLIDE 3

Many norms for programming systems aren’t codified in documentation or on the web.

slide-4
SLIDE 4

Developers can have unanswered questions

What is the best idiom or library to use for a certain kind of task? Does my code follow common practice? How is a language being used today?

slide-5
SLIDE 5

A Ruby Idiom

How does this code work? What is the block doing?

slide-6
SLIDE 6

A Ruby Idiom

How does this code work? What is the block doing?

Extracting an options hash from a function that takes any number of arguments
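The slide's own code did not survive in this transcript, so the following is only a minimal sketch of one common way to write this idiom. The method name split_options and the sample arguments are hypothetical.

```ruby
# Hypothetical helper: separates a trailing options hash from a
# variable-length argument list.
def split_options(*args)
  # If the last argument is a Hash, treat it as the options and remove it
  # from the positional arguments; otherwise fall back to empty options.
  options = args.last.is_a?(Hash) ? args.pop : {}
  [args, options]
end

positional, options = split_options(:name, :email, { required: true })
puts positional.inspect
puts options.inspect
```

The check on args.last is what lets callers omit the options entirely.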

slide-7
SLIDE 7

Codex is a knowledge base that records emergent practice for the Ruby programming language.

slide-8
SLIDE 8

Codex normalizes code structure to identify common functions, blocks, and syntactic patterns.

slide-9
SLIDE 9

Codex enables new data-driven interfaces for programming

  • Detect unlikely code
  • Create a living library
  • Annotate common idioms

slide-10
SLIDE 10

Building the Codex Knowledge Base

Part 1: Building the Knowledge Base

slide-11
SLIDE 11

The goal: identify emergent patterns that good programmers would use

Part 1: Building the Knowledge Base

slide-12
SLIDE 12

Part 1: Building the Knowledge Base

Each record in the Codex knowledge base is an AST node

slide-13
SLIDE 13

Each record in the Codex knowledge base is an AST node

Part 1: Building the Knowledge Base

novels.map { |title| title.downcase + "!" }
movies.map { |name| name.downcase + "?" }

Are these snippets equivalent?

slide-14
SLIDE 14

# Snippet 1
uist_hash = Hash.new do |hash, key|
  hash[key] = {}
end
uist_hash[:UIST]["2014"] = "Hawaii"

# Snippet 2
chi_hash = Hash.new do |h, k|
  h[k] = {}
end
chi_hash[:CHI]["2014"] = "Toronto"

Part 1: Building the Knowledge Base

slide-16
SLIDE 16

# Snippet 1
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:UIST]["2014"] = "Hawaii"

# Snippet 2
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:CHI]["2014"] = "Toronto"

Part 1: Building the Knowledge Base

slide-18
SLIDE 18

# Snippet 1
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:SYM0]["2014"] = "Hawaii"

# Snippet 2
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:SYM0]["2014"] = "Toronto"

Part 1: Building the Knowledge Base

slide-20
SLIDE 20

# Snippet 1
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:SYM0]["STR0"] = "STR1"

# Snippet 2
var0 = Hash.new do |var1, var2|
  var1[var2] = {}
end
var0[:SYM0]["STR0"] = "STR1"

Part 1: Building the Knowledge Base
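The renaming steps on the preceding slides can be approximated in a few lines of Ruby. This is a rough sketch, not Codex's implementation: it works on the token stream from the standard-library Ripper lexer rather than on AST nodes, the NUM prefix is an assumed name, and it treats any bare identifier as a variable (so receiverless method calls such as puts would be renamed too).

```ruby
require "ripper"

# Token-level approximation of the normalization shown on the slides.
def normalize(source)
  names = {}               # [prefix, original token] -> placeholder
  counters = Hash.new(0)   # next free index for each placeholder prefix

  placeholder = lambda do |prefix, token|
    names[[prefix, token]] ||= begin
      label = "#{prefix}#{counters[prefix]}"
      counters[prefix] += 1
      label
    end
  end

  previous_type = nil
  Ripper.lex(source).map do |_position, type, token, _state|
    renamed =
      if previous_type == :on_symbeg
        placeholder.call("SYM", token)   # symbol name following the ":"
      elsif type == :on_ident && previous_type != :on_period
        placeholder.call("var", token)   # identifier not called on a receiver
      elsif type == :on_tstring_content
        placeholder.call("STR", token)   # body of a string literal
      elsif type == :on_int || type == :on_float
        placeholder.call("NUM", token)   # numeric literal (assumed prefix)
      else
        token                            # keywords, constants, punctuation
      end
    previous_type = type
    renamed
  end.join
end

snippet_1 = <<~RUBY
  uist_hash = Hash.new do |hash, key|
    hash[key] = {}
  end
  uist_hash[:UIST]["2014"] = "Hawaii"
RUBY

snippet_2 = <<~RUBY
  chi_hash = Hash.new do |h, k|
    h[k] = {}
  end
  chi_hash[:CHI]["2014"] = "Toronto"
RUBY

puts normalize(snippet_1)
puts normalize(snippet_1) == normalize(snippet_2)
```

Once both snippets reduce to the same text, they can be counted as one record.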

slide-21
SLIDE 21

Statistical Linting

Part 2: Statistical Linting

slide-22
SLIDE 22

Statistical linting: detecting code that is unlikely to occur in practice

Part 2: Statistical Linting

slide-23
SLIDE 23

Chaining & Composition

Warning: Line 3

Codex observes var0 = var1.downcase more than 200 times, but var0 = var1.downcase! only 1 time.

Part 2: Statistical Linting

slide-24
SLIDE 24

Chaining & Composition

Warning: Line 3

The function downcase! has a side effect and changes name.

Codex observes var0 = var1.downcase more than 200 times, but var0 = var1.downcase! only 1 time.

Part 2: Statistical Linting
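The difference behind this warning can be seen directly in Ruby. The variable names below are illustrative, since the code the slide warned about is not in the transcript.

```ruby
name = String.new("Ethan")

# downcase returns a new String and leaves the receiver untouched.
lowered = name.downcase
puts lowered
puts name

# downcase! lowercases the receiver in place and returns that same object...
result = name.downcase!
puts result.equal?(name)

# ...but returns nil when there was nothing to change, which is why
# assigning its result to a variable is rarely what was intended.
unchanged = String.new("ethan").downcase!
puts unchanged.inspect
```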

slide-25
SLIDE 25

Unlikely variable names

Codex observes variables named array 116 times and variables assigned a Hash value 1248 times, but has never seen the two together.

Warning: Line 2

Part 2: Statistical Linting

slide-26
SLIDE 26

Unlikely variable names

You might wonder: does an Array really have a method named keys?

Warning: Line 2

Codex observes variables named array 116 times and variables assigned a Hash value 1248 times, but has never seen the two together.

Part 2: Statistical Linting
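The question on this slide has a short answer in plain Ruby. The snippet is an illustrative reconstruction, since the flagged code itself is not in the transcript.

```ruby
# A misleading name: the value is a Hash, not an Array.
array = { chi: "Toronto", uist: "Hawaii" }

# This works only because the value really is a Hash.
puts array.keys.inspect

# Array itself does not define keys, so a reader who trusts the
# variable name would expect the call above to fail.
puts [].respond_to?(:keys)
puts({}.respond_to?(:keys))
```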

slide-27
SLIDE 27

Other kinds of analysis

  • Function chains
  • Function types
  • Block return values

Part 2: Statistical Linting

slide-28
SLIDE 28

var0.split.to_s: used 0 times
.split: used 29 times
.to_s: used 12 times

"Function split has appeared 29 times and to_s has appeared 12 times, but they've never been chained together."

var0.split.to_s #=> Error: Array => String
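For context, the chain does run in plain Ruby, so the "Error" label presumably marks a type mismatch that Codex flags rather than a raised exception. A small illustration with a hypothetical input string:

```ruby
sentence = "crowd scale practice"

words = sentence.split             # Array of word strings
puts words.inspect

# Calling to_s on the Array yields its inspect-style text, brackets and
# quotes included, which is rarely the string a programmer wanted.
odd = sentence.split.to_s
puts odd

# Joining the words is the more likely intent.
joined = sentence.split.join(" ")
puts joined
```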

slide-29
SLIDE 29

Pattern Annotation

Part 3: Pattern Annotation

slide-30
SLIDE 30

Pattern annotation: finds common idioms, then annotates them using crowds

Part 3: Pattern Annotation

slide-31
SLIDE 31

Query for snippets with sufficient commonality and complexity

mongo_query = {
  project_count:  { gt: 0.02 },
  total_count:    { lt: 0.9 },
  file_count:     { lt: 0.2 },
  token_count:    { lt: 0.8 },
  function_count: { gt: 2.0 }
}

Part 3: Pattern Annotation
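To show how thresholds like these select candidates, here is a small in-memory sketch. It does not talk to MongoDB; the matches? helper and the sample record are hypothetical, and the transcript does not say how the field values are scaled.

```ruby
# Thresholds copied from the slide's query.
QUERY = {
  project_count:  { gt: 0.02 },
  total_count:    { lt: 0.9 },
  file_count:     { lt: 0.2 },
  token_count:    { lt: 0.8 },
  function_count: { gt: 2.0 }
}.freeze

# Hypothetical helper: true when a record satisfies every bound in the query.
def matches?(record, query = QUERY)
  query.all? do |field, bounds|
    bounds.all? do |operator, limit|
      case operator
      when :gt then record.fetch(field) > limit
      when :lt then record.fetch(field) < limit
      else raise ArgumentError, "unsupported operator: #{operator}"
      end
    end
  end
end

# Made-up record, for illustration only.
candidate = { project_count: 0.05, total_count: 0.5, file_count: 0.1,
              token_count: 0.3, function_count: 3.0 }
puts matches?(candidate)
```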

slide-32
SLIDE 32

Next, we crowdsource a title, a description, and a usefulness vote from oDesk workers

Part 3: Pattern Annotation

slide-33
SLIDE 33

Nested Hashes

Creating a Nested Hash

Creates a Hash with a new empty Hash object as a default key value

Total count: 66 Project count: 10

Part 3: Pattern Annotation

slide-34
SLIDE 34

Nested Hashes

Creating a Nested Hash

Creates a Hash with a new empty Hash object as a default key value

Total count: 66 Project count: 10

This simple idiom is easy to mess up!

Part 3: Pattern Annotation
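The slide does not say how the idiom goes wrong. One well-known way, shown here as an assumption about what it had in mind, is passing the default as an argument instead of using a block.

```ruby
# Mistake: a single {} is shared as the default for every missing key,
# and nothing is ever stored in the outer Hash.
shared = Hash.new({})
shared[:CHI]["2014"] = "Toronto"
puts shared.keys.inspect     # the outer Hash is still empty
puts shared[:UIST].inspect   # the shared default now holds the CHI entry

# Idiom: the block runs per missing key and stores a fresh Hash under it.
nested = Hash.new { |hash, key| hash[key] = {} }
nested[:CHI]["2014"] = "Toronto"
puts nested.inspect
puts nested[:UIST].inspect   # a new, empty Hash
```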

slide-35
SLIDE 35

Configure Rails Caching

Configure Rails Caching

By setting this to false, you can turn off caching for the Rails web framework

Total count: 78 Project count: 34

Part 3: Pattern Annotation

slide-36
SLIDE 36

Raise StandardError

Raise Custom Error

Raise a new StandardError using a custom message, passed as a string value

Total count: 66 Project count: 10

Part 3: Pattern Annotation
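A minimal instance of the annotated pattern, with a hypothetical method and message since the mined snippet is not in the transcript:

```ruby
# Hypothetical example: reject input that is not a four-digit year.
def parse_year(text)
  # Raise a StandardError with a custom message passed as a string.
  raise StandardError, "not a year: #{text.inspect}" unless text.match?(/\A\d{4}\z/)

  Integer(text, 10)
end

begin
  parse_year("CHI")
rescue StandardError => error
  puts error.message
end
```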

slide-37
SLIDE 37

Library Generation

Part 4: Library Generation

slide-38
SLIDE 38

Library generation constructs a utility package that reflects common practice

Part 4: Library Generation

slide-39
SLIDE 39

String#capital_tokens

Capitalize each word token in a string

This idiom occurred 10 times across 5 different projects.

Part 4: Library Generation
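The generated method body is not in the transcript. A plausible sketch of what String#capital_tokens could look like, assuming whitespace-separated tokens rejoined with single spaces:

```ruby
# Assumed implementation of the generated utility method.
class String
  # Capitalizes each whitespace-separated token.
  def capital_tokens
    split.map(&:capitalize).join(" ")
  end
end

puts "emergent crowd practice".capital_tokens
```

Note that capitalize also lowercases the rest of each token, so "rUBY" becomes "Ruby".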

slide-40
SLIDE 40

Hash#nested

Create a helper method for nested Hashes

This idiom occurred 66 times across 12 different projects.

Part 4: Library Generation

slide-41
SLIDE 41

Evaluation

Part 5: Evaluation

slide-42
SLIDE 42

Hit-rate after 500k LOC

Part 5: Evaluation: Knowledge Base

slide-43
SLIDE 43

Snippet categories

Categories: Standard Library, External Library, Data / Control Flow
Chart values (as listed on the slide): 9%, 14%, 76%

Part 5: Evaluation: Pattern Annotation

slide-44
SLIDE 44

Part 5: Evaluation: Pattern Annotation

A survey of expert crowdworkers

  • 86% of snippets are useful
  • 91% have no more common form
  • 96% are recomposable

slide-45
SLIDE 45

Statistical linting and false positives

Part 5: Evaluation: Statistical Linting

We find 1,248 warnings over 49,735 lines, a rate of 2.5%.

slide-46
SLIDE 46

Common false positives

Part 5: Evaluation: Statistical Linting

slide-47
SLIDE 47

Ambiguous false positives

Part 5: Evaluation: Statistical Linting

slide-48
SLIDE 48

Conclusion

slide-49
SLIDE 49

Mining emergent practice can support a broad set of software engineering interfaces

slide-50
SLIDE 50

Programming languages can be living artifacts

  • Libraries self-update to the latest idioms
  • IDEs offer suggestions to suit new coding styles
  • Languages evolve to better support their users

slide-51
SLIDE 51
slide-52
SLIDE 52

Emergent, Crowd-scale Programming Practice in the IDE

Ethan Fast, Daniel Steffee, Lucy Wang, Michael Bernstein, Joel Brandt Stanford HCI, Adobe Research

slide-53
SLIDE 53

Extra Slides

slide-54
SLIDE 54

Conventions emerge among many different kinds of domains.

Writing, Photography, Research, Programming, Design, Presentations, …

slide-55
SLIDE 55

Chaining & Composition

slide-56
SLIDE 56

Chaining & Composition

The function downcase! has a side effect and changes name

slide-57
SLIDE 57

Chaining & Composition

Codex observes var0 = var1.downcase more than 200 times, but var0 = var1.downcase! only 1 time.

The function downcase! has a side effect and changes name

slide-58
SLIDE 58

Unlikely variable names

slide-59
SLIDE 59

Unlikely variable names

You might wonder: does an Array really have a method named keys?

slide-60
SLIDE 60

Unlikely variable names

Codex observes variables named array 116 times and variables assigned a Hash value many thousands of times, but we never see the two together.

You might wonder: does an Array really have a method named keys?

slide-61
SLIDE 61

Nested Hashes

slide-62
SLIDE 62

Nested Hashes

Assigns an empty Hash as the default key value

slide-63
SLIDE 63

Nested Hashes

This simple idiom is easy to mess up!

Assigns an empty Hash as the default key value

slide-64
SLIDE 64

Turn off Rails Caching

Turning off default caching for the Rails web framework

slide-65
SLIDE 65

Raise StandardError

Raise a new StandardError with a custom message

slide-66
SLIDE 66

Data mining for Codex

  • 1. Gather Ruby code from GitHub
  • 2. Parse the code into AST representation
  • 3. Normalize the ASTs (rename variables, strings, symbols, and numbers)
  • 4. Collapse normalized ASTs
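The collapsing step at the end of this pipeline can be sketched as a tally over already-normalized snippets. The collapse helper, the record layout, and the sample data are assumptions; the output fields mirror the "Total count" and "Project count" figures shown on the annotation slides.

```ruby
require "set"

# Hypothetical helper: groups identical normalized snippets and counts how
# often each occurs and in how many distinct projects.
def collapse(observations)
  tallies = Hash.new { |hash, key| hash[key] = { total: 0, projects: Set.new } }

  observations.each do |observation|
    tally = tallies[observation[:normalized]]
    tally[:total] += 1
    tally[:projects] << observation[:project]
  end

  tallies.transform_values do |tally|
    { total_count: tally[:total], project_count: tally[:projects].size }
  end
end

# Made-up observations, for illustration only.
observations = [
  { project: "alpha", normalized: "var0 = var1.downcase" },
  { project: "alpha", normalized: "var0 = var1.downcase" },
  { project: "beta",  normalized: "var0 = var1.downcase" },
  { project: "beta",  normalized: "var0.split" }
]
puts collapse(observations).inspect
```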

slide-68
SLIDE 68

For Codex to flag it as unlikely, an AST node s must occur fewer than t times, and its children c_i must occur more than t_i times.

E.g., the snippet var0.split.to_s is composed of .split and .to_s

Part 2: Statistical Linting
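The rule above translates into a small predicate. The counts come from the split/to_s example earlier in the deck; the threshold values are placeholders, since the slides do not give the actual t and t_i.

```ruby
# Sketch of the linting condition: the node is rare (count below t) while
# every child is common (count above its own threshold t_i).
def unlikely?(node_count, child_counts, t:, child_thresholds:)
  node_count < t &&
    child_counts.zip(child_thresholds).all? { |count, threshold| count > threshold }
end

# var0.split.to_s was seen 0 times; .split 29 times and .to_s 12 times.
# The thresholds here are hypothetical.
puts unlikely?(0, [29, 12], t: 1, child_thresholds: [10, 10])
```

Requiring common children keeps the linter from warning about code that is merely unfamiliar in every part.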