The State of Naming Conventions in R, Rasmus Bååth

slide-1
SLIDE 1

The State of Naming Conventions in R

Rasmus Bååth
rasmus.baath@lucs.lu.se
Lund University Cognitive Science

slide-2
SLIDE 2

The only real difficulties in programming are cache invalidation and naming things.

-- Phil Karlton
slide-5
SLIDE 5

Outline

  • In the R ecosystem many different naming conventions are used.
  • This is not a good thing.
  • How to deal with the current naming convention situation.

slide-6
SLIDE 6
slide-12
SLIDE 12

Different Naming Conventions used in R

  • alllowercase
      ○ searchpaths, srcfilecopy
  • period.separated
      ○ as.numeric, read.table
  • underscore_separated
      ○ seq_along, package_version
  • lowerCamelCase
      ○ colMeans, suppressPackageStartupMessages
  • UpperCamelCase
      ○ Vectorize, NextMethod
  • .OTHER_style
      ○ Cstack_info, Sys.setlocale, Sys.setFileTime
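
The conventions listed above can be told apart mechanically. The following is a minimal sketch, assuming one regular expression per convention; the patterns and the helper `classify_name` are illustrative assumptions and not necessarily the definitions used in the talk.

```r
# Assumed regular expressions for each convention (a simplification,
# not necessarily the definitions used in the talk).
conventions <- c(
  alllowercase         = "^[a-z0-9]+$",
  period.separated     = "^[a-z0-9]+(\\.[a-z0-9]+)+$",
  underscore_separated = "^[a-z0-9]+(_[a-z0-9]+)+$",
  lowerCamelCase       = "^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$",
  UpperCamelCase       = "^([A-Z][a-z0-9]*)+$"
)

# Hypothetical helper: return the first matching convention,
# or ".OTHER_style" when no pattern matches.
classify_name <- function(name) {
  matched <- vapply(conventions, grepl, logical(1), x = name)
  if (any(matched)) names(conventions)[matched][1] else ".OTHER_style"
}

examples <- c("searchpaths", "as.numeric", "seq_along",
              "colMeans", "Vectorize", "Sys.setFileTime")
labels <- vapply(examples, classify_name, character(1))
print(labels)
```

Under these assumed patterns a name such as Sys.setFileTime matches nothing and falls into the catch-all category, because it mixes a period with camel case.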

slide-13
SLIDE 13

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip white

? ?

slide-14
SLIDE 14

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

✔ ✘

slide-15
SLIDE 15

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank lines skip

? ✘ ?

slide-16
SLIDE 16

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

✔ ✘ ✘

slide-17
SLIDE 17

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allow escapes

? ✘ ✘ ?

slide-18
SLIDE 18

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

✔ ✘ ✘ ✘

slide-19
SLIDE 19

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

col names

? ✘ ✘ ✘ ?

slide-20
SLIDE 20

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

col.names

✔ ✘ ✘ ✘ ✘

slide-21
SLIDE 21

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

col.names

col classes

? ✘ ✘ ✘ ✘ ?

slide-22
SLIDE 22

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

col.names

colClasses

✔ ✘ ✘ ✘ ✘ ✘

slide-23
SLIDE 23

Guess the naming convention for parameters of read.table!

period.separated lowerCamelCase strip.white

blank.lines.skip

allowEscapes

col.names

colClasses

strings as factors

? ✘ ✘ ✘ ✘ ✘ ?

slide-24
SLIDE 24

Guess the naming convention for parameters of read.table!

  • period.separated
      ○ strip.white, blank.lines.skip, col.names
  • lowerCamelCase
      ○ allowEscapes, colClasses, stringsAsFactors
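
The mix can be checked directly in an R session. This is a small sketch that lists the formal arguments of read.table and splits them by whether they contain a period or an upper case letter; the two variable names are just for illustration.

```r
# All parameter names of read.table.
params <- names(formals(utils::read.table))

# Parameters written with a period versus with an upper case letter.
with_period <- params[grepl(".", params, fixed = TRUE)]
with_upper  <- params[grepl("[A-Z]", params)]

print(with_period)
print(with_upper)
```

Both vectors come back non-empty, so a single function signature already mixes two conventions.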

slide-28
SLIDE 28

Unofficial naming convention guidelines.

  • Bioconductor’s coding standards
      ○ readTable, stringsAsFactors
  • Hadley Wickham’s style guide
      ○ read_table, strings_as_factors
  • Colin Gillespie’s R style guide
      ○ ReadTable, strings_as_factors
  • Google’s R style guide
      ○ ReadTable, strings.as.factors
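
To make the difference between the guides concrete, here is a minimal sketch that renders the same sequence of words in each style. The helper names (`capitalize`, `lower_camel`, `upper_camel`, `underscore_sep`, `period_sep`) are hypothetical and exist only for this illustration.

```r
# Upper-case the first letter of each word.
capitalize <- function(words) {
  paste0(toupper(substring(words, 1, 1)), substring(words, 2))
}

lower_camel    <- function(words) paste0(c(words[1], capitalize(words[-1])), collapse = "")
upper_camel    <- function(words) paste0(capitalize(words), collapse = "")
underscore_sep <- function(words) paste(words, collapse = "_")
period_sep     <- function(words) paste(words, collapse = ".")

fun_words   <- c("read", "table")
param_words <- c("strings", "as", "factors")

# Function name and parameter name as each guide on the slide would write them.
print(c(lower_camel(fun_words),    lower_camel(param_words)))     # Bioconductor
print(c(underscore_sep(fun_words), underscore_sep(param_words)))  # Hadley Wickham
print(c(upper_camel(fun_words),    underscore_sep(param_words)))  # Colin Gillespie
print(c(upper_camel(fun_words),    period_sep(param_words)))      # Google
```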

slide-33
SLIDE 33

What naming conventions are used in practice?

  • Comprehensive R Archive Network to the rescue!
  • I downloaded all (4411) packages on CRAN.
  • Got 339032 parameter names and 76176 function names.
  • Removed the class part of S3 methods (plot.mcmc -> plot).
  • Counted how many of the functions and parameters matched the different naming conventions.
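
The steps above can be sketched on a single installed package instead of all of CRAN. This is only an approximation under stated assumptions: it uses the stats package as a stand-in, and `strip_s3_class` is a hypothetical heuristic (strip everything after a known generic's name), not the procedure actually used for the talk.

```r
# Function names exported by one package (stats is used as a stand-in for CRAN).
exported  <- getNamespaceExports("stats")
is_fun    <- vapply(exported,
                    function(n) is.function(getExportedValue("stats", n)),
                    logical(1))
fun_names <- exported[is_fun]

# Parameter names of all those functions (primitives contribute nothing).
param_names <- unlist(lapply(fun_names, function(n) {
  names(formals(getExportedValue("stats", n)))
}))

# Hypothetical heuristic: if a name starts with "<generic>.", keep only the generic.
strip_s3_class <- function(name, generics) {
  for (g in generics) {
    if (startsWith(name, paste0(g, "."))) return(g)
  }
  name
}

stripped <- vapply(fun_names, strip_s3_class, character(1),
                   generics = c("plot", "print", "summary", "predict"))

# Share of the remaining function names that still contain a period.
print(mean(grepl(".", stripped, fixed = TRUE)))
```

A prefix heuristic like this will misfire on names such as t.test if "t" is listed as a generic, which is one reason the real analysis needs more care.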

slide-34
SLIDE 34
slide-35
SLIDE 35
slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38
slide-40
SLIDE 40

Why heterogeneous naming conventions are a bad thing.

  • It is not aesthetically pleasing.
  • It makes R harder to learn.
slide-41
SLIDE 41

A Memory Experiment

Mixed capitalization condition (n = 71)
flood Critic victory basis deficit testing General alcohol Track profile Train equity

All lower case condition (n = 77)
profile critic train general flood track alcohol victory equity testing basis deficit

slide-42
SLIDE 42

Participants remembered on average 1.2 ± 0.6 more words in the all lower case condition (95% bootstrap CI).
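
For readers unfamiliar with the method, a bootstrap confidence interval for a difference in means can be sketched as follows. The slide does not say which bootstrap variant was used, so a percentile bootstrap is assumed here, and the recall scores are simulated placeholders: only the group sizes (71 and 77) and the list length (12 words) come from the slides.

```r
set.seed(1)

# Simulated placeholder recall scores (NOT the data from the talk).
mixed_case <- rbinom(71, size = 12, prob = 0.5)
lower_case <- rbinom(77, size = 12, prob = 0.6)

# Resample each group with replacement and record the difference in means.
boot_diff <- replicate(5000, {
  mean(sample(lower_case, replace = TRUE)) -
    mean(sample(mixed_case, replace = TRUE))
})

# Percentile 95% interval for the difference.
ci <- quantile(boot_diff, c(0.025, 0.975))
print(ci)
```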

slide-43
SLIDE 43

Why heterogeneous naming conventions are a bad thing.

  • It is not aesthetically pleasing.
  • It makes R harder to learn.
  • It makes R harder to use.
slide-46
SLIDE 46

Heterogeneous naming conventions make R harder to use.

  • It is practically impossible to follow one naming convention even if you try.
  • Easier to make errors.
  • It is harder to guess names of functions and parameters.

slide-50
SLIDE 50

as.date("2013-07-11")
asDate("2013-07-11")
as_date("2013-07-11")
as.Date("2013-07-11") ✔
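
The guessing game can be replayed in code. A minimal sketch, restricted to the base package so that the result does not depend on which add-on packages happen to be attached:

```r
candidates <- c("as.date", "asDate", "as_date", "as.Date")

# Which of the guesses is actually defined in base R?
found <- vapply(candidates, exists, logical(1),
                envir = baseenv(), inherits = FALSE)
print(found)

# Only the last spelling works out of the box.
print(as.Date("2013-07-11"))
```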

slide-51
SLIDE 51

Heterogeneous naming conventions make R harder to use.

  • It is practically impossible to follow one naming convention even if you try.
  • Easier to make errors.
  • It is harder to guess names of functions and parameters.
  • It invites functions with names that just differ by convention.
      ○ anova vs Anova, ncol vs NCOL, summary vs Summary
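
Such pairs can be found automatically. This sketch looks only at the base package and groups function names that become identical once case is ignored; pairs that span several packages (for example a function defined by an add-on package) would need the same idea applied to a larger set of names.

```r
# Names of all functions in the base package.
base_names <- ls(baseenv())
base_funs  <- Filter(function(n) is.function(get(n, envir = baseenv())),
                     base_names)

# Group names by their lower case form and keep groups with more than one member.
by_lower   <- split(base_funs, tolower(base_funs))
collisions <- by_lower[lengths(by_lower) > 1]

print(collisions[["ncol"]])
```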

slide-52
SLIDE 52
slide-53
SLIDE 53

So, what to do?

  • Use autocompletion.
slide-56
SLIDE 56

Autocompletion is great!

  • Relieves the cognitive burden of having to remember identifier names exactly.
  • Works great in many editors, for example RStudio.
  • Does not work if you don't know the case of the first letter.

slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59

Autocompletion is great!

  • Relieves the cognitive burden of having to remember identifier names exactly.
  • Works great in many editors, for example RStudio.
  • Does not work if you don't know the case of the first letter.
  • Does not work if you use the wrong naming convention.
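
When prefix-based completion fails for these reasons, a case-insensitive search over the attached packages is one possible fallback. The sketch below uses `apropos` from the utils package; the regular expression, which tolerates an optional period or underscore, is just an example pattern.

```r
# Search the attached packages, ignoring case and the separator character.
hits <- utils::apropos("^as[._]?date$", ignore.case = TRUE)
print(hits)
```

In a session with only the default packages attached this finds as.Date; with other packages attached it may list more spellings.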

slide-60
SLIDE 60
slide-61
SLIDE 61
slide-62
SLIDE 62

So, what to do?

  • Use autocompletion.
  • At least follow some naming convention.
slide-63
SLIDE 63
slide-64
SLIDE 64
slide-65
SLIDE 65

So, what to do?

  • Use autocompletion.
  • At least follow some naming convention.
  • But don't follow Google's!
slide-66
SLIDE 66
slide-67
SLIDE 67
slide-68
SLIDE 68

So, what to do?

  • Use autocompletion.
  • At least follow some naming convention.
  • But don't follow Google's!
  • Be consistent.
slide-69
SLIDE 69

So, what to do?

  • Use autocompletion.
  • At least follow some naming convention.
  • But don't follow Google's!
  • Be consistent.

Like hadley_wickham.

slide-73
SLIDE 73

Conclusions

  • Naming conventions are important.
  • You're not totally off if you use lowerCamelCase for function names.
  • You're totally off if you use UpperCamelCase.
  • CRAN is great for statistics about R usage.

slide-74
SLIDE 74

From the useR! 2013 website: "Following eight successful useR! meetings, the conference is focused on:

  1. R as the `lingua franca' of data analysis and statistical computing, ..."

slide-75
SLIDE 75

Thanks for listening!

slide-76
SLIDE 76