
An Empirical Study on Bad Practices in Bash Scripts

CS846 Charles Li, Yiwen Dong

Introduction

  • Shell language is a powerful language for orchestrating shell commands

  • It has been widely used for decades
  • System administration
  • Various shell command orchestration
  • However, its syntax is not straightforward compared to modern languages
  • This study focuses on Bash and its common bad practices, using ShellCheck and the IntelliJ Shell Parser

RQ1: How common are Bash scripts?

  • GitHub
  • 320k public Shell repositories
  • Octoverse (an annual survey by GitHub)
  • Shell language has been among the top 10 programming languages in the last 5 years, based on the number of unique contributors in public and private repositories

  • Top 1000 Shell repositories
  • 22240 bash files
  • 4927 sh files
  • 264 z-shell files
  • 5 c-shell files
  • 2 k-shell files
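
One way a per-dialect tally like the one above could be produced is by reading each file's shebang line. The slides do not say how files were classified, so this is only a sketch under that assumption; `dialect_of` is a hypothetical helper, not part of the study's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess a script's dialect from its first line.
# Assumption: classification by shebang; the study's real rule is not stated.
dialect_of() {
    local first_line=""
    # read returns non-zero on a missing trailing newline; ignore that.
    IFS= read -r first_line < "$1" || true
    case "$first_line" in
        '#!'*bash*) echo bash ;;
        '#!'*zsh*)  echo zsh ;;
        '#!'*csh*)  echo csh ;;   # also matches tcsh
        '#!'*ksh*)  echo ksh ;;
        '#!'*sh*)   echo sh ;;    # must come after the more specific names
        *)          echo unknown ;;
    esac
}
```

Running `dialect_of` over every file in a repository and piping the results through `sort | uniq -c` would give counts per dialect.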

RQ2: What is the distribution of Shell language constructs?

  • SIMPLE_COMMAND
  • echo "Hello, world!"
  • grep mysql /etc/passwd
  • These constructs are Abstract Syntax Tree (AST) nodes defined in the IntelliJ parser
  • Not every node is useful, as some can be recursive expansion rules


RQ3: How frequently do bad practices occur in the Bash files?

  • 8279/27167 (30.5%) of files exhibit no bad practices
  • Some files have an extremely high number of errors (>2000)

  • Mean of 9.43
  • Median of 3
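
A mean well above the median is what a few very large outliers produce. The sketch below computes both statistics from a list of per-file error counts, one per line; `mean_median` is a hypothetical helper and the sample numbers are made up for illustration, not taken from the study.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one integer per line on stdin,
# print the mean (two decimals) and the median.
mean_median() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            mid = int((NR + 1) / 2)
            # Odd count: middle value. Even count: average of the two middle values.
            median = (NR % 2) ? v[mid] : (v[mid] + v[mid + 1]) / 2
            printf "mean=%.2f median=%g\n", sum / NR, median
        }'
}

# Illustrative data only: one outlier pulls the mean far above the median.
printf '%s\n' 0 3 1 40 3 | mean_median   # mean=9.40 median=3
```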

Manual inspection

  • Manual inspection of files with >2000 errors reveals large files with many of the same errors.
  • Errors SC2086, SC2162, and SC2140 cover 7048/7091 (99.39%) of all errors in these files

Name    Level    Count  Message
SC2086  Info      5488  Double quote to prevent globbing and word splitting.
SC2162  Info       804  read without -r will mangle backslashes
SC2140  Warning    756  Word is on the form "A"B"C" (B indicated). Did you mean "ABC" or "A\"B\"C"?
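
A minimal sketch of what SC2162 and SC2140 warn about. The function names and sample strings are hypothetical, chosen only to make the behavior visible.

```shell
#!/usr/bin/env bash
# SC2162: without -r, read treats backslash as an escape character and drops it.
read_plain() { local v; read v; printf '%s\n' "$v"; }
read_raw()   { local v; IFS= read -r v; printf '%s\n' "$v"; }

# SC2140: the inner quotes close and reopen the string, so they vanish.
unescaped_word() { printf '%s\n' "key="value""; }
escaped_word()   { printf '%s\n' "key=\"value\""; }

printf '%s\n' 'C:\temp\new' | read_plain   # C:tempnew
printf '%s\n' 'C:\temp\new' | read_raw     # C:\temp\new
unescaped_word                             # key=value
escaped_word                               # key="value"
```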

RQ4: What is the distribution of bad practices in the Bash files? What are the common bad practices?

  • The 15 most frequently seen errors are …

Rank  Level    Suggestion   Count  Group           Message
1     info     SC2086      130567  Quote           Double quote to prevent globbing and word splitting.
2     style    SC2006       13694  Syntax          Use $(...) notation instead of legacy backticked `...`.
3     warning  SC2034       12575  Variable        foo appears unused. Verify it or export it.
4     info     SC1091        8546  Parsing         Not following: (error message here)
5     warning  SC2154        7935  Variable        var is referenced but not assigned.
6     warning  SC2046        4919  Quote           Quote this to prevent word splitting
7     style    SC2002        4522  IO              Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
8     error    SC1041        4155  Parsing         Found 'eof' further down, but not on a separate line.
9     error    SC1072        4131  Parsing         Unexpected ..
10    info     SC2016        3362  Quote           Expressions don't expand in single quotes, use double quotes for that.
11    info     SC2162        3312  IO              read without -r will mangle backslashes
12    error    SC1073        3271  Parsing         Couldn't parse this (thing). Fix to allow more checks.
13    warning  SC2027        3214  Quote           The surrounding quotes actually unquote this. Remove or escape them.
14    style    SC2004        3161  Variable        $/${} is unnecessary on arithmetic variables.
15    warning  SC2164        3064  Error Handling  Use cd ... || exit in case cd fails.
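
Two of the listed suggestions, SC2016 and SC2164, are easy to show in a few lines. This is an illustrative sketch; the function names and the `name` variable are hypothetical.

```shell
#!/usr/bin/env bash
name=world

# SC2016: single quotes keep $name literal; double quotes expand it.
greet_single() { printf '%s\n' 'Hello, $name'; }
greet_double() { printf '%s\n' "Hello, $name"; }

# SC2164: stop when cd fails instead of running later commands in the wrong
# directory. Inside a function, return plays the role exit plays in a script.
list_in_dir() {
    cd "$1" || return 1
    ls
}

greet_single   # Hello, $name
greet_double   # Hello, world
```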

Occurrence of each group among the top 15 most common suggestions

Group            Count  Percentage
Parsing          20103   9.55%
Companion            0   0.00%
Syntax           13694   6.51%
Variable         23671  11.25%
Error Handling    3064   1.46%
Quote           142062  67.51%
IO                7834   3.72%
Logic                0   0.00%
Sum             210428
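
Each percentage is the group's count divided by the sum of 210428. A small sketch of that arithmetic, with `group_share` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: share of count within total, as a percentage with two decimals.
group_share() {
    awk -v count="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * count / total }'
}

group_share 142062 210428   # Quote:   67.51%
group_share 20103 210428    # Parsing: 9.55%
```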


RQ5: What are the more error-prone shell language constructs?

  • We associate each Abstract Syntax Tree (AST) node with the errors identified by ShellCheck
  • Error location information in ShellCheck output is quite limited, so some errors might be incorrectly associated with parent nodes

  • Results collected have 4 types of output
  • Error
  • Warning
  • Info
  • Style

Error

  • FOR_CLAUSE
  • for URL in ${URLS[@]}; do...
  • for URL in "${URLS[@]}"; do...
  • Double quote array expansion to avoid re-splitting elements
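
The difference between the two loops shows up as soon as an element contains whitespace. A minimal sketch, with made-up URLs and hypothetical counting helpers:

```shell
#!/usr/bin/env bash
# Illustrative data: the first element contains a space.
URLS=("https://example.com/a b" "https://example.com/c")

# Unquoted: each element is split again on whitespace (and glob-expanded).
count_unquoted() {
    local n=0 url
    for url in ${URLS[@]}; do n=$((n + 1)); done
    echo "$n"
}

# Quoted: each array element stays one word.
count_quoted() {
    local n=0 url
    for url in "${URLS[@]}"; do n=$((n + 1)); done
    echo "$n"
}

count_unquoted   # 3
count_quoted     # 2
```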

Warning

  • ASSIGNMENT_COMMAND
  • export foo="$(mycmd)"

  • Return value of mycmd is ignored
  • Better to have export on a separate line
  • Some other errors are due to unused variables
  • ShellCheck is quite limited in finding references in external files
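
The masking happens because the exit status of `export foo="$(mycmd)"` is that of `export` itself, which succeeds. A minimal sketch, where `mycmd` is a stand-in that always fails with status 3:

```shell
#!/usr/bin/env bash
# Stand-in for a command that prints something and then fails.
mycmd() { echo "partial output"; return 3; }

# Combined form: $? reflects export, so the failure is hidden.
masked() {
    export foo="$(mycmd)"
    echo "$?"
}

# Separate lines: a plain assignment keeps the substitution's exit status.
visible() {
    local rc=0
    foo="$(mycmd)" || rc=$?
    export foo
    echo "$rc"
}

masked    # 0
visible   # 3
```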

Info

  • VARIABLE
  • rmdir $STAGING
  • rmdir "$STAGING"
  • Double quotes to prevent globbing and word splitting

  • Case-by-case
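
A sketch of how the unquoted form goes wrong when the path contains a space. The helper names are hypothetical, and the default `IFS` is assumed.

```shell
#!/usr/bin/env bash
# Unquoted: a path with a space becomes two arguments, so rmdir fails.
rmdir_unquoted() {
    local STAGING=$1
    rmdir $STAGING 2>/dev/null
}

# Quoted: the path is passed to rmdir as a single argument.
rmdir_quoted() {
    local STAGING=$1
    rmdir "$STAGING"
}
```

For a directory named `staging area`, `rmdir_unquoted` asks `rmdir` to remove `staging` and `area` separately and leaves the real directory in place, while `rmdir_quoted` removes it. The "case-by-case" note still applies: occasionally a script relies on splitting deliberately, for example when a variable holds a list of flags.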

Style

  • COMMAND_SUBSTITUTION_COMMAND
  • echo `uname`
  • Use $(...) notation instead of legacy backticked `...`
  • Legacy syntax
  • Backtick is hard to nest
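
The nesting problem is easiest to see side by side. In this sketch both hypothetical functions print the name of a path's parent directory; the backtick version needs escaped inner backticks.

```shell
#!/usr/bin/env bash
# $(...) nests without any escaping.
parent_modern() {
    local parent
    parent=$(basename "$(dirname "$1")")
    echo "$parent"
}

# Backticks require the inner pair to be escaped with backslashes.
parent_legacy() {
    local parent
    parent=`basename \`dirname "$1"\``
    echo "$parent"
}

parent_modern /usr/local/bin   # local
parent_legacy /usr/local/bin   # local
```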

Threats to Validity

  • Auto-generated scripts could skew the data
  • ShellCheck is not perfect
  • The data was prepared on a Windows machine with CRLF line endings, and some ShellCheck errors are sensitive to this

  • There are exceptions in ShellCheck errors
  • False positive
  • The boundaries between Error/Warning/Info/Style are not clear-cut
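
For the CRLF threat, one possible mitigation is to detect and normalize line endings before analysis. This is a sketch of that idea only, not something the study reports doing; `has_crlf` and `strip_cr` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# True if any line of the file ends with a carriage return.
has_crlf() { grep -q $'\r$' "$1"; }

# Print the file with a trailing carriage return removed from each line.
strip_cr() { sed $'s/\r$//' "$1"; }
```

A normalized copy can be written with `strip_cr script.sh > script.lf.sh` and checked against the original to see which reported errors disappear.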

Conclusion

  • Shell language is popular with Bash being the mainstream
  • Quoting, variable handling, and syntax of the Bash language can use the most help from researchers and developers to make better