CIS 218 Advanced UNIX (g)awk CIS 218 Advanced UNIX 1 Overview awk - - PowerPoint PPT Presentation



SLIDE 1

CIS 218 Advanced UNIX 1

CIS 218 – Advanced UNIX

(g)awk

SLIDE 2

Overview

  • awk is a programming language
  • awk uses syntax based on grep and sed for handling numbers and text
  • awk provides field-level addressability, and addressability within a field (word) using substring commands
  • awk works field by field

SLIDE 3

awk command syntax

  • There are two ways to execute an awk program/script:

– awk [-F field-separator] 'program' target-file
– awk [-F field-separator] -f program.file target-file

  • From our discussion of sed, and Refrigerator Rule No. 5, I would hope you are firmly committed to the second form!
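
The two invocation forms can be sketched side by side. The sample data, the scratch directory, and the file names users.txt and first.awk are assumptions made up for this illustration.

```shell
# Hypothetical sample data and script file, created in a scratch directory.
workdir=$(mktemp -d)
printf 'root:x:0\ndaemon:x:1\n' > "$workdir/users.txt"
printf '{ print $1 }\n' > "$workdir/first.awk"

# Form 1: the program is given inline, inside single quotes.
inline=$(awk -F: '{ print $1 }' "$workdir/users.txt")

# Form 2: the same program is read from a file with -f.
from_file=$(awk -F: -f "$workdir/first.awk" "$workdir/users.txt")

echo "$inline"
echo "$from_file"
rm -r "$workdir"
```

Both forms run the same program, so both variables should hold the same text.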

SLIDE 4

awk Variables

  • There are a number of awk variables that are very useful:

– FS (the field separator, defaults to white space)
– OFS (output field separator, can be critical)
– NR (number of records read so far, a sequential counter)
– NF (number of fields in the current record)
– FILENAME (name of the current target file)

SLIDE 5

awk Variables (cont.)

– $0 (the entire line as read from the target file)
– $n (where n is the nth field in the record; this is how we get field-level addressability in awk)

  • nawk, gawk, etc. give us more variables; the two most significant are:

– ARGC (the count of the command line arguments)
– ARGV (an array of the command line arguments)
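
A minimal sketch of these variables in use. The colon-separated input, the "-" output separator, and the argument notes.txt are invented for the example; the functions show_vars and show_args are hypothetical helpers.

```shell
# Hypothetical input: two colon-separated records of different lengths.
show_vars() {
    printf 'alice:30:NY\nbob:25\n' |
    awk '
        BEGIN { FS = ":"; OFS = "-" }   # set both separators before any input is read
        { print NR, NF, $1, $NF }       # record number, field count, first field, last field
    '
}

# ARGC counts "awk" itself plus each argument; the file is never opened
# because the program has only a BEGIN rule.
show_args() {
    awk 'BEGIN { print ARGC, ARGV[1] }' notes.txt
}

show_vars
show_args
```

Because print separates its comma-delimited arguments with OFS, the output fields are joined with "-" rather than a space.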

SLIDE 6

Parts of a program

  • All programs are composed of one or more of the following three constructs:

– sequence (a series of instructions, one following the next, executed sequentially)
– selection (the ability of the code to decide which instructions to execute, conditional execution)
– iteration (adding looping so that selected code will be repeated over and over)

SLIDE 7

awk Program Format

  • Awk programs are composed of pattern {action} pairs (actions must be enclosed in French braces {} )

– a pattern without a corresponding action takes the default action, print $0
– an action without a corresponding pattern is applied to every line
– each input line is submitted to every pattern/action pair
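
The three cases can be sketched in one short program. The fruit data and the function name pairs are assumptions for illustration only.

```shell
pairs() {
    printf 'apple 3\nbanana 7\n' |
    awk '
        /apple/                        # pattern only: default action, print $0
        { print NR ": seen" }          # action only: runs for every line
        $2 > 5 { print "big:", $1 }    # pattern and action together
    '
}
pairs
```

Each input line is offered to all three rules in order, which is why the output for one line can come from more than one rule.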

SLIDE 8

awk Program Format (cont.)

  • Placement of the open French brace is critical

– pattern {
      action 1
      action 2
  }
  Both actions are executed for lines matching the pattern.

– pattern
  {
      action 1
      action 2
  }
  Lines matching the pattern are printed, and both actions are performed on every line!
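
The effect of brace placement can be seen by running the same rule both ways. The two-line input is made up for this sketch.

```shell
data='apple 1
banana 2'

# Brace on the same line as the pattern: the action runs only for matching lines.
same_line=$(printf '%s\n' "$data" | awk '/apple/ {
    print "hit:", $1
}')

# Brace on the next line: the pattern takes the default action (print $0),
# and the braced block becomes a separate rule that runs for every line.
next_line=$(printf '%s\n' "$data" | awk '/apple/
{
    print "hit:", $1
}')

echo "$same_line"
echo "---"
echo "$next_line"
```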

SLIDE 9

Patterns

  • In an awk program, the pattern is the selection tool that decides what actions are applied to which lines.
  • Patterns can be:

– relational expressions
– regular expressions
– magic patterns

SLIDE 10

Relational Expression patterns

Symbol   Meaning                     Symbol   Meaning
<        less than                   ==       equal to
<=       less than or equal to       ~        contains the RE
>        greater than                !~       doesn't contain the RE
>=       greater than or equal to    &&       logical and
!=       not equal to                ||       logical or
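
A small sketch combining several of these operators in one pattern. The name/score data and the function name passing are assumptions for the example.

```shell
passing() {
    printf 'ann 82\nbob 47\namy 55\n' |
    # Select records whose score is at least 50 AND whose name starts with "a".
    awk '$2 >= 50 && $1 ~ /^a/ { print $1 }'
}
passing
```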

SLIDE 11

Regular Expression patterns

  • Must be enclosed in slashes /RE/
  • Anchors apply to the entire line if they are used as the only pattern
  • Remember, you can use regular expressions in relational patterns with ~ and !~ to apply them to fields
  • Both true regular expressions and fixed patterns can be used as REs in awk
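
The difference between anchoring on the line and anchoring on a field can be sketched with the same RE used both ways. The two-line input is invented for the illustration.

```shell
data='cat dog
dog cat'

# As the only pattern, ^ anchors to the start of the whole line.
whole_line=$(printf '%s\n' "$data" | awk '/^dog/')

# With ~, the same RE is applied to one field, so ^ anchors to the start of field 2.
second_field=$(printf '%s\n' "$data" | awk '$2 ~ /^dog/')

echo "$whole_line"
echo "$second_field"
```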

SLIDE 12

Pre/Post Processing

  • There are two magic patterns in awk:

– BEGIN {the action associated is performed before the target file is opened}
– END {the action associated is performed after the target file is successfully closed}

  • Both are coded in UPPER CASE
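
A minimal sketch of BEGIN and END wrapped around an ordinary per-line rule. The numbers and the function name total are assumptions for the example.

```shell
total() {
    printf '10\n20\n30\n' |
    awk '
        BEGIN { print "start" }          # runs once, before any input is read
        { sum += $1 }                    # runs for every input line
        END { print "total:", sum }      # runs once, after the last line
    '
}
total
```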

SLIDE 13

# comments

  • Like most scripting languages, # indicates a comment
  • awk scripts should be well documented
  • Comments should explain what you are doing and why.

SLIDE 14

print

  • The print command is the simplistic output tool for awk. Basically an "echo".
  • You can direct print to send its data to a file with the > operator
  • Generally print is used for simple output or debugging output
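
A sketch of print with the > operator. The scratch file created by mktemp and the two-line input are assumptions for the demonstration.

```shell
outfile=$(mktemp)    # hypothetical scratch file for the demonstration

printf 'a 1\nb 2\n' |
awk -v out="$outfile" '{
    print $1 > out    # the file is opened once; both records land in it
}'

saved=$(cat "$outfile")
rm -f "$outfile"
echo "$saved"
```

Unlike the shell's >, awk's > truncates the file only when it is first opened, so later print statements in the same run add to it.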

SLIDE 15

printf

  • Similar in concept to the "C" language command. The format of a printf command is: printf("formatting string", variables)
  • The formatting characters correspond to the variables one for one in both lists.
  • Each formatting character is prefixed by %

SLIDE 16

printf (cont.)

  • The formatting specifiers contain the following characters:

– -  indicates that the data should be left justified
– n  indicates the minimum width of the field
– .n indicates the maximum width of the field (for %s) or the number of decimal places (for %f and %e)

"%-5s" indicates a string field, left justified, with a minimum width of 5 bytes

SLIDE 17

printf formatting characters

Format   Meaning                     Format   Meaning
%c       single ASCII character      %G       shortest of %E or %f
%d       decimal integer             %i       decimal integer
%e       scientific notation         %o       octal number
%E       SCIENTIFIC NOTATION         %s       string
%f       floating point              %x       hexadecimal (lc)
%g       shortest of %f or %e        %X       HEXADECIMAL

SLIDE 18

printf spacing characters

  • There are two characters available to change the spacing of your text:

– \n inserts a newline character. You must use this if you want your output to occur on successive lines.
– \t inserts a tab character
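
The width, justification, and spacing rules can be sketched with two printf calls. The values and the function name fmt are made up for the example.

```shell
fmt() {
    awk 'BEGIN {
        # %-5s : left-justified string, minimum width 5
        # %5d  : right-justified integer, minimum width 5
        # %.2f : floating point with 2 digits after the decimal point
        printf("%-5s|%5d|%.2f\n", "ab", 42, 3.14159)

        # %.3s keeps at most 3 characters; \t is a tab; %x is lowercase hex
        printf("%.3s\t%x\n", "abcdef", 255)
    }'
}
fmt
```

Without the trailing \n in each format string, both results would run together on one line.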

SLIDE 19

getline

  • getline is used to read from the keyboard
  • It can also capture the results of a command, but this form is seldom used
  • Read from the keyboard using getline variable < "/dev/tty"
  • If you don't supply a variable, awk will use $0, so in most cases you want to use a variable.
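
A sketch of getline reading into a variable. To keep it runnable without a terminal, the hypothetical function ask takes the source to read from as an argument; passing /dev/tty gives the keyboard form shown above.

```shell
# ask SRC reads one line from SRC; pass /dev/tty to read from the keyboard.
ask() {
    awk -v src="$1" 'BEGIN {
        printf "Your name? "
        if ((getline name < src) > 0)    # a result above 0 means a line was read
            print "Hello, " name
        else
            print "no input"
    }'
}

# Demonstration with a scratch file standing in for the keyboard.
answer_file=$(mktemp)
printf 'Ada\n' > "$answer_file"
ask "$answer_file"
rm -f "$answer_file"
```

Reading into name leaves $0 untouched, which is the reason for supplying a variable.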

SLIDE 20

rand() srand()

  • The rand() function generates pseudo-random numbers in the range 0 to 1 (0 ≤ n < 1).
  • Given the same seed, it will always generate the same series of numbers.
  • srand() is used to supply a new seed to rand().
  • If you don't supply srand() a value, it uses the current time as the seed.
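
A minimal sketch of seeding and scaling, assuming a six-sided die as the use case. The function name roll and the seed value are hypothetical.

```shell
roll() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)                    # same seed, same sequence
        print int(rand() * 6) + 1      # scale the fraction to a whole number from 1 to 6
    }'
}

roll 42
roll 42    # repeats the previous result because the seed is the same
```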

SLIDE 21

system()

  • The system() function allows you to execute system commands within an awk script.
  • You must enclose the system command in quotation marks.
  • You cannot capture the output from the system() function within the script but you can capture the return code.
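
A sketch of capturing return codes from system(). The commands true and false are used only because their results are predictable; the function name codes is hypothetical.

```shell
codes() {
    awk 'BEGIN {
        ok  = system("true")      # the command text goes inside quotation marks
        bad = system("false")
        print ok, (bad != 0)      # return code of true, then 1 if false reported failure
    }'
}
codes
```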

SLIDE 22

length()

  • The length([argument]) function returns the length of the argument in characters (bytes, for plain ASCII text).
  • If you give length() a number, it will return the number of characters in the number as printed (for a whole number, the number of digits).
  • If you don't give length() an argument, it will use $0 by default.
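
The three cases can be sketched in a single print. The input line and the function name lengths are assumptions for the example.

```shell
lengths() {
    printf 'hello world\n' |
    # a string, a whole number, and no argument (which falls back to $0)
    awk '{ print length("awk"), length(12345), length() }'
}
lengths
```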

SLIDE 23

index()

  • The index(string,target) function returns the position of the first occurrence of the target within the string (0 if the target does not occur).
  • The index() function is often used to set the boundary for the substr() function.

SLIDE 24

substr()

  • The substr(string,start[,length]) function will return the part of the string beginning with start and continuing for length bytes.
  • If you don't give it a length, it will return all the bytes between the start and the end of the string.
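
A sketch of index() setting the boundary for substr(), assuming a mail-style address as the input. The function name split_addr and the sample address are hypothetical.

```shell
split_addr() {
    awk -v s="$1" 'BEGIN {
        at = index(s, "@")            # position of the first "@", or 0 if absent
        if (at == 0) { print "no @ in " s; exit }
        print substr(s, 1, at - 1)    # everything before the "@"
        print substr(s, at + 1)       # no length given: runs to the end of the string
    }'
}
split_addr user@example.com
```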

SLIDE 25

split()

  • You will use split(string, array[, separator]) to divide a string into parts using separator to parse them, storing the resultant parts in the array.
  • If you don't code a separator, the function will use the field separator to parse the string.
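
A minimal sketch of split() with and without a separator. The date string, the array names, and the function name date_parts are made up for the example.

```shell
date_parts() {
    awk 'BEGIN {
        n = split("2024-01-15", parts, "-")   # explicit separator
        print n, parts[1], parts[3]

        m = split("a b  c", words)            # no separator: uses FS (white space here)
        print m, words[3]
    }'
}
date_parts
```

split() returns the number of parts it stored, and the array is indexed from 1.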

SLIDE 26

if

  • Besides using patterns, if gives us another way to perform selection
  • The format of an if statement is: if (condition) {verb(s)} [else {verb(s)}]
  • If you have more than one verb, they must be enclosed in French braces.

SLIDE 27

if conditions

A < B       A is less than B
A <= B      A is less than or equal to B
A == B      A equals B (note 2 =)
A > B       A is greater than B
A >= B      A is greater than or equal to B
A != B      A is not equal to B
A ~ /RE/    A contains the regular expression RE

SLIDE 28

if

  • A sample if
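
One possible sample, offered as a sketch rather than the slide's original: an if/else that labels each record. The name/score data, the threshold of 50, and the function name grade are assumptions.

```shell
grade() {
    printf 'ann 82\nbob 47\n' |
    awk '{
        if ($2 >= 50) {            # two verbs, so braces are required
            status = "pass"
            passed++
        } else
            status = "fail"        # single verb: braces optional
        print $1, status
    }'
}
grade
```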

SLIDE 29

exit

  • The input file is closed
  • Control is transferred to the action associated with the END magic pattern if there is one
  • Generally used as a bailout in case of catastrophic errors
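
A sketch showing that END still runs after exit. The STOP marker in the input and the function name count_until_stop are invented for the example.

```shell
count_until_stop() {
    printf 'a\nb\nSTOP\nc\n' |
    awk '
        $1 == "STOP" { exit }      # stop reading input; control jumps to END
        { n++ }
        END { print "records before STOP:", n }
    '
}
count_until_stop
```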

SLIDE 30

for loop

  • This is a counted loop
  • executes until the counter reaches the target value
  • Increment (count up) or decrement (count down)
  • also works with the elements of an array
  • multiple verbs must be enclosed in { }

SLIDE 31

for loop example
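
A sketch of the three uses of for: counting up, counting down, and walking an array. The values and the function name loops are assumptions, not the slide's original example.

```shell
loops() {
    awk 'BEGIN {
        for (i = 1; i <= 3; i++)          # count up
            printf("%d ", i)
        for (i = 3; i >= 1; i--)          # count down
            printf("%d ", i)
        print ""

        n = split("4 5 6", nums)
        for (k in nums) {                 # visit every element; order is not guaranteed
            sum += nums[k]
            seen++
        }
        print seen, sum
    }'
}
loops
```

Because the order of for (k in array) is unspecified, the sketch only totals the elements instead of printing them in sequence.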

SLIDE 32

while loop

  • The while loop is an example of conditional execution
  • The loop cycles as long as the condition specified is true
  • A while loop always checks to see if it should execute
  • multiple verbs must be enclosed in { }

SLIDE 33

while loop example
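
A sketch of a while loop that counts how many times a number can be halved before it reaches 1. The task and the function name halvings are assumptions chosen for illustration.

```shell
halvings() {
    awk -v n="$1" 'BEGIN {
        steps = 0
        while (n > 1) {        # condition is tested before every pass
            n = int(n / 2)
            steps++
        }
        print steps
    }'
}
halvings 8
halvings 1    # the body never runs, so this prints 0
```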

SLIDE 34

do/while

  • Even though it has a while in it, this is an example of until logic.
  • Until logic is shunned by conscientious coders.
  • 'nuff said
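
For reference, a minimal sketch of what the construct does: the body runs once before the condition is ever checked. The starting value and the function name at_least_once are made up.

```shell
at_least_once() {
    awk 'BEGIN {
        i = 10
        do {
            print "body ran with i =", i
            i++
        } while (i < 5)        # tested only after the body has run
    }'
}
at_least_once
```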

SLIDE 35

break

  • Used to exit from a loop
  • Control is passed to the line following the end of the loop
  • Causes an exit from the loop but NOT the awk script. If you want to bail out of the whole script, use the exit command.

SLIDE 36

break example
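
A sketch of break, assuming the task is to find the first negative field in a record. The input numbers and the function name first_negative are hypothetical.

```shell
first_negative() {
    printf '4 9 -2 7 -5\n' |
    awk '{
        pos = 0
        for (i = 1; i <= NF; i++)
            if ($i < 0) {
                pos = i
                break              # leave the loop, not the script
            }
        print "first negative field:", pos   # execution resumes here after break
    }'
}
first_negative
```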

SLIDE 37

continue

  • Causes awk to skip the rest of the body of the loop for the current value
  • In a for loop the counter is incremented, and the next cycle of the loop is started
  • In a while loop, the next iteration of the loop starts

SLIDE 38

continue example
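
A sketch of continue, assuming the task is to total only the non-negative fields of a record. The input numbers and the function name sum_non_negative are hypothetical.

```shell
sum_non_negative() {
    printf '4 9 -2 7 -5\n' |
    awk '{
        sum = 0
        for (i = 1; i <= NF; i++) {
            if ($i < 0)
                continue           # skip the rest of the body; i++ still runs
            sum += $i
        }
        print sum
    }'
}
sum_non_negative
```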

SLIDE 39

next

  • Causes the script to start over
  • takes the next element from standard input or the target file
  • Like exit, this command affects the whole script

SLIDE 40

next example
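
A sketch of next, assuming lines that start with # should be skipped before any other rule sees them. The input and the function name skip_comments are made up for the example.

```shell
skip_comments() {
    printf '# header\nalpha\n# note\nbeta\n' |
    awk '
        /^#/ { next }              # drop this record and start over with the next one
        { print NR ": " $0 }       # reached only by records that were not skipped
    '
}
skip_comments
```

NR keeps counting the skipped records, so the surviving lines keep their original line numbers.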