Working with files CISC3130, Spring 2013 X. Zhang 1 Outlines - - PowerPoint PPT Presentation



SLIDE 1

Working with files

CISC3130, Spring 2013
X. Zhang

SLIDE 2

Outline

  • Finish up with awk: pipeline, external commands
  • Commands working with files
    • tree, ls (-d option, -1 option, -R, -a)
    • od (octal dump), stat (show metadata of a file)
    • touch command, temporary file, file with random bytes
    • File checksum, verification
    • locate, type, which, find command: finding files

SLIDE 3

Some useful tips

  • Bash stores the command history
    • Use the UP/DOWN arrow keys to browse it
    • Use "history" to show past commands
  • Repeat a previous command
    • !<command_no>, e.g., !239
    • !<any prefix of a previous command>, e.g., !g++
  • Search for a command
    • Type Ctrl-r, and then a string
    • Bash will search previous commands for a match
  • File name autocompletion: "tab" key

SLIDE 4

Output redirection: to pipeline

    #!/bin/awk -f
    # pipe_mail.awk
    BEGIN {
        FS = ":"
        ## generate a temporary file
        mkcmd = "mktemp /tmp/prog.XXXXXXXX"
        mkcmd | getline tmpfile
        print "temp file is: ", tmpfile
        close(mkcmd)      # close() takes the same string that opened the pipe
    }
    {
        # select username for users using bash
        if ($7 ~ "/bin/bash")
            print $1 >> tmpfile
    }
    END {
        close(tmpfile)    # flush the written names before reading them back
        while ((getline < tmpfile) > 0) {
            cmd = "mail -s Fellow_BASH_USER " $0
            print "Hello," $0 | cmd    ## send an email to every bash user
            close(cmd)                 # finish this mail before starting the next
        }
        close(tmpfile)
    }

pipe_mail.awk Todo: 1. 2.
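
The pipe-to-command idea in pipe_mail.awk can be tried without sending any mail. The following is only a sketch: the three passwd-style records are made up for illustration, and "sort -r" stands in for the mail command.

```shell
# Hypothetical sample data in /etc/passwd format (invented for this sketch)
passwd_sample='alice:x:1001:1001::/home/alice:/bin/bash
bob:x:1002:1002::/home/bob:/bin/sh
carol:x:1003:1003::/home/carol:/bin/bash'

# awk sends the selected user names through an external "sort -r" pipeline;
# close() makes awk wait for sort to finish before awk itself exits.
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F: '
    $7 ~ "/bin/bash" { print $1 | "sort -r" }
    END { close("sort -r") }')

printf '%s\n' "$bash_users"
```

As in the slide, the string used to open the pipe is the same string later passed to close().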

SLIDE 5

Execute external command

  • Using the system function (similar to C/C++)
    • E.g., system("rm -f tmp") to remove a file

        if (system("rm -f tmp") != 0)
            print "failed to rm tmp"

  • A shell is started to run the command line passed as argument
    • It inherits the awk program's standard input/output/error
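
A minimal sketch of system() called from a shell script, assuming mktemp is available; the variable names scratch, f, and rc are invented for this example.

```shell
# Create a scratch file, then remove it from inside awk via system().
scratch=$(mktemp)

status=$(awk -v f="$scratch" 'BEGIN {
    rc = system("rm -f \"" f "\"")      # a shell is started to run: rm -f "<file>"
    if (rc != 0)
        print "failed to rm " f > "/dev/stderr"
    print rc                             # exit status of the command
}')

echo "exit status of rm: $status"
```

The value returned by system() is the exit status of the command, so 0 indicates success here.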

SLIDE 6

Outline

  • Finish up with awk: pipeline, external commands
  • Commands working with files
    • tree, ls (-d option, -1 option, -R, -a)
    • od (octal dump), stat (show metadata of a file), cmp, diff
    • touch command
    • temporary file, file with random bytes
    • locate, type, which, find command: finding files

SLIDE 7

What's in a file?

  • Files are organized in a hierarchical directory structure
    • Each file has a name, resides under a directory, and is associated with some meta info (permission, owner, timestamps)
  • Disk files, virtual file system, device files
    • Contents of a disk file: text (ASCII) file (such as your C/C++ source code), executable file (commands), a link to other files, …

        ln -s /path/to/file1.txt /path/to/file2.txt

    • The /proc filesystem stores system configuration parameters and resides in the kernel's memory
      • Numerical subdirectories exist for every process.
    • A device file or special file is an interface for a device driver that appears in a file system as if it were an ordinary file
      • For example, /dev/stdin, /dev/tty*
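
A small sketch of the ln -s command from this slide, run in a scratch directory. The file names are hypothetical, and readlink is assumed to be installed.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/file1.txt"
ln -s "$workdir/file1.txt" "$workdir/file2.txt"      # file2.txt is a link to file1.txt

link_target=$(readlink "$workdir/file2.txt")         # where the link points
via_link=$(cat "$workdir/file2.txt")                 # reading follows the link
type_char=$(ls -ld "$workdir/file2.txt" | cut -c1)   # first character of ls -l output

echo "type=$type_char target=$link_target content=$via_link"
rm -rf "$workdir"
```

The first character of the long listing is l for the link, while reading file2.txt returns the contents of file1.txt.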

SLIDE 8

What's in a file?

  • Recall: in ls -l output, the first character indicates the file type:
    • d directory, - plain file, b block-type special file, c character-type special file, l symbolic link, s socket
  • To check the type of a file: "file filename"
  • To view an "octal dump" of a file:

        od [OPTION]... [FILE]...
        od --traditional [FILE] [[+]OFFSET [[+]LABEL]]

  • Important options:
    • -A: what base to use when displaying addresses (default: base 8)
    • -t: specify how to interpret file content
      • a: named character; c: ASCII character or backslash representation
      • d[size]: signed decimal, size bytes per integer
      • o[size]: octal; x[size]: hexadecimal

SLIDE 9

What's in a file?

  • Example of od

    $ echo abc def ghi jkl | od -c
    0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
    0000020
    [zhang@storm ~]$ echo abc def ghi jkl | od -Ad -c      ## same as -t c
    0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
    0000016
    $ echo abc def ghi jkl | od -Ad -t d1      ## interpret each byte as decimal integer
    0000000   97   98   99   32  100  101  102   32  103  104  105   32  106  107  108   10
    0000016
    $ echo abc def ghi jkl | od -Ad -t x1
    0000000 61 62 63 20 64 65 66 20 67 68 69 20 6a 6b 6c 0a
    0000016

SLIDE 10

Disk space usage

  • df: report file system disk space usage

        df [OPTION]... [FILE]...

    • Show information about the file system on which each FILE resides, or all file systems by default.
  • du: estimate file space usage

        du [OPTION]... [FILE]...

    • Summarize disk usage of each FILE, recursively for directories.
  • quota: display disk usage and limits
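
A rough sketch of du and df on a scratch directory. The file name and size are made up, and the reported numbers depend on the file system, so no particular output should be expected.

```shell
workdir=$(mktemp -d)
head -c 100000 /dev/zero > "$workdir/data.bin"     # hypothetical 100 kB test file

usage_kb=$(du -sk "$workdir" | cut -f1)            # -s: one total, -k: units of 1024 bytes
fs_line=$(df -k "$workdir" | tail -n 1)            # the file system that holds workdir

echo "du: $usage_kb KiB used under $workdir"
echo "df: $fs_line"
rm -rf "$workdir"
```

du answers "how much space does this tree use", while df answers "how full is the file system it lives on".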

SLIDE 11

Compare file contents

  • Compare files
    • cmp file1 file2: finds the first place where two files differ (in terms of line and character)
    • diff file1 file2: reports all lines that are different
  • diff's output is carefully designed so that it can be used by other programs. For example, revision control systems use diff to manage the differences between successive versions of files under their management.
  • patch command: apply a diff file to an original

        patch [options] [originalfile [patchfile]]
        patch -pnum <patchfile
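
A minimal sketch comparing two made-up three-line files with cmp and diff. The file names old.txt and new.txt are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\n'   > "$workdir/new.txt"

# cmp stops at the first difference and reports its position
cmp_out=$(cmp "$workdir/old.txt" "$workdir/new.txt")

# diff reports every differing line; exit status 0 = identical, 1 = different
diff_out=$(diff "$workdir/old.txt" "$workdir/new.txt")
diff_status=$?

echo "$cmp_out"
echo "$diff_out"
rm -rf "$workdir"
```

In diff's default format, "2c2" means line 2 of the first file is changed into line 2 of the second; this is the kind of output that patch consumes.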

SLIDE 12

File checksum

  • A checksum provides a single number, a signature, that is characteristic of the file (computed from all of the bytes of the file)
  • Files with different contents are unlikely to have the same checksum
  • Usage: software announcements include checksums of distribution files so that users can tell whether a copy matches the original.

SLIDE 13

openssl

  • A cryptography toolkit implementing the Secure Sockets Layer and Transport Layer Security network protocols and related cryptography standards
  • openssl program: a command line tool for using various cryptography functions from the shell.
    • Creation and management of private keys, public keys and parameters
    • Public key cryptographic operations
    • Creation of X.509 certificates, CSRs and CRLs
    • Calculation of Message Digests
    • Encryption and Decryption with Ciphers
    • SSL/TLS Client and Server Tests
    • Handling of S/MIME signed or encrypted mail
    • Time Stamp requests, generation and verification

SLIDE 14

Message digest

    openssl dgst [-md5|-md4|-md2|-sha1|-sha|-mdc2|-ripemd160|-dss1] [-c] [-d] [-hex] [-binary] [-out filename] [-sign filename] [-keyform arg] [-passin arg] [-verify filename] [-prverify filename] [-signature filename] [-hmac key] [file...]

  Or

    [md5|md4|md2|sha1|sha|mdc2|ripemd160] [-c] [-d] [file...]

  • Outputs the message digest of a supplied file or files in hexadecimal form

SLIDE 15

Example

    $ md5sum /bin/l?
    696a4fa5a98b81b066422a39204ffea4  /bin/ln
    cd6761364e3350d010c834ce11464779  /bin/lp
    351f5eab0baa6eddae391f84d0a6c192  /bin/ls

  • Output: 32 hexadecimal digits, i.e., 128 bits.
  • The chance of two different files having identical signatures is 1/2^128 (the book: 1/2^64)
  • In 2005, researchers were able to create pairs of PostScript documents and X.509 certificates with the same hash. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."
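
The checksum workflow described on the File checksum slide can be sketched as follows. This assumes the GNU coreutils tools md5sum and sha256sum are installed; SHA-256 is swapped in for the verification step because of the MD5 collisions noted above, and pkg.tar is a made-up file name.

```shell
workdir=$(mktemp -d)
printf 'release contents\n' > "$workdir/pkg.tar"     # hypothetical distribution file

# Publisher side: record the checksum next to the file
( cd "$workdir" && sha256sum pkg.tar > pkg.tar.sha256 )

# User side: -c recomputes the checksum and compares it with the recorded one
intact=$( cd "$workdir" && sha256sum -c pkg.tar.sha256 )

# Any change to the file makes the check fail with a nonzero exit status
printf 'tampered\n' >> "$workdir/pkg.tar"
( cd "$workdir" && sha256sum -c pkg.tar.sha256 > /dev/null 2>&1 )
tampered_status=$?

# An MD5 digest is 32 hexadecimal digits (128 bits)
digest=$(md5sum "$workdir/pkg.tar" | cut -d' ' -f1)

echo "$intact"
echo "status after tampering: $tampered_status, md5 length: ${#digest}"
rm -rf "$workdir"
```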

SLIDE 16

Public-key cryptography

  • Data security by two related keys: a private key, known only to its owner, and a public key, potentially known to anyone
    • Examples: RSA, DSA algorithms
  • Digital signature: Alice => Bob communication
    • If Alice wants to sign an open letter, she uses her private key to encrypt it. Bob uses Alice's public key to decrypt the signed letter, and can then be confident that only Alice could have signed it, provided that she is trusted not to divulge her private key.
  • Secrecy:
    • If Alice wants to send a letter to Bob that only he can read, she encrypts it with Bob's public key, and he then uses his private key to decrypt it. As long as Bob keeps his private key secret, Alice can be confident that only Bob can read her letter.

SLIDE 17

Secure Software Distribution

  • Many software archives include digital signatures that incorporate information from a file checksum as well as from the signer's private key.
  • How to verify such signatures?

    $ ls -l coreutils-5.0.tar*          ## Show the distribution files
    -rw-rw-r-- 1 jones devel 6020616 Apr 2 2003 coreutils-5.0.tar.gz
    -rw-rw-r-- 1 jones devel 65 Apr 2 2003 coreutils-5.0.tar.gz.sig
    $ gpg coreutils-5.0.tar.gz.sig      ## Try to verify the signature
    gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
    gpg: Can't check signature: public key not found

SLIDE 18

Verify using public key

  • Obtain the public key from public servers
  • Add the public key to your key ring

    $ gpg --import temp.key
    gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported
    gpg: Total number processed: 1
    gpg: imported: 1

  • Verify the signature successfully:

    $ gpg coreutils-5.0.tar.gz.sig      ## Verify the digital signature

  • Online resource: The GNU Privacy Handbook

SLIDE 19

Outline

  • Finish up with awk: pipeline, external commands
  • Commands working with files
    • tree, ls and echo (-d option, -1 option, -R, -a)
    • od (octal dump), stat (show metadata of a file), cmp, diff
    • touch command, mktemp, file with random bytes
    • File checksum, verification
    • locate, type, which, find command: finding files
  • Process-related commands

SLIDE 20

touch: update modification time

  • touch is sometimes used to create empty files: their existence and possibly their timestamps, but not their contents, are significant.
    • A lock file to indicate that a program is already running, and that a second instance should not be started.
    • To record a file timestamp for later comparison with other files.
  • Example:

    $ touch -t 197607040000.00 US-bicentennial
    $ ls -l US-bicentennial                ## List the file
    -rw-rw-r-- 1 jones devel 0 Jul 4 1976 US-bicentennial
    $ touch -r US-bicentennial birthday    ## Copy timestamp to the new birthday file
    $ ls -l birthday                       ## List the new file
    -rw-rw-r-- 1 jones devel 0 Jul 4 1976 birthday
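
The "timestamp for later comparison" use can be sketched with the -nt (newer than) file test, which bash and most other shells provide. The file names stamp, report, and copy are invented for this example.

```shell
workdir=$(mktemp -d)
touch -t 197607040000.00 "$workdir/stamp"       # reference timestamp: 4 July 1976
touch "$workdir/report"                         # modified just now
touch -r "$workdir/stamp" "$workdir/copy"       # same timestamp as stamp

# -nt is true only when the left file is strictly newer than the right one
if [ "$workdir/report" -nt "$workdir/stamp" ]; then report_newer=yes; else report_newer=no; fi
if [ "$workdir/copy" -nt "$workdir/stamp" ]; then copy_newer=yes; else copy_newer=no; fi

echo "report newer than stamp: $report_newer"
echo "copy newer than stamp:   $copy_newer"
rm -rf "$workdir"
```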

SLIDE 21

Temporary files

  • So far, we have created temporary files in the current directory
    • And removed them after use
    • What if multiple scripts use the same file name? Or malicious users modify the files?
  • Special directories: /tmp (cleared when the system reboots) and /var/tmp
  • To avoid filename collisions, append the process ID as a suffix

    ## create a temporary file in shell scripts
    tmpfile=temp.$$        ## $$ (process id)
    echo $tmpfile

SLIDE 22

mktemp command

  • mktemp: takes an optional filename template containing a string of trailing X characters, preferably at least a dozen of them.
  • mktemp replaces them with an alphanumeric string derived from random numbers and the process ID, creates the file with no access for group and other, and prints the filename on standard output.

    $ TMPFILE=`mktemp /tmp/myprog.XXXXXXXXXXXX` || exit 1    ## Make unique temporary file
    $ ls -l $TMPFILE                                         ## List the temporary file
    -rw------- 1 jones devel 0 Mar 17 07:30 /tmp/myprog.hJmNZbq25727
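
A common way to use mktemp inside a script is to pair it with a trap so that the file is removed when the script exits. This is only a sketch: the trap-based cleanup is an addition that the slide does not show, and the myprog prefix is taken from the slide's example.

```shell
# Create a private temporary file; give up if that fails
tmpfile=$(mktemp "${TMPDIR:-/tmp}/myprog.XXXXXXXXXXXX") || exit 1

# Remove the file automatically when the script exits
trap 'rm -f "$tmpfile"' EXIT

date > "$tmpfile"                              # use the file
perms=$(ls -l "$tmpfile" | cut -c1-10)         # permission string from ls -l
echo "$tmpfile has permissions $perms"
```

The permission string shows read and write access for the owner only, matching the "no access for group and other" behavior described above.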

SLIDE 23

Random bytes

  • Two random pseudodevices: /dev/random and /dev/urandom.
  • These devices serve as never-empty streams of random bytes: such a data source is needed in many cryptographic and security applications.
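
As a sketch, random bytes can be read with head and displayed with od. The variable name token is made up for this example.

```shell
# Read 16 random bytes and print them as 32 hexadecimal digits.
# od -An suppresses the address column; tr removes spaces and newlines.
token=$(head -c 16 /dev/urandom | od -An -t x1 | tr -d ' \n')
echo "$token"
```

Each run prints a different value, which is why no sample output is shown.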

SLIDE 24

Outline

  • Finish up with awk: pipeline, external commands
  • Commands working with files
    • tree, ls and echo (-d option, -1 option, -R, -a)
    • od (octal dump), stat (show metadata of a file), cmp, diff
    • File checksum, verification
    • touch command
    • temporary file, file with random bytes
    • locate, type, which, find command: finding files

SLIDE 25

Search for files

  • locate: find files by name, using a regularly updated database constructed by complete scans of the filesystem

        locate [OPTION]... PATTERN...
        $ locate cksum

  • which: display the full pathname for a command, using the PATH variable

        $ which rm
        alias rm='rm'
        /bin/rm

  • type: shell built-in command; shows how each name would be interpreted if used as a command name
    • -t option: report whether a name is an alias, shell reserved word, function, builtin, or disk file
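
A short sketch of type -t, assuming bash is installed. Each query runs in a fresh non-interactive bash so that aliases from a login profile do not change the answers.

```shell
kind_cd=$(bash -c 'type -t cd')      # a shell builtin
kind_if=$(bash -c 'type -t if')      # a shell reserved word
kind_ls=$(bash -c 'type -t ls')      # a program found on disk via PATH

echo "cd: $kind_cd"
echo "if: $kind_if"
echo "ls: $kind_ls"
```

In an interactive shell where ls is aliased, type -t ls would report alias instead.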

SLIDE 26

find command

  • find [ files-or-directories ] [ options ]: find files matching specified name patterns, or having given attributes.

    -atime n: select files with access times of n days (also -ctime, -mtime)
    -ls: produce a listing similar to the ls long form, rather than just filenames.
    -name 'pattern': select files matching the shell wildcard pattern (quoted to protect it from shell interpretation).
    -perm mask: select files matching the specified octal permission mask.
    -prune: do not descend recursively into directory trees.
    -size n: select files of size n.
    -type t: select files of type t, a single letter: d (directory), f (file), or l (symbolic link).

SLIDE 27

find: basic operations

    find [ files-or-directories ] [ options ]

    files-or-directories: files and directories to search (directories are (almost) always descended into recursively)
    options: select names for ultimate display or action

  • When it finds a file, it first carries out the selection restrictions implied by the options, and if those tests succeed, it hands the name off to an internal action routine.
    • Default action: print the name on standard output
    • -exec option: provides a command template into which the name is substituted, and the command is then executed.
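
A sketch of the default action versus -exec, using made-up files in a scratch directory.

```shell
workdir=$(mktemp -d)
printf 'a\n'    > "$workdir/one.txt"
printf 'b\nc\n' > "$workdir/two.txt"
touch "$workdir/skip.log"

# Default action: print each matching name
names=$(find "$workdir" -name '*.txt' -type f | sort)

# -exec: {} is replaced by each matching name; \; marks the end of the template
line_counts=$(find "$workdir" -name '*.txt' -type f -exec wc -l {} \; |
              awk '{print $1}' | sort -n)

printf '%s\n' "$names"
printf '%s\n' "$line_counts"
rm -rf "$workdir"
```

Here wc -l runs once per selected file, and skip.log is never passed to it because the -name test rejects it first.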

SLIDE 28

find usage examples

  • find: display all files/directories under the current directory
  • find -ls: display files/directories in "ls" style
  • find * -prune
  • find $HOME/. ! -user $USER
  • find -ls -type f -fprint /tmp/mytemp

    $ find -ls -type f -fprint /tmp/mytemp
    23724924 4 drwxr-xr-x 2 zhang staff 4096 Mar 25 22:40 .
    23724925 0 --wx------ 1 zhang staff 0 Mar 25 22:35 ./a
    23724927 0 -rw-r--r-- 1 zhang staff 0 Mar 25 22:35 ./b
    23724928 4 -rw-r--r-- 1 zhang staff 10 Mar 25 22:40 ./tmp
    [zhang@storm testfind]$ more /tmp/mytemp
    ./a
    ./b
    ./tmp

SLIDE 29

find: examples

  • Files that haven't been modified in the last year

        find . -mtime +365

    • Unsigned integer: exactly that many days old
    • Negative: less than that absolute value
    • Positive: more than that value
  • Files that the user has write permission for

        find . -perm -200      ## all bits set need to match

  • Permission mask as an octal string
    • Unsigned: an exact match on the permissions is required.
    • Negative: all of the bits set are required to match.
    • Positive: at least one of the bits set must match
      • E.g., +700   // user can read, or write, or execute … (recent GNU find versions spell this /700)
  • Files that the user does not have read permission for

        find . ! -perm -400
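
The -mtime and -perm selectors can be tried on a scratch directory. This is a sketch with invented file names; touch -t backdates one file so that it counts as older than a year.

```shell
workdir=$(mktemp -d)
touch "$workdir/fresh" "$workdir/stale" "$workdir/readonly"
touch -t 200001010000 "$workdir/stale"       # last modified in the year 2000
chmod 600 "$workdir/fresh" "$workdir/stale"  # owner: read and write
chmod 400 "$workdir/readonly"                # owner: read only

old_files=$(find "$workdir" -type f -mtime +365)              # not modified in the last year
writable=$(find "$workdir" -type f -perm -200 | sort)         # owner write bit set
not_writable=$(find "$workdir" -type f ! -perm -200)          # owner write bit clear

echo "old:          $old_files"
echo "writable:     $writable"
echo "not writable: $not_writable"
rm -rf "$workdir"
```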

SLIDE 30

find: selector

  • Selector options can be combined: all must match for the action to be taken.
    • They can be interspersed with the -a (AND) option
    • -o (OR) option: at least one selector of the surrounding pair must match.
  • Find nonempty files smaller than 10 blocks (5120 bytes)

        $ find . -size +0 -a -size -10

  • Find files that are empty or unread in the past year

        $ find . -size 0 -o -atime +365
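
A sketch of combining selectors on made-up files of known size. When -o is mixed with other tests, escaped parentheses keep the grouping explicit, because the implicit AND binds more tightly than -o.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty"                          # 0 bytes
head -c 100  /dev/zero > "$workdir/small"     # 1 block of 512 bytes
head -c 8000 /dev/zero > "$workdir/big"       # 16 blocks

# AND: nonempty and smaller than 10 blocks
small_nonempty=$(find "$workdir" -type f -size +0 -a -size -10)

# OR, grouped: regular files that are empty or larger than 10 blocks
empty_or_big=$(find "$workdir" -type f \( -size 0 -o -size +10 \) | sort)

echo "$small_nonempty"
echo "$empty_or_big"
rm -rf "$workdir"
```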

SLIDE 31

Usage of find in shell script

    #!/bin/bash
    …                                        ## go to top level web site directory
    find . -name '*.html' -type f |          ## Find all HTML files
    while IFS= read -r file                  ## Read filename into variable
    do
        echo "$file"                         ## Print progress
        mv "$file" "$file.save"              ## Save a backup copy
        ## Make the change
        sed -f "$HOME/html2xhtml.sed" < "$file.save" > "$file"
    done

SLIDE 32

html2xhtml.sed

  • Converts HTML to XHTML (the standardized XML-based version of HTML): converts tags to lowercase, and changes the <br> tag into the self-closing form, <br/>:

    # Slash delimiter
    s/<H1>/<h1>/g
    s/<H2>/<h2>/g
    s/<H3>/<h3>/g
    s/<H4>/<h4>/g
    s/<H5>/<h5>/g
    s/<H6>/<h6>/g
    # Colon delimiter, slash in data
    s:</H1>:</h1>:g
    s:</H2>:</h2>:g
    ..
    s:<[Hh][Tt][Mm][Ll]>:<html>:g
    s:</[Hh][Tt][Mm][Ll]>:</html>:g
    s:<[Bb][Rr]>:<br/>:g
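
Three of the substitutions above can be tried directly on the command line. The HTML fragment is made up for this sketch.

```shell
html='<H1>Title</H1><BR><P>text</P>'        # hypothetical input line

xhtml=$(printf '%s\n' "$html" |
        sed -e 's/<H1>/<h1>/g' \
            -e 's:</H1>:</h1>:g' \
            -e 's:<[Bb][Rr]>:<br/>:g')

echo "$xhtml"
```

The <P> tags stay uppercase because none of the three rules used here mentions them.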

SLIDE 33

Total file size

    $ find -ls | awk '{Sum += $7} END {printf("Total: %.0f bytes\n", Sum)}'
    Total: 23079017 bytes

SLIDE 34

xargs command

  • Supply the list returned by find as arguments to another command
    • Via the shell's command substitution feature. E.g., searching for the symbol POSIX_OPEN_MAX in system header files:

    $ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
    /usr/include/limits.h: #define _POSIX_OPEN_MAX 16

    • Note: why /dev/null here?
    • Potential problem: the command line might exceed a system limit => "argument list too long" error

    $ getconf ARG_MAX        ## get system configuration values
    2097152

SLIDE 35

xargs command

  • xargs: takes a list of arguments from standard input (separated by whitespace, typically one per line), and feeds them in suitably sized groups (determined by ARG_MAX) to another command given as arguments to xargs.

    $ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
    /usr/include/bits/posix1_lim.h:#define _POSIX_OPEN_MAX 16
    /usr/include/bits/posix1_lim.h:#define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX
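
A sketch of the find and xargs combination on made-up header files. It also shows one effect of the extra /dev/null argument: with more than one file argument, grep prefixes each match with the file name. The -print0 and -0 options (supported by GNU and BSD find and xargs) are an addition not shown on the slides; they keep file names containing spaces intact. DEMO_MAX is a hypothetical symbol.

```shell
workdir=$(mktemp -d)
printf '#define DEMO_MAX 16\n' > "$workdir/a.h"
printf '#define DEMO_MAX 32\n' > "$workdir/name with space.h"

# With a single file, grep omits the file name unless /dev/null is added
without_null=$(grep DEMO_MAX "$workdir/a.h")
with_null=$(grep DEMO_MAX /dev/null "$workdir/a.h")

# NUL-separated names survive spaces in file names
matches=$(find "$workdir" -type f -name '*.h' -print0 |
          xargs -0 grep -l DEMO_MAX | sort)

# -n 2 limits each invocation of the command to two arguments
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)

echo "$without_null"
echo "$with_null"
printf '%s\n' "$matches"
printf '%s\n' "$batches"
rm -rf "$workdir"
```

The -n option makes the grouping visible on a small input; without it, xargs chooses the group size from the system's argument-length limit.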

SLIDE 36

Code Studies: filesdirectories

SLIDE 37

Summary

  • Finish up with awk: pipeline, external commands
  • Commands working with files
    • tree, ls (-d option, -1 option, -R, -a)
    • od (octal dump), stat (show metadata of a file)
    • touch command, temporary file, file with random bytes
    • File checksum, verification
    • locate, type, which, find command: finding files