slide-1
SLIDE 1

COMP 2103 — Programming 3 — Part 4

Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University

slide-2
SLIDE 2

171

Modules: Introduction

  • Modules: a technique for organizing your functions

  • Idea 1: if you have a large program, you don’t want all your functions in one file

– imagine 1,000,000 lines of code in one file
– changing one line means you would have to re-compile everything
– editing the file might become unwieldy
– re-using code is difficult

  • Idea 2: suppose you develop your own library of (related) functions

– you might want people to be able to use your library functions without giving them your source code

Jim Diamond, Jodrey School of Computer Science, Acadia University

slide-3
SLIDE 3

172

Modules: Organization

  • A module consists of two files:

– a “.c” file, which contains the implementation of the module functions, and
– a “.h” file, which contains declarations of the “public” module functions, and other things (see next slide)

  • The programmer creating the module prepares these two files

(e.g., some_package.c and some_package.h)

  • The programmer using the module uses

#include "some_package.h"

in any of his source files which need declarations from that module

  • The programmer using the module includes the implementation file† in

his/her gcc line:

$ gcc -Wall ... myprog.c some_package.c -o myprog

slide-4
SLIDE 4

173

Modules: Content of The “.h” File

  • The .h file contains all interface information which should be exposed to the module user

– the documentation for the user
– any #define constants or macros needed by the user
– any data types created with typedef or struct needed by the user
– any function declarations that the user should see
– there may be functions in the implementation of the module which are not exposed to the user (“private” functions)

  • Example: math.h

– defines constants; e.g.,

# define M_PI 3.14159265358979323846 /* pi */

– has macros; e.g.,

# define isless(x, y) __builtin_isless(x, y)

– provides function prototypes (for sin(), cos(), . . . )

  • Some .h files also define typedefs and/or structs

slide-5
SLIDE 5

174

Modules: Structure of The “.h” File

  • Suppose your module is named myfuncs

– the myfuncs.h file MUST (by convention, but still MUST) look like this

#ifndef MYFUNCS_H
#define MYFUNCS_H
... all the macros, function prototypes, ...
#endif

  • This #ifndef ... #define ... #endif construct tells the compiler to

skip the contents of the file on second (and third, fourth, . . . ) readings

  • Why do we care?

– because in some (simple or complex) situations you might end up #include-ing a .h file twice

  • So what?

– the compiler doesn’t allow you to do some things twice (like defining the same struct twice, or #define-ing the same token twice with different values)

– and just because you have a fast computer doesn’t mean you should waste CPU time

C++ (bah!)
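To see the include guard at work without juggling several files, the sketch below pastes the contents of a hypothetical myfuncs.h into one source file twice, which is what the preprocessor effectively does when a header is included twice. The struct pair type is an invented example, not part of the slides; without the guard, its second definition would be a compile error.

```c
/* First copy of the (hypothetical) header contents. */
#ifndef MYFUNCS_H
#define MYFUNCS_H
struct pair { int first; int second; };   /* defining this twice is an error */
int isquare(int i);
#endif

/* Second copy: MYFUNCS_H is already defined, so everything up to the
 * matching #endif is skipped and struct pair is not redefined. */
#ifndef MYFUNCS_H
#define MYFUNCS_H
struct pair { int first; int second; };
int isquare(int i);
#endif

int isquare(int i)
{
    return i * i;
}
```

Removing the two #ifndef/#define/#endif wrappers should make the compiler complain about the redefinition of struct pair.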

slide-6
SLIDE 6

175

Modules: The “.c” File

  • First, the .c file #includes the corresponding .h file

– this ensures that the declarations (of externally-visible functions) are in agreement with the function definitions
– (if they disagree the compiler will bleed over you when you try to compile the .c file!)
– this also provides the .c file with any constant or macro definitions found in the .h file

  • Following the #include is the rest of the implementation of the module

functions (including the documentation for the implementation)

  • For example (docs are missing and code is compressed to fit on this slide!):

#include "myfuncs.h"
#include <math.h>   /* ONLY if *THIS* .c file needs math.h */

int isquare(int i) { return i * i; }
float fsquare(float f) { return f * f; }
...

slide-7
SLIDE 7

176

The “static” Keyword: 1

  • The keyword “static” has a number of (somewhat) dissimilar

meanings in C

  • A variable declared inside a function to be static, e.g.,

static int abc;

is not stored on the stack; rather it is stored in another area of memory so that its value is preserved between calls to that function

  • Here is a function that returns how many times it has been called:

int times_called(void)
{
    static int counter = 0;
    return ++counter;
}

slide-8
SLIDE 8

177

The “static” Keyword: 2

  • A function declared to be static, e.g.,

static int icube(int i) { ... }

is visible only to other functions in the same source file

  • If you use helper functions in a module (that you don’t want the module user to access) declare them static

– these static functions must not be declared in the .h file

  • There is one other use of static; we will discuss this later

slide-9
SLIDE 9

178

Modules: Separate Compilation

  • There are three ways to use module functions in your program (in all of these examples, a9p5.c has #include "myfuncs.h")

  • 1: compile the module .c file with your file:

gcc -Wall -Wextra -std=gnu11 a9p5.c myfuncs.c -o a9p5

  • 2a: pre-compile the module file:

gcc -Wall -Wextra -std=gnu11 -c myfuncs.c

→ this creates a “.o” file (myfuncs.o in this case)

2b: then use this pre-compiled file:

gcc -Wall -Wextra -std=gnu11 a9p5.c myfuncs.o -o a9p5

  • 3: create a “library” (“.a” or “.so”) file and name the library file when making the program

  • Note: the second and third methods allow a module writer to share his

functions without sharing his source code

slide-10
SLIDE 10

179

Modules: Various Categories

  • The previous slides show how to create a package module

  • One type of package module is known as a type abstraction module

– this defines a new data type (e.g., a stack) and the operations for that data type (e.g., push(), pop(), . . . )

  • You can also create a layer module

  • You can also create a module which

– replaces (over-rides) functions from another module
– read about this in “C for Java Programmers”

slide-11
SLIDE 11

180

Modules: Sample Layer Module

  • Suppose you only have trig functions for radians, but want to use

degrees; you could create a “trig in degrees” module:

  • Your trig_degrees.h file would have

#ifndef TRIG_DEGREES_H
#define TRIG_DEGREES_H
... user documentation for sin_degrees() ...
double sin_degrees(double x);
...
#endif

  • Your trig_degrees.c file would have

#include "trig_degrees.h"
#include <math.h>
... programmer documentation for sin_degrees() ...
double sin_degrees(double x)
{
    return sin(x * M_PI / 180.);
}
...

slide-12
SLIDE 12

181

Modules: Yet Another Categorization

  • Suppose you write a module to implement the stack data structure; you might write it in one of these two ways:

– any program using this module can use at most one stack, or
– any program using this module can use many stacks

  • The former type of module is sometimes referred to as a singleton

module

  • The latter type of module is sometimes referred to as a reentrant

module

  • Typically, a module will be singleton if it uses static variables to store state (information) between calls to the module’s functions

– if it doesn’t use static variables to store state between calls, it will (almost certainly) be reentrant
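The difference between the two categories is easier to see on something smaller than a stack. The following is only a sketch: it writes a trivial counter both ways, and all of the names (single_next, struct counter, counter_init, counter_next) are invented for illustration.

```c
/* Singleton flavour: the state lives in a file-scope static variable,
 * so a program gets exactly one counter. */
static int single_count = 0;

int single_next(void)
{
    return ++single_count;
}

/* Reentrant flavour: the caller owns the state and passes it in,
 * so a program can have as many counters as it likes. */
struct counter { int count; };

void counter_init(struct counter * c)
{
    c->count = 0;
}

int counter_next(struct counter * c)
{
    return ++c->count;
}
```

With the second version, two struct counter variables advance independently; with the first, every caller shares the same count.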

slide-13
SLIDE 13

182

Modules: Sharing Variables

  • You can define and declare variables outside of functions

– these are called global variables
– they can be used to share information among different functions
– these functions can be in the same file (e.g., a module)
– these functions can also be in different files (e.g., a module and your main program)

  • In the other files, you declare the global variable as follows:

extern int some_variable;

  • You can restrict the usage of a global variable to one source file by

defining it to be static

static double running_total;

  • In both cases, the “scope” of the variable in its file begins at its

declaration; it is not known to the code above its declaration

slide-14
SLIDE 14

183

Global Variables: Use with Caution

  • Note: the use of global variables to share information between functions in different source files should generally be avoided

– indeed, maybe I should say almost always be avoided

  • The use of global variables can decrease understandability and maintainability of programs

– if a global variable is modified in multiple places, understanding the behaviour of a program can become difficult
– in particular, understanding the interfaces between functions gets more complex with global variables

  • Inside a module, a static global variable is more acceptable

  • On some computer architectures global variables are slower to access

than automatic (local) variables

GEQ for COMP 2213: Why?

slide-15
SLIDE 15

184

Modules: Constructor Functions

  • The implementation you choose for a particular task may require initialization before the first use

– the user of the module (probably) does not want/need to know the implementation details

  • The writer of the module can provide a function which the user calls to initialize the module

  • All the user needs to know is the name of the constructor function (you could call it “new_moduleName()”) and what the required args are

– this information should be in the module docs in the .h file

  • For a singleton module, this would (probably) initialize some static variables in the module

  • For a reentrant module, this would return a pointer to some allocated, initialized structure

– if the constructor allocates memory, you would generally want to have a destructor function as well

slide-16
SLIDE 16

185

Modules: Know When to Bail Out

  • It is often desirable to have “library” functions return a code to the calling function to indicate success or failure

– the calling function can then decide whether to output a diagnostic message (probably to stderr)
– and whether to terminate the program or not
– in virtually no cases should a “library function” output error messages

  • Sometimes the error is so severe that the program should terminate

– in such cases, it may be inconvenient to pass an error code back to main()
– there might be a chain of 50 functions that called each other, all of which would have to support this
– instead, in these (rare?) circumstances, a function might want to use the exit() function
– “exit(N);” is similar to main() calling “return N;”

slide-17
SLIDE 17

186

Variable Scope: 1

  • Scope: when a variable is “visible” to source code
  • Automatic variables (“regular” variables declared inside a function) are only visible

(a) after their declaration

int f(int a)
{
    int i = j; /* INVALID */
    int j = 3;
}

(b) and inside the block in which they are declared

int g(int b)
{
    while (1) {
        int k, l, m;
        ...
    }
    /* k, l and m are not visible here */
}

(c) (C99, C11) in a for statement which declares them

for (int i = n - 1; i >= 0 ; i--) ...

slide-18
SLIDE 18

187

Variable Scope: 2

  • External variables are visible:

– after their definition in the source file where they are defined
– after their declaration in any source file

  • The scope of global variables can be restricted to the current source file with the static keyword

/*
 * File: xyzzy.c
 * ...
 */

/* This variable is DEFINED here. */
int a_global_visible_everywhere;

/* This variable is also DEFINED here. */
static int a_global_visible_only_in_this_file;

/* This var is DECLARED here, DEFINED somewhere else. */
extern int a_global_defined_elsewhere;

slide-19
SLIDE 19

188

Example of Global Variable Usage

  • It is often good for an error message to indicate what program

generated the message

char * progname;

int main(int argc, char * argv[])
{
    progname = argv[0];
    ...
}

int do_something(...)
{
    ...
    if (a_bad_thing_has_happened) {
        fprintf(stderr, "%s: whine...\n", progname);
        return BAD_THING_HAPPENED;
    }
}

  • If do_something() is in a different source file than main(),

“extern char * progname;” must precede the usage of progname

slide-20
SLIDE 20

189

Keeping an Implementation Private: Rationale

  • Suppose you create a stacks module and use a linked list implementation

typedef struct stack_element
{
    int value;
    struct stack_element * next;
} * stack_T;

  • Problem: if you put this definition into the .h file, you have exposed the internal structure of the implementation to the module user

– this means they may access the stack elements without using the appropriate functions
– they may also access or modify the linked list itself, which is a Very Bad Thing
– they might break the stack

slide-21
SLIDE 21

190

Keeping an Implementation Private: Opaque Modules: 1

  • C allows so-called “incomplete type definitions”

– “struct blahblahblah * ptr;” is a valid declaration, even if struct blahblahblah is not currently defined
– the C compiler knows how big a pointer is, which is all it really needs to define ptr
– (it also keeps track of the pointee’s type for later reference)

  • To keep a module implementation private, the module .h file should contain (for example)

typedef struct int_stack_implementation * int_stack_T;

and the (possibly secret) .c file will have (for example)

struct int_stack_implementation
{
    int value;
    struct int_stack_implementation * next;
};

slide-22
SLIDE 22

191

Keeping an Implementation Private: Opaque Modules: 2

  • Using the scheme of the previous slide, the int_stack.h file would also

include the function declarations:

#ifndef INT_STACK_H
#define INT_STACK_H

typedef struct int_stack_implementation * int_stack_T;

int_stack_T new_int_stack(void);
int push_int_stack(int_stack_T, int);
int pop_int_stack(int_stack_T, int *);
int top_int_stack(int_stack_T, int *);
int isempty_int_stack(int_stack_T);
int destroy_int_stack(int_stack_T *);

#endif

  • GEQ: why does destroy_int_stack() take a pointer to a pointer?
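One possible implementation behind that interface is sketched below, collapsed into a single file so it can be compiled on its own. Several things here are assumptions rather than the course's code: the private layout (a small header struct pointing at a list of struct node), the return convention (0 for success, -1 for failure), and the omission of top_int_stack() to keep the sketch short.

```c
#include <stdlib.h>

/* What int_stack.h would expose: only an incomplete type. */
typedef struct int_stack_implementation * int_stack_T;

/* What int_stack.c would keep private (assumed layout). */
struct node {
    int value;
    struct node * next;
};

struct int_stack_implementation {
    struct node * top;
};

/* Constructor: returns NULL if memory cannot be allocated. */
int_stack_T new_int_stack(void)
{
    int_stack_T s = malloc(sizeof *s);
    if (s != NULL)
        s->top = NULL;
    return s;
}

int isempty_int_stack(int_stack_T s)
{
    return s->top == NULL;
}

/* Returns 0 on success, -1 if no memory is available. */
int push_int_stack(int_stack_T s, int value)
{
    struct node * n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = s->top;
    s->top = n;
    return 0;
}

/* Returns 0 and stores the popped value in *value, or -1 if empty. */
int pop_int_stack(int_stack_T s, int * value)
{
    struct node * n = s->top;
    if (n == NULL)
        return -1;
    *value = n->value;
    s->top = n->next;
    free(n);
    return 0;
}

/* Destructor: frees every node, then the stack itself. */
int destroy_int_stack(int_stack_T * sp)
{
    int dummy;
    if (sp == NULL || *sp == NULL)
        return -1;
    while (pop_int_stack(*sp, &dummy) == 0)
        ;
    free(*sp);
    *sp = NULL;
    return 0;
}
```

Code that sees only the typedef can hold and pass around an int_stack_T, but it cannot reach the top or next fields, which is the point of the opaque design.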

slide-23
SLIDE 23

192

Enumerated Types

  • Suppose you want a “small” set of defined constants, e.g.,

– a discrete set of values like SUNDAY, MONDAY, . . . , SATURDAY

  • You could

#define SUNDAY 1
...
#define SATURDAY 7

but this is tedious and error-prone

  • Instead, do

enum { SUNDAY=1, MONDAY, ..., SATURDAY };

which defines the constant SUNDAY as 1, MONDAY as 2, TUESDAY as 3, and so on (these are int constants known to the compiler, not preprocessor #defines)

  • By default, the first constant is given the value 0, the second 1, . . .

  • (Of course, you can’t use “. . . ” in the actual statement)

slide-24
SLIDE 24

193

Enumerated Types: 2

  • Alternatively, you can create a named enumerated type

enum weekdays { SUNDAY=1, MONDAY, ..., SATURDAY };

  • or even use typedef to create a new type

typedef enum { SUNDAY=1, MONDAY, ..., SATURDAY } weekdays_T;

  • Then, if desired, you can say

enum weekdays day;

  • or (respectively)

weekdays_T day;
...
day = THURSDAY;
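Written out in full (without the “...”), the typedef version might look like the sketch below. The two helper functions, is_weekend() and next_day(), are hypothetical and exist only to show the constants in use.

```c
typedef enum {
    SUNDAY = 1, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY
} weekdays_T;

/* Hypothetical helper: is this day on the weekend? */
int is_weekend(weekdays_T day)
{
    return day == SATURDAY || day == SUNDAY;
}

/* Hypothetical helper: the day after `day`, wrapping SATURDAY to SUNDAY. */
weekdays_T next_day(weekdays_T day)
{
    return (day == SATURDAY) ? SUNDAY : (weekdays_T)(day + 1);
}
```

Because each constant after SUNDAY=1 is one more than the previous one, MONDAY is 2 and SATURDAY is 7, and simple arithmetic such as day + 1 works as expected.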

slide-25
SLIDE 25

194

Binary vs. ASCII (Text) Data

  • So far we have used only text data for I/O
  • Text advantages:

– human-readable (more or less)
– can use any text-based tools
– editors (emacs, vim, . . . , all the way down to toy editors)
– browsing tools (less, more, most, . . . )
– comparison tools (diff, diff3, xxdiff, . . . )
– text manipulation (sed, awk, tr, sort, uniq, . . . )
– portable (to non-EBCDIC computers, anyway)

  • Text disadvantages:

– representation of numeric data (typically) uses more space
– converting numeric data from ASCII to in-memory (binary) format is time-consuming

slide-26
SLIDE 26

195

Addressing of Multi-Byte Binary Data: “The Endian Wars”

  • Suppose you have a 32-bit int on our computers

– it occupies four consecutive bytes of memory
– say, for example, locations 100, 101, 102, and 103

  • Q: where does the least-significant byte (LSB) go?
  • Two possibilities:

– most significant byte (MSB) in location 100, LSB in location 103
– this is known as big-endian: the big end comes first
– LSB in location 100, MSB in location 103
– this is known as little-endian: the little end comes first

  • Reference: Gulliver’s Travels (1726) by Jonathan Swift (1667-1745)
  • Don’t even ask about PDP-endian!

slide-27
SLIDE 27

196

Big-Endian vs. Little-Endian: So What?

  • Suppose you write binary data on one computer

– suppose you later try to read it on another computer
– if the endian-ness (also known as the byte sex) of the computers is different, you will read gibberish

  • Intel i386 machines (and compatibles) are little-endian
  • Many (most?) other computer architectures are big-endian
  • The internet is considered to be big-endian

– i.e., the default assumed format for binary data sent across the internet is big-endian

  • So: if you want your data to be portable, either

– make it a Well Known Fact that the data format defines the data to be in little-endian format, or
– have “meta-data” with the data file that specifies the endian-ness
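A rough sketch of both ideas follows: one function that reports the byte order of the machine it runs on, and a pair that store and load a 32-bit value in a fixed (here big-endian) order no matter what the host uses. The function names is_little_endian(), put_be32() and get_be32() are made up for this example.

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    /* Look at the byte stored at the lowest address. */
    return *(unsigned char *)&probe == 1;
}

/* Store v into out[0..3] in big-endian order, whatever the host's order. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored in big-endian order. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Since put_be32() and get_be32() work with shifts rather than by copying memory, a file written with them on one machine should read back correctly on a machine of the opposite endian-ness.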

slide-28
SLIDE 28

197

Binary I/O Using stdio

  • There are two stdio functions to do binary I/O: fread() and fwrite()

  • Note that on non-Unix-like systems, to do binary I/O you should fopen() your file with “rb” or “wb” (it is redundant but ok to do this on Unix-like systems too)
  • The calling sequence of fread() is as follows:

size_t fread(void * ptr, size_t size, size_t nmemb, FILE * stream);

where

– ptr is a pointer to a block of memory large enough to hold the desired data
– size is the number of bytes used by each data item to be read in
– nmemb is the number of data items to read
– stream is the FILE * (stream) from which to read data

  • fread() returns the number of items (NOT BYTES) successfully read

slide-29
SLIDE 29

198

Binary I/O Using stdio: fwrite()

  • The calling sequence of fwrite() is as follows:

size_t fwrite(const void * ptr, size_t size, size_t nmemb, FILE * stream);

– writes size * nmemb bytes of data to stream from a block of memory pointed to by ptr

  • fwrite() returns the number of items (NOT BYTES) successfully

written

  • On x86-64 computers, using gcc, size_t is (indirectly) defined by

typedef long unsigned int size_t;

– you can discover this yourself by typing

gcc -E XYZ.c | grep 'typedef .* size_t'

where XYZ.c is any C program which includes stdio.h

  • or (maybe!) by laboriously going through files in /usr/include
  • The sizeof operator of C yields (something compatible with) a size_t

slide-30
SLIDE 30

199

Binary I/O Using stdio: Examples

  • Suppose you want to write n integers from an array arr:

ret = fwrite(arr, sizeof(int), n, outfile);

– after the call to fwrite(), ret will contain the number of ints successfully written

  • Suppose you want to read some data into a structure st of type struct

my_struct:

ret = fread(&st, sizeof(struct my_struct), 1, infile);

  • or maybe better (why better?)

ret = fread(&st, sizeof(st), 1, infile);

– after the call to fread(), ret will have the number of structs successfully read (0 or 1 in this case)

  • Ugly-ism: to differentiate between EOF and error, you must use feof() and/or ferror()

– see the man pages for details
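Putting the two calls together, a minimal round trip might look like the sketch below. The wrapper names write_ints() and read_ints() and their return conventions are assumptions made for this example; the short-count check with ferror() is the EOF-versus-error distinction mentioned above.

```c
#include <stdio.h>

/* Write n ints to fp; returns 0 on success, -1 on a short write. */
int write_ints(FILE * fp, const int * arr, size_t n)
{
    return fwrite(arr, sizeof arr[0], n, fp) == n ? 0 : -1;
}

/* Read up to max ints from fp into arr.
 * Returns the number of ints read, or -1 on an I/O error
 * (a short count with no error just means end-of-file came first). */
long read_ints(FILE * fp, int * arr, size_t max)
{
    size_t got = fread(arr, sizeof arr[0], max, fp);
    if (got < max && ferror(fp))
        return -1;
    return (long)got;
}
```

A caller that asks for five ints from a file holding three gets 3 back, and feof() on the stream is then true.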

slide-31
SLIDE 31

200

Random Access to File Data

  • Often, you read or write a file from beginning to end
  • Sometimes, however, you want to “jump around” in the file

– e.g., if you have a file with 1,000,000 student records, you probably don’t want to read all 1,000,000 if you know the one you want is at the end

  • To do random access to a file, you use “r+”, “w+” or “a+” in your

fopen() call (“a” means “append”)

  • The fseek() call is used to “jump around”:

int fseek(FILE * stream, long offset, int whence);

where

– offset is how far (in bytes) you want to jump, and
– whence is the starting location of your jump; it can be
– SEEK_SET (offset is from the start of the file),
– SEEK_CUR (offset is from the current location), or
– SEEK_END (offset is from the end of the file)
– note that offset can be negative, to go backwards

slide-32
SLIDE 32

201

Random Access Example

  • Suppose

– student_db_fd is a FILE * which has been opened in r+ mode
– the data in the file is a sequence of struct student_rec
– s_r is a variable of type struct student_rec
– we want to read the 5000th record into s_r

  • We can use the following code to accomplish this:

ret = fseek(student_db_fd, 4999 * sizeof(s_r), SEEK_SET);
if (ret != 0) {
    /* whine whine whine */
} else {
    ret = fread(&s_r, sizeof(s_r), 1, student_db_fd);
    if (ret != 1)
        ...
}

  • Another example: prepare to read the very last record:

/* cast before negating: sizeof yields an unsigned size_t */
ret = fseek(student_db_fd, -(long)sizeof(s_r), SEEK_END);
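The same pattern can be wrapped in small helper functions, as in this sketch. The record layout (an id and a gpa) and the names read_record() and read_last_record() are invented for illustration; the slides do not define struct student_rec.

```c
#include <stdio.h>

struct student_rec {      /* hypothetical record layout */
    int id;
    double gpa;
};

/* Read record number `index` (0-based) from fp into *out.
 * Returns 0 on success, -1 if the seek or the read fails. */
int read_record(FILE * fp, long index, struct student_rec * out)
{
    if (fseek(fp, index * (long)sizeof(*out), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof(*out), 1, fp) == 1 ? 0 : -1;
}

/* Read the last record in the file. */
int read_last_record(FILE * fp, struct student_rec * out)
{
    if (fseek(fp, -(long)sizeof(*out), SEEK_END) != 0)
        return -1;
    return fread(out, sizeof(*out), 1, fp) == 1 ? 0 : -1;
}
```

Note that seeking past the end of the file is not itself an error; in read_record() it is the following fread() that reports that no record is there.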

slide-33
SLIDE 33

202

Related Random Access Functions

  • Suppose you want to know where you are in the file right now:

long ftell(FILE * stream);

returns (on success!) the offset from the beginning of the file

– so if you want to save a “bookmark” into the file for wherever you are now, you can do something like

next_one_to_read = ftell(student_db_fd);

  • If you want to return to the beginning of the file, you can use

rewind(student_db_fd);

which is equivalent to

fseek(student_db_fd, 0L, SEEK_SET);

  • GEQ: how can a C program use these functions to find out how long a

file is (without reading the whole thing)?

slide-34
SLIDE 34

203

I/O Without Stdio

  • The stdio routines provide a high-level (?) interface implemented using lower-level functions

– i.e., stdio library functions use system calls to do the actual I/O
– recall(?): in most OSes, a user program can not do I/O. . . instead, the program asks the OS to do the I/O for it

  • stdio buffers input and output (when possible) to minimize the number of times the OS is called

– function calls are expensive, system calls much more so!

  • Aside: stdout is buffered, but stderr is not

– use fflush(stdout) to flush stdout’s buffer

  • stdio is standard C, thus portable to other systems

– low-level I/O is more system-dependent – we will look at Unix (and thus Linux, . . . ) I/O

slide-35
SLIDE 35

204

Low Level I/O Functions: 1

  • open() opens a file:

int open(const char * pathname, int flags);
int open(const char * pathname, int flags, mode_t mode);

– e.g., fd = open("/etc/passwd", O_RDONLY)
– fd is the “file descriptor”: an int, not a FILE *
– the first arg is a const char * specifying the pathname
– the second arg is a bitwise-or of a number of flags
– the optional(!) third arg is the file mode (“permissions”)
– returns a new file descriptor on success, -1 on failure

  • close() closes a file:

int close(int fd);

– e.g., ret = close(fd);
– returns zero on success, non-zero on failure

  • When a program begins running, unless something is very, very b0rken, fd 0 is stdin, fd 1 is stdout and fd 2 is stderr

– check out fileno(stdin) or fileno(stdout)

slide-36
SLIDE 36

205

Low Level I/O Functions: 2

  • ssize_t read(int fd, void * buf, size_t count);

– ssize_t is an int on 32-bit Linux systems, and a long int on 64-bit systems
– count is a number of bytes to (try to) read
– buf is a pointer to a block of memory big enough to hold count bytes of data
– e.g., int i; ... ret = read(fd, &i, sizeof(i)); (attempts to) read an int (in binary) from file descriptor fd

  • ssize_t write(int fd, const void * buf, size_t count);

– e.g., ret = write(1, "string\n", 7); outputs “string\n” to stdout†

  • The pointer is a “generic” pointer: it can point to any data type

– so you can read/write chars, ints, floats, structs, . . .
– no conversion between ASCII and binary is done

† Unless someone has been messing around with file descriptors
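One detail worth knowing when using these calls directly: write() returns the number of bytes it actually wrote, which can be fewer than count (for example on pipes or sockets). A common idiom is to loop until everything has gone out. The helper below, write_all(), is a hypothetical name and only a sketch of that idiom for POSIX systems.

```c
#include <unistd.h>
#include <errno.h>

/* Keep calling write() until all count bytes are written.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void * buf, size_t count)
{
    const char * p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                /* skip over what was written */
        count -= (size_t)n;
    }
    return 0;
}
```

For example, write_all(1, "string\n", 7) sends the same seven bytes to stdout as the single write() call shown above.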

slide-37
SLIDE 37

206

Low Level I/O Functions: Sample Program

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int main(void)
{
    char buf[BUF_SIZE];
    int fd;
    ssize_t ret, ret2;

    fd = open("/proc/cpuinfo", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    ret = read(fd, buf, BUF_SIZE);      /* ret is # chars read */
    if (ret < 0) {
        perror("read");
        return EXIT_FAILURE;
    }

    ret2 = write(1, buf, ret);
    if (ret2 != ret) {
        fprintf(stderr, "write failed or was short\n");
        return EXIT_FAILURE;
    }

    close(fd);      // sort of redundant here, but good style
    return EXIT_SUCCESS;
}

slide-38
SLIDE 38

207

Bit Operations: 1

  • C provides operations to act on single bits:

x & y does a bitwise and of x and y

x | y does a bitwise or of x and y

x ^ y does a bitwise xor (exclusive or) of x and y

~y does a bitwise complement of y

  • One use: packing multiple 1-bit “fields” into one int

– e.g., information returned from system calls

  • Cryptography algorithms: moving pieces of data around

– real cryptography algorithms
– “poor-man’s” cryptography:

a ^ (a ^ b) == b

– see memfrob() man page

  • Data compression:

– some techniques substitute variable-length bit strings for (fixed-length) symbols

slide-39
SLIDE 39

208

Bit Operations: 2

  • a << n shifts the bits of a left n places

– it “fills” with 0’s at right end

  • a >> n shifts the bits of a right n places

– for an unsigned a, it fills with 0’s at left end
– for a signed a, gcc (like most compilers) fills with copies of the leftmost (most significant) bit of a; the C standard leaves the result for negative a implementation-defined
– the most significant bit of a signed int is the “sign bit”
– 0 if the int is ≥ 0, 1 if the int is < 0

  • Iff there is no arithmetic overflow (and a is non-negative)

– a << n is like multiplying a by 2^n, but faster
– a >> n is like dividing a by 2^n, but faster
– you may see this in some programs where people are concerned about writing fast code

slide-40
SLIDE 40

209

Fun With Bit Operations: 1

  • Note: some of these depend on the machine using 2’s-complement

arithmetic

  • Some of these don’t work (or don’t work as you might expect) if applied

to negative numbers

  • The following macro uses an expression which rounds x up to the next

multiple of n, for n a power of 2:

#define ROUNDUP(x,n) (((x) + (n) - 1) & ~((n) - 1))

  • This checks whether the nth bit of a number a is set:

if (a & (1 << n)) ...

  • Set / clear the nth bit of a:

a |= (1 << n); ... a &= ~(1 << n);

  • Toggle the nth bit of a:

a ^= (1 << n);

  • Clear the rightmost 1 bit of a:

a &= a - 1;
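The idioms on this slide can be collected into small functions for experimenting, as in the sketch below. The function names are made up; the functions take unsigned arguments and shift 1u rather than 1 so that the shifts stay well defined for every bit position.

```c
#define ROUNDUP(x, n)  (((x) + (n) - 1) & ~((n) - 1))

/* All helpers use unsigned values so that the shifts are well defined. */
unsigned set_bit(unsigned a, unsigned n)    { return a | (1u << n); }
unsigned clear_bit(unsigned a, unsigned n)  { return a & ~(1u << n); }
unsigned toggle_bit(unsigned a, unsigned n) { return a ^ (1u << n); }
int      test_bit(unsigned a, unsigned n)   { return (a & (1u << n)) != 0; }

/* Clear the rightmost 1 bit: binary 1100 becomes binary 1000. */
unsigned clear_lowest_set(unsigned a)       { return a & (a - 1); }
```

For instance, ROUNDUP(13u, 8u) computes (13 + 7) & ~7, which is 16, while a value that is already a multiple of 8 is left unchanged.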

slide-41
SLIDE 41

210

Fun With Bit Operations: 2

  • Clear all bits except the rightmost 1 bit of a:

a &= -a;

  • Set the rightmost 0 bit:

a |= a + 1;

  • See if a and b have the same sign:

(a ^ b) >= 0

  • Swap a and b without a temporary variable:

a ^= b; b ^= a; a ^= b;

  • And many, many more;

http://graphics.stanford.edu/~seander/bithacks.html has lots

  • Challenge 1: Write a C program to find the smallest of three integers, without using any of the comparison operators.

  • Challenge 2: Write a C function which does the addition of two integers

without using the ’+’ operator; you can use only the bit operators.

slide-42
SLIDE 42

211

Pointer Arithmetic

  • There are certain arithmetic operations that make sense on pointers

– assume p1 and p2 are pointers

  • Add an integer to a pointer: p1 + 2

– this is a pointer to the item (of type *p1) 2 units further down the block
– you should not go past the end of the block!

  • Difference of two pointers p1 and p2

– assumption: p1 and p2 point to the same block
– p2 - p1 is how many items after p1 p2 is found (in the slide’s diagram, p2 - p1 = 4)

  • Multiplying a pointer by a constant or a pointer makes no sense
  • Dividing a pointer by a constant or a pointer makes no sense
  • Adding two pointers together makes no sense
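The two operations that do make sense can be tried out with an ordinary array. In this sketch, sum_ints() and items_between() are invented names; the first walks a pointer through a block, the second returns the difference of two pointers into the same block.

```c
#include <stddef.h>

/* Sum the n ints starting at first, walking a pointer instead of indexing. */
int sum_ints(const int * first, size_t n)
{
    /* One past the last element: legal to form, not to dereference. */
    const int * end = first + n;
    int total = 0;

    for (const int * p = first; p != end; p++)
        total += *p;
    return total;
}

/* How many items p2 lies after p1 (both must point into the same array). */
ptrdiff_t items_between(const int * p1, const int * p2)
{
    return p2 - p1;
}
```

The result of a pointer subtraction is counted in items, not bytes, and has the type ptrdiff_t from stddef.h.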

slide-43
SLIDE 43

212

Pointers to Pointers

  • Given a pointer to a pointer to an int (say), two dereferences are required to get the int:

ipp → ip → i

  • E.g., argv is a pointer to a block (“array”) of pointers

argv → argv[0] → progname
       argv[1] → arg1
       argv[2] → arg2
       argv[3] → arg3

  • One common usage for a pointer to a pointer:

– suppose a “constructor” function should allocate a structure and return a pointer to the new structure
– it could return the pointer via the return value, or
– it could return the pointer via a parameter
– for this to work, the parameter would need to be declared as a pointer to a pointer
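Both ways of handing back the new structure are sketched below for a made-up struct point. The names new_point() and make_point(), and the 0 / -1 status convention, are assumptions for illustration only.

```c
#include <stdlib.h>

struct point { double x, y; };   /* hypothetical structure */

/* Variant 1: hand the new structure back as the return value. */
struct point * new_point(double x, double y)
{
    struct point * p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Variant 2: hand it back through a parameter, leaving the return
 * value free for a status code (0 = success, -1 = failure). */
int make_point(struct point ** out, double x, double y)
{
    *out = new_point(x, y);
    return (*out != NULL) ? 0 : -1;
}
```

The caller of the second variant passes the address of its own pointer, as in make_point(&p, 1.5, 2.5), which is why the parameter has to be a pointer to a pointer.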

slide-44
SLIDE 44

213

Generic Pointers: 1

  • In some cases, you would like a pointer to be able to refer to any type of object
  • For example, malloc() is declared to be

void * malloc(size_t size)

  • A void pointer is converted automatically to any (more specific) object pointer type on assignment:

char * s = malloc(42);
int * ip = malloc(sizeof(int));

– note that you don’t need to say

char * s = (char *)malloc(42);

although you may do so if you wish

  • You can also say void * p; ... p = ip;
  • This is vaguely similar to the Object object in Java:

Object o;
String s = new String();
o = s;      // valid for any type of object s

slide-45
SLIDE 45

214

Generic Pointers: 2

  • Not all of the usual pointer operations can be performed, or performed

meaningfully, on generic pointers:

int ia[] = {1, 2, 3, 4};
int * ip = ia;
void * p;
int j;
char c;

p = ip;
j = *p;             /* Not valid */
j = *(int *)p;      /* Perfectly fine */
c = *(char *)p;     /* Dirty trick to get first byte of *ip */
p++;                /* gcc allows it, but only adds 1 to p */

  • GEQ: On a little-endian computer, is c equal to 0 or 1?

slide-46
SLIDE 46

215

Generic Pointers: Sorting Anything

  • Generic pointers can be used to allow functions to deal with pointers to

any type of “objects”

  • Consider the library sorting function qsort()

void qsort(void * base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

– base points to the beginning of an nmemb-element array of “objects”, where the sizeof each “object” is size bytes
– compar() is a function which compares two “objects”
– see man qsort for the specifications

  • With appropriate choice of compar(), qsort() can be used to sort any type of “object”

– the “objects” could be structs, ints, floats, strings, . . .
– the comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.
– GEQ: does strcmp() do this?
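As a hint for the strcmp() question: its return value follows the same less-than/equal/greater-than-zero convention, but its parameters are const char *, and when the array holds char * elements qsort() hands the comparison function pointers to those elements. A small wrapper is therefore the usual approach. The names compare_strings() and sort_words() below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * qsort() passes pointers to the array elements.  The elements here are
 * themselves 'char *', so each argument is really a pointer to a pointer.
 */
int compare_strings(const void * a, const void * b)
{
    const char * s1 = *(const char * const *)a;
    const char * s2 = *(const char * const *)b;

    return strcmp(s1, s2);
}

/* Sort an array of n_words string pointers into strcmp() order. */
void sort_words(const char * words[], size_t n_words)
{
    qsort(words, n_words, sizeof words[0], compare_strings);
}
```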


slide-47
SLIDE 47

216

Generic Pointers: Sorting Points in 2-D Space: 1

  • Suppose you represent points as a 2-D array of doubles,

and you wish to sort points from top to bottom, and left to right for points with equal Y-coordinates

int compare_2d_pts(const void * pt1, const void * pt2)
{
    if (((double *)pt1)[1] > ((double *)pt2)[1])
        return -1;
    if (((double *)pt1)[1] < ((double *)pt2)[1])
        return 1;
    if (((double *)pt1)[0] < ((double *)pt2)[0])
        return -1;
    if (((double *)pt1)[0] == ((double *)pt2)[0])
        return 0;
    else
        return 1;
}

  • You need to cast the args to double * before dereferencing them

(in this case the array indexing is the dereferencing)


slide-48
SLIDE 48

217

Generic Pointers: Sorting Points in 2-D Space: 2

  • Sample main() to test compare_2d_pts()

#include <stdio.h>
#include <stdlib.h>

/* Defined on the previous slide */
int compare_2d_pts(const void * pt1, const void * pt2);

int main()
{
    double pts[][2] = {{2, 3}, {1, 4}, {3, 2}, {4, 1}, {2, 1},
                       {0, 0}, {-1, -1}, {-1, 2}, {3, -2}, {6, 6}};
    int n_pts = sizeof(pts) / sizeof(pts[0]);

    /* Print the points before sorting */
    for (int i = 0; i < n_pts; i++)
        printf("[%.1f, %.1f]\n", pts[i][0], pts[i][1]);
    printf("\n");

    /* Each "object" is one point, i.e., two doubles */
    qsort(pts, n_pts, 2 * sizeof(double), compare_2d_pts);

    /* Print the points after sorting */
    for (int i = 0; i < n_pts; i++)
        printf("[%.1f, %.1f]\n", pts[i][0], pts[i][1]);

    return EXIT_SUCCESS;
}


slide-49
SLIDE 49

218

Generic Pointers: Accessing the n-th Element of a Block

  • Consider qsort()

void qsort(void * base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

  • Q: how does qsort() access a given element?
  • A: it knows

– the address of the start of the block
– the size of each “object” in the block
– how to add and multiply

  • To get the n-th element,

– compute the address:

(char *)base + n * size

– this is a pointer to the n-th “object” in the block starting at base
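The address computation can be wrapped in a helper and used, for example, to swap two "objects" of unknown type, which any generic sort needs to do. This is a sketch with hypothetical names (nth_element() and swap_elements()), not the actual qsort() implementation. Elements are counted from 0.

```c
#include <stddef.h>

/* Return a pointer to the n-th "object" (counting from 0) in the block. */
void * nth_element(void * base, size_t n, size_t size)
{
    return (char *)base + n * size;
}

/* Swap the i-th and j-th "objects" one byte at a time. */
void swap_elements(void * base, size_t i, size_t j, size_t size)
{
    unsigned char * a = nth_element(base, i, size);
    unsigned char * b = nth_element(base, j, size);

    for (size_t k = 0; k < size; k++)
    {
        unsigned char tmp = a[k];

        a[k] = b[k];
        b[k] = tmp;
    }
}
```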


slide-50
SLIDE 50

219

Portability The Hard Way: 1

  • Careful coding will allow most C programs to compile and run on a wide variety of systems

– but dealing with differences between different operating systems and/or hardware platforms might require extra work

  • Occasionally (but rarely) some #include files are named inconsistently

– e.g., string.h is (was for a while?) strings.h in some OSes

  • More frequent (but getting better?): system calls (which are different

than standard C library functions) may need different #include files

  • Solution (it is often nice to hide these in your own .h file):

#ifdef __gnu_linux__
#include <name-under-linux.h>
#define <some appropriate #define for Linux>
#elif __FreeBSD__
#include <name-under-FreeBSD.h>
#define <some appropriate #define for FreeBSD>
#else
...
#endif
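A concrete, compilable instance of this pattern might look like the sketch below. PLATFORM_NAME and platform_name() are hypothetical names chosen for illustration; a real header would place its platform-specific #include lines in the same branches.

```c
/* Pick a platform-specific definition at compile time. */
#if defined(__gnu_linux__)
#define PLATFORM_NAME "Linux"
#elif defined(__FreeBSD__)
#define PLATFORM_NAME "FreeBSD"
#else
#define PLATFORM_NAME "unknown"
#endif

/* Return the platform name that was selected when this file was compiled. */
const char * platform_name(void)
{
    return PLATFORM_NAME;
}
```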


slide-51
SLIDE 51

220

Portability The Hard Way: 2

  • Endian-problems: careful programming can often solve this

– library functions convert from local machine byte order to network byte order (and vice versa)
– uint32_t htonl(uint32_t hostlong);
– the htonl() function converts the unsigned 32-bit integer hostlong from host byte order to network byte order
– see also htons(), ntohl(), ntohs()
– don’t forget to use these when doing “network programming”!

  • When doing binary I/O this endian-ness must be considered

– you can use htonl() and friends to convert
– must also consider that long may be different than int
– e.g., 64-bit processors

  • Calling a 32-bit quantity a “long” is archaic

– but not quite as bad as M$ calling a 32-bit number a DWORD (“double word”)
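For binary I/O, one possible approach is to convert each value to network byte order before writing it into a buffer and back after reading it. The sketch below assumes a POSIX system (htonl() and ntohl() are declared in arpa/inet.h). The helper names put_u32() and get_u32() are hypothetical.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store 'value' in buf[0..3] in network (big-endian) byte order. */
void put_u32(unsigned char buf[4], uint32_t value)
{
    uint32_t net = htonl(value);

    memcpy(buf, &net, sizeof net);
}

/* Read back a 32-bit value that was stored by put_u32(). */
uint32_t get_u32(const unsigned char buf[4])
{
    uint32_t net;

    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Using the fixed-width type uint32_t rather than long sidesteps the question of how big a long is on a given machine.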


slide-52
SLIDE 52

221

Portability The Hard Way: 3

  • Problem: data types whose size might change

– e.g., a long might be 32 bits or 64 bits on your laptop, depending on whether you are running in 32-bit mode or 64-bit mode
– thus (for example) an int64_t might be a long long or just a long
– but: printf() uses %ld for a long, but %lld for a long long

  • Ugly(?) solution:

#include <inttypes.h>

– to print a uint64_t variable ul64, use a statement like

printf("... %" PRIu64 " ...", ..., ul64, ...);

– recall: the C compiler automagically “glues together” adjacent string constants

  • Aside from PRInting macros, there are SCaNning macros (e.g., SCNu16)
  • Aside from 64-bit values, there are macros for 8-, 16- and 32-bit values
  • Aside from “u” (unsigned), there are macros for “d”, “i”, “o”, “x” and

(for printing only) “X”
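A short sketch of both kinds of macro in use, with hypothetical helper names format_u64() and parse_u16(). The format strings are built from an ordinary string constant followed by the macro, which expands to another string constant.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format 'value' into buf; returns whatever snprintf() returns. */
int format_u64(char * buf, size_t buf_size, uint64_t value)
{
    return snprintf(buf, buf_size, "value = %" PRIu64, value);
}

/* Parse an unsigned 16-bit decimal number; returns 1 on success, else 0. */
int parse_u16(const char * text, uint16_t * result)
{
    return sscanf(text, "%" SCNu16, result) == 1;
}
```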


slide-53
SLIDE 53

222

Deep and Shallow Copying of Structures: 1

  • Suppose you have

struct x
{
    int age;
    char name[101];
} x1;

struct v
{
    int val;
    struct x * x_var;
} v1, v2;

x1.age = 121;
strncpy(x1.name, "John. Q. Woodcutter", 100);
v1.val = 42;
v1.x_var = &x1;

  • At this point, v1 and x1 are both completely “set up”.

printf("%d:%d", v1.val, v1.x_var->age); would output “42:121”

  • If I say

v2 = v1;
v1.val = 99;
v1.x_var->age = 0;

then

– printf("%d:%d", v1.val, v1.x_var->age); would output “99:0”
– printf("%d:%d", v2.val, v2.x_var->age); would output what?


slide-54
SLIDE 54

223

Deep and Shallow Copying of Structures: 2

  • Given the code from the previous slide,

printf("%d:%d", v2.val, v2.x_var->age);

would output “42:0”

  • v2.val retains the value it received in the assignment “v2 = v1;”
  • Although v2.x_var->age was not explicitly changed, it is nonetheless

changed

  • Reason:

– v2.x_var points to the same item as v1.x_var
– that item was changed with the statement v1.x_var->age = 0;

  • Assignment of structs in C produces a so-called “shallow copy”

– the struct data is copied, but copies of “pointees” are not made

  • Duplicating the items pointed to, and things they point to, and so on

and so on, is known as a “deep copy”

  • If you need a deep copy, in C you have to code all of it yourself

– how about Java?
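For the struct v example, a hand-coded deep copy might look like the following sketch. The function name v_deep_copy() is hypothetical, and the structure definitions are repeated so that the fragment stands on its own. Because struct x contains no pointers, copying it with a plain assignment is enough; a structure with further pointers would need the same treatment at each level.

```c
#include <stdlib.h>

struct x
{
    int age;
    char name[101];
};

struct v
{
    int val;
    struct x * x_var;
};

/*
 * Make 'dst' a deep copy of 'src': dst gets its own freshly allocated
 * struct x instead of sharing the one src points to.  Returns 0 on
 * success and -1 if the allocation fails.  The caller must eventually
 * free(dst->x_var).
 */
int v_deep_copy(struct v * dst, const struct v * src)
{
    struct x * copy = NULL;

    if (src->x_var != NULL)
    {
        copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        *copy = *src->x_var;    /* struct x holds no pointers */
    }
    *dst = *src;                /* shallow copy: val and the shared pointer */
    dst->x_var = copy;          /* replace the shared pointer with our copy */
    return 0;
}
```

With v_deep_copy(&v2, &v1) in place of v2 = v1, the later assignment v1.x_var->age = 0 would no longer affect v2.x_var->age.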


slide-55
SLIDE 55

224

Unions: Motivation

  • Consider the array of structs

struct expression_tree_node
{
    int type;
    int int_value;
    float float_value;
    double double_value;
    char operation;
} nodes[10000];

  • Suppose you never need to store more than one of int_value,

float_value, double_value or operation at a given time

type is used to indicate which one is currently being stored

  • Suppose you want to minimize memory usage

– the above struct uses (in total) 24 (or 32) bytes (32-bit/64-bit)
– 24?? 32?? Eh?? GEQ: why is that?
– but at most 8 data bytes + 4 bytes for type are needed

  • Unions provide a facility to avoid wasting this space


slide-56
SLIDE 56

225

Unions: Example

  • Replace the previous definition with

struct expression_tree_node
{
    int type;
    union
    {
        int int_value;
        float float_value;
        double double_value;
        char operation;
    } value;
} nodes[10000];

  • The union only occupies as much space as its largest item (8 bytes in

this case)

  • Access syntax: nodes[123].value.int_value;
  • Dirty trick: you can look at the same bit pattern as a float, int, . . .

by storing a value into one union field and then accessing it via another field
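The sketch below shows one way the type field and the union might be used together, plus the bit-pattern trick. The enum constants and the functions node_as_double() and float_bits() are hypothetical names. float_bits() assumes that float is a 32-bit type; the value it returns depends on the machine's floating-point format.

```c
#include <stdint.h>

/* Hypothetical tag values for the 'type' field. */
enum node_type { NODE_INT, NODE_FLOAT, NODE_DOUBLE, NODE_OPERATION };

struct expression_tree_node
{
    int type;
    union
    {
        int int_value;
        float float_value;
        double double_value;
        char operation;
    } value;
};

/* Return the node's numeric value as a double; operations yield 0.0. */
double node_as_double(const struct expression_tree_node * node)
{
    switch (node->type)
    {
    case NODE_INT:    return node->value.int_value;
    case NODE_FLOAT:  return node->value.float_value;
    case NODE_DOUBLE: return node->value.double_value;
    default:          return 0.0;
    }
}

/* The "dirty trick": store a float, read the same bytes as an integer. */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;

    pun.f = f;
    return pun.u;   /* assumes float occupies 32 bits */
}
```

Only the field named by type should be read in normal use; reading a different field is exactly the trick float_bits() exploits.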


slide-57
SLIDE 57

226

Calling Other Programs from C

  • Here are four ways to do this:

(1) int status = system("shell command");
    int exit_code = WEXITSTATUS(status); // if status != -1
– call sh to run “shell command” and get the exit status (which could either be the exit code or “signal” information)

(2) FILE * infile = popen("shell command", "r");
– call sh to run “shell command” and read any stdout output from that command via infile

(3) FILE * outfile = popen("shell command", "w");
– call sh to run “shell command” and send data to that command’s stdin via outfile

(4a) fork() + exec*()
– create a whole new process on your system

(4b) pipe() + fork() + exec*()
– create a whole new process on your system that your current process can “talk” to. Good COMP 3713 assignment question.
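Methods (1) and (2) can be sketched as below, assuming a POSIX system. The function names run_command() and first_line_of_output() are hypothetical. WIFEXITED() is used to distinguish a normal exit from termination by a signal before WEXITSTATUS() is applied.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for popen()/pclose() declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Run a shell command; return its exit code, or -1 if it did not exit normally. */
int run_command(const char * command)
{
    int status = system(command);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Run a shell command and copy the first line of its stdout into 'line'
 * (without the trailing newline).  Returns 0 on success, -1 on failure.
 */
int first_line_of_output(const char * command, char * line, size_t line_size)
{
    FILE * infile = popen(command, "r");
    int ok;

    if (infile == NULL)
        return -1;
    ok = fgets(line, (int)line_size, infile) != NULL;
    pclose(infile);
    if (!ok)
        return -1;
    line[strcspn(line, "\n")] = '\0';   /* strip the newline, if any */
    return 0;
}
```

Note that a stream opened with popen() is closed with pclose(), not fclose().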


slide-58
SLIDE 58

227

Unix: Privileged and Unprivileged Users

  • Users in Unix-type systems have a numeric user id (UID) and a numeric group id (GID) associated with each userid

– it is these numbers that determine permissions for accessing programs
– use the id program (or peruse /etc/passwd and /etc/group) to see your userid and which groups you belong to
– use ls -n to see the UID and GID of files

  • In Unix-type systems, root (UID 0) is a privileged user

– root can read/write any files on the system†, regardless of the permission bits

  • “Ordinary” users can only execute programs, read files, or write files according to the file and directory permission bits and their UID and GID

  • Q: how can an unprivileged user do privileged things?


slide-59
SLIDE 59

228

Executing A Program with Other Permissions: 1

  • In order to do privileged operations (such as system administration

tasks) an ordinary user must gain “root permissions”

  • This is done with the so-called setuid and setgid bits:

$ ls -l /usr/bin/sudo
-rws--x--x 1 root root 117840 Feb 10  2015 /usr/bin/sudo*

  • Note the “rws”: the “s” means that when this program is executed, it takes on the permissions associated with the owner of the file, not the user executing it

– so the sudo program runs with root permissions!

  • Similarly, an “s” in the group permissions’ execute bit means the program takes on the permissions associated with that group:

-rwsr-sr-x 1 daemon daemon 50456 Jul 28  2010 /usr/bin/at
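Inside a running program, the difference shows up as a mismatch between the real IDs (who started the program) and the effective IDs (whose permissions are in force). A minimal POSIX sketch, with the hypothetical function name running_with_borrowed_ids():

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the process's effective user or group ID differs from its
 * real one (as happens when a setuid/setgid program is run by another
 * user), and 0 otherwise.
 */
int running_with_borrowed_ids(void)
{
    return geteuid() != getuid() || getegid() != getgid();
}
```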


slide-60
SLIDE 60

229

Executing A Program with Other Permissions: 2

  • You can give your own programs setuid/setgid permissions so that other

users executing your program get your permissions:

$ chmod 4755 myprog # myprog will get setuid bit set

  • Giving a program setuid or setgid perms is a potential security risk and must be done with great care

– subtle bugs in your program could allow attackers to
    – delete files on your system
    – crash your system
    – run their own programs on your system
    – . . .
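Since setuid programs deserve extra scrutiny, it can be useful to check from C whether a file has the bit set (the leading 4 in chmod 4755). The sketch below uses the POSIX stat() call and the S_ISUID mode bit. The function name has_setuid_bit() is hypothetical.

```c
#include <sys/stat.h>

/*
 * Return 1 if the file at 'path' has its setuid bit set, 0 if it does
 * not, and -1 if the file cannot be examined.
 */
int has_setuid_bit(const char * path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return (sb.st_mode & S_ISUID) != 0;
}
```

The corresponding test for the setgid bit would use S_ISGID.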


slide-61
SLIDE 61

230

Test Your Knowledge of C

  • “C Puzzles” http://www.gowrikumar.com/c/index.php

– some explanations:

http://codeitdown.com/c-puzzles-answered/

  • “C Puzzles”

https://chortle.ccsu.edu/CPuzzles/CPuzzlesMain.html

  • Mostly C++ puzzles, but a few are in C:

http://www.geeksforgeeks.org/category/c-puzzles/

  • Interview C Puzzles:

https://vasanthexperiments.wordpress.com/2011/08/31/interview-c-puzzles/

  • Lots more: do a web search on “C puzzles”
  • Some places to test your programming skills:

http://www.programming-challenges.com
https://uva.onlinejudge.org/
https://leetcode.com/
https://www.coderbyte.com/
https://www.codewars.com/
