COMP 2103 Programming 3, Part 4
Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University
Modules: Introduction
- Modules: a technique for organizing your functions
- Idea 1: if you have a large program, you don’t want all your functions in one file
– imagine 1,000,000 lines of code in one file
– changing one line means you would have to re-compile everything
– editing the file might become unwieldy
– re-using code is difficult
- Idea 2: suppose you develop your own library of (related) functions
– you might want people to be able to use your library functions without giving them your source code
Jim Diamond, Jodrey School of Computer Science, Acadia University
Modules: Organization
- A module consists of two files:
– a “.c” file, which contains the implementation of the module functions, and
– a “.h” file, which contains declarations of the “public” module functions, and other things (see next slide)
- The programmer creating the module prepares these two files
(e.g., some_package.c and some_package.h)
- The programmer using the module uses
#include "some_package.h"
in any of his source files which need declarations from that module
- The programmer using the module names the implementation file on his/her gcc command line:
$ gcc -Wall ... myprog.c some_package.c -o myprog
Modules: Content of The “.h” File
- The .h file contains all interface information which should be exposed to the module user
– the documentation for the user
– any #define constants or macros needed by the user
– any data types created with typedef or struct needed by the user
– any function declarations that the user should see
– there may be functions in the implementation of the module which are not exposed to the user (“private” functions)
- Example: math.h
– defines constants; e.g.,
# define M_PI 3.14159265358979323846 /* pi */
– has macros; e.g.,
# define isless(x, y) __builtin_isless(x, y)
– provides function prototypes (for sin(), cos(), . . . )
- Some .h files also define typedefs and/or structs
Modules: Structure of The “.h” File
- Suppose your module is named myfuncs
– the myfuncs.h file MUST (by convention, but still MUST) look like this
#ifndef MYFUNCS_H
#define MYFUNCS_H
... all the macros, function prototypes, ...
#endif
- This #ifndef ... #define ... #endif construct tells the compiler to
skip the contents of the file on second (and third, fourth, . . . ) readings
- Why do we care?
– because in some (simple or complex) situations you might end up
#includeing a .h file twice;
- So what?
– the compiler doesn’t allow you to do some things twice (like
#defineing the same token twice)
– and just because you have a fast computer doesn’t mean you should waste CPU time
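To make the effect of the guard concrete, here is a minimal single-file sketch in which the contents of a hypothetical myfuncs.h are pasted in twice, simulating a double #include. The point_T type and isquare() are illustrative assumptions, not part of any real module.

```c
/* First "inclusion" of the hypothetical myfuncs.h */
#ifndef MYFUNCS_H
#define MYFUNCS_H
typedef struct { int x, y; } point_T;   /* illustrative type */
int isquare(int i);
#endif

/* Second "inclusion": MYFUNCS_H is already defined, so the
 * preprocessor skips everything up to the matching #endif.
 * Without the guard, the second typedef would be a conflicting
 * redefinition and the compile would fail. */
#ifndef MYFUNCS_H
#define MYFUNCS_H
typedef struct { int x, y; } point_T;
int isquare(int i);
#endif

/* What the module's .c file would contain. */
int isquare(int i)
{
    return i * i;
}
```

Deleting the two #ifndef/#define/#endif groups from this sketch should make the compiler reject the second typedef, which is the situation the guard exists to prevent.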
C++ (bah!)
Modules: The “.c” File
- First, the .c file #includes the corresponding .h file
– this ensures that the declarations (of externally-visible functions) are in agreement with the function definitions
– (if they disagree the compiler will bleed over you when you try to compile the .c file!)
– this also provides the .c file with any constant or macro definitions found in the .h file
- Following the #include is the rest of the implementation of the module
functions (including the documentation for the implementation)
- For example (docs are missing and code is compressed to fit on this slide!):
#include "myfuncs.h"
#include <math.h>    /* ONLY if *THIS* .c file needs math.h */

int isquare(int i) { return i * i; }
float fsquare(float f) { return f * f; }
...
The “static” Keyword: 1
- The keyword “static” has a number of (somewhat) dissimilar
meanings in C
- A variable declared inside a function to be static, e.g.,
static int abc;
is not stored on the stack; rather it is stored in another area of memory so that its value is preserved between calls to that function
- Here is a function that returns how many times it has been called:
int times_called(void)
{
    static int counter = 0;
    return ++counter;
}
The “static” Keyword: 2
- A function declared to be static, e.g.,
static int icube(int i) { ... }
is visible only to other functions in the same source file
- If you use helper functions in a module (that you don’t want the module user to access) declare them static
– these static functions must not be declared in the .h file
- There is one other use of static; we will discuss this later
Modules: Separate Compilation
- There are three ways to use module functions in your program (in all of these examples, a9p5.c has “#include "myfuncs.h"”)
- 1: compile the module .c file with your file:
gcc -Wall -Wextra -std=gnu11 a9p5.c myfuncs.c -o a9p5
- 2a: pre-compile the module file:
gcc -Wall -Wextra -std=gnu11 -c myfuncs.c
→ this creates a “.o” file (myfuncs.o in this case)
- 2b: then use this pre-compiled file:
gcc -Wall -Wextra -std=gnu11 a9p5.c myfuncs.o -o a9p5
- 3: create a “library” (“.a” or “.so”) file and name the library file when making the program
- Note: the second and third methods allow a module writer to share his
functions without sharing his source code
Modules: Various Categories
- The previous slides show how to create a package module
- One type of package module is known as a type abstraction module
– this defines a new data type (e.g., a stack) and the operations for that data type (e.g., push(), pop(), ...)
- You can also create a layer module
- You can also create a module which
– replaces (over-rides) functions from another module
– read about this in “C for Java Programmers”
Modules: Sample Layer Module
- Suppose you only have trig functions for radians, but want to use
degrees; you could create a “trig in degrees” module:
- Your trig_degrees.h file would have
#ifndef TRIG_DEGREES_H
#define TRIG_DEGREES_H

... user documentation for sin_degrees() ...
double sin_degrees(double x);
...

#endif
- Your trig_degrees.c file would have
#include "trig_degrees.h"
#include <math.h>

... programmer documentation for sin_degrees() ...
double sin_degrees(double x)
{
    return sin(x * M_PI / 180.);
}
...
Modules: Yet Another Categorization
- Suppose you write a module to implement the stack data structure; you might write it in one of these two ways:
– any program using this module can use at most one stack, or
– any program using this module can use many stacks
- The former type of module is sometimes referred to as a singleton
module
- The latter type of module is sometimes referred to as a reentrant
module
- Typically, a module will be singleton if it uses static variables to store state (information) between calls to the module’s functions
– if it doesn’t use static variables to store state between calls, it will (almost certainly) be reentrant
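The difference between the two styles can be sketched with a tiny running-total module. This is only an illustration under assumed names (total_add(), total_add_r(), struct total); it is not taken from the course code.

```c
/* Singleton style: the state lives in a file-scope static variable,
 * so a program gets exactly one running total. */
static double running_total = 0.0;

void total_add(double x)
{
    running_total += x;
}

double total_get(void)
{
    return running_total;
}

/* Reentrant style: the caller owns the state and passes it in,
 * so any number of independent totals can coexist. */
struct total { double sum; };

void total_add_r(struct total * t, double x)
{
    t->sum += x;
}

double total_get_r(const struct total * t)
{
    return t->sum;
}
```

With the reentrant versions, two variables of type struct total can be updated independently; with the singleton versions every caller shares the one hidden running_total.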
Modules: Sharing Variables
- You can define and declare variables outside of functions
– these are called global variables
– they can be used to share information among different functions
– these functions can be in the same file (e.g., a module)
– these functions can also be in different files (e.g., a module and your main program)
- In the other files, you declare the global variable as follows:
extern int some_variable;
- You can restrict the usage of a global variable to one source file by
defining it to be static
static double running_total;
- In both cases, the “scope” of the variable in its file begins at its
declaration; it is not known to the code above its declaration
Global Variables: Use with Caution
- Note: the use of global variables to share information between functions in different source files should generally be avoided
– indeed, maybe I should say almost always be avoided
- The use of global variables can decrease understandability and maintainability of programs
– if a global variable is modified in multiple places, understanding the behaviour of a program can become difficult
– in particular, understanding the interfaces between functions gets more complex with global variables
- Inside a module, a static global variable is more acceptable
- On some computer architectures global variables are slower to access
than automatic (local) variables
GEQ for COMP 2213: Why?
Modules: Constructor Functions
- The implementation you choose for a particular task may require initialization before the first use
– the user of the module (probably) does not want/need to know the implementation details
- The writer of the module can provide a function which the user calls to initialize the module
- All the user needs to know is the name of the constructor function (you could call it “new_moduleName()”) and what the required args are
– this information should be in the module docs in the .h file
- For a singleton module, this would (probably) initialize some static variables in the module
- For a reentrant module, this would return a pointer to some allocated, initialized structure
– if the constructor allocates memory, you would generally want to have a destructor function as well
Modules: Know When to Bail Out
- It is often desirable to have “library” functions return a code to the calling function to indicate success or failure
– the calling function can then decide whether to
– output a diagnostic message (probably to stderr)
– terminate the program or not
– in virtually no cases should a “library function” output error messages
- Sometimes the error is so severe that the program should terminate
– in such cases, it may be inconvenient to pass an error code back to main()
– there might be a chain of 50 functions that called each other, all of which would have to support this
– instead, in these (rare?) circumstances, a function might want to use the exit() function
– “exit(N);” is similar to main() calling “return N;”
Variable Scope: 1
- Scope: when a variable is “visible” to source code
- Automatic variables (“regular” variables declared inside a function) are only visible
(a) after their declaration
int f(int a)
{
    int i = j;   /* INVALID: j is not yet declared */
    int j = 3;
}
(b) and inside the block in which they are declared
int g(int b)
{
    while (1)
    {
        int k, l, m;
        ...
    }
    /* k, l and m are not visible here */
}
(c) (C99, C11) in a for statement which declares them
for (int i = n - 1; i >= 0; i--)
    ...
Variable Scope: 2
- External variables are visible:
– after their definition in the source file where they are defined
– after their declaration in any source file
- The scope of global variables can be restricted to the current source file with the static keyword
/*
 * File: xyzzy.c
 * ...
 */

/* This variable is DEFINED here. */
int a_global_visible_everywhere;

/* This variable is also DEFINED here. */
static int a_global_visible_only_in_this_file;

/* This var is DECLARED here, DEFINED somewhere else. */
extern int a_global_defined_elsewhere;
Example of Global Variable Usage
- It is often good for an error message to indicate what program
generated the message
char * progname;

int main(int argc, char * argv[])
{
    progname = argv[0];
    ...
}

int do_something(...)
{
    ...
    if (a_bad_thing_has_happened)
    {
        fprintf(stderr, "%s: whine...\n", progname);
        return BAD_THING_HAPPENED;
    }
}
- If do_something() is in a different source file than main(),
“extern char * progname;” must precede the usage of progname
Keeping an Implementation Private: Rationale
- Suppose you create a stacks module and use a linked list implementation
typedef struct stack_element
{
    int value;
    struct stack_element * next;
} * stack_T;
- Problem: if you put this definition into the .h file, you have exposed the internal structure of the implementation to the module user
– this means they may access the stack elements without using the appropriate functions
– they may also access or modify the linked list itself, which is a Very Bad Thing
– they might break the stack
Keeping an Implementation Private: Opaque Modules: 1
- C allows so-called “incomplete type definitions”
– struct blahblahblah * ptr; is a valid declaration, even if struct blahblahblah is not currently defined
– the C compiler knows how big a pointer is, which is all it really needs to define ptr
– (it also keeps track of the pointee’s type for later reference)
- To keep a module implementation private, the module .h file should contain (for example)
typedef struct int_stack_implementation * int_stack_T;
and the (possibly secret) .c file will have (for example)
struct int_stack_implementation
{
    int value;
    struct int_stack_implementation * next;
};
Keeping an Implementation Private: Opaque Modules: 2
- Using the scheme of the previous slide, the int_stack.h file would also
include the function declarations:
#ifndef INT_STACK_H
#define INT_STACK_H

typedef struct int_stack_implementation * int_stack_T;

int_stack_T new_int_stack(void);
int push_int_stack(int_stack_T, int);
int pop_int_stack(int_stack_T, int *);
int top_int_stack(int_stack_T, int *);
int isempty_int_stack(int_stack_T);
int destroy_int_stack(int_stack_T *);

#endif
- GEQ: why does destroy_int_stack() take a pointer to a pointer?
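One possible implementation behind this interface is sketched below, in a single file for brevity. The private layout is an assumption: instead of making the hidden struct itself a list node, this sketch uses a small header struct pointing at a list of hypothetical struct node elements, and it assumes the convention that functions return 0 on success and -1 on failure. The slides do not specify either choice.

```c
#include <stdlib.h>

/* Public view (what int_stack.h would expose): an opaque pointer type. */
typedef struct int_stack_implementation * int_stack_T;

/* Private view (int_stack.c only): assumed layout, a header
 * pointing at a singly linked list of nodes. */
struct node { int value; struct node * next; };
struct int_stack_implementation { struct node * top; };

/* Constructor: returns NULL if allocation fails. */
int_stack_T new_int_stack(void)
{
    int_stack_T s = malloc(sizeof(*s));
    if (s != NULL)
        s->top = NULL;
    return s;
}

int isempty_int_stack(int_stack_T s)
{
    return s->top == NULL;
}

/* Returns 0 on success, -1 on failure (assumed convention). */
int push_int_stack(int_stack_T s, int v)
{
    struct node * n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->value = v;
    n->next = s->top;
    s->top = n;
    return 0;
}

int top_int_stack(int_stack_T s, int * out)
{
    if (s->top == NULL)
        return -1;
    *out = s->top->value;
    return 0;
}

int pop_int_stack(int_stack_T s, int * out)
{
    struct node * n = s->top;
    if (n == NULL)
        return -1;
    *out = n->value;
    s->top = n->next;
    free(n);
    return 0;
}

/* Destructor: frees every node and the header, then clears the
 * caller's handle so it cannot be used by accident afterwards. */
int destroy_int_stack(int_stack_T * sp)
{
    int ignored;
    if (sp == NULL || *sp == NULL)
        return -1;
    while (pop_int_stack(*sp, &ignored) == 0)
        ;
    free(*sp);
    *sp = NULL;
    return 0;
}
```

Because user code only ever sees int_stack_T, the private layout could later be changed (to an array, say) without recompiling anything but the module itself.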
Enumerated Types
- Suppose you want a “small” set of defined constants, e.g.,
– a discrete set of values like SUNDAY, MONDAY, ..., SATURDAY
- You could
#define SUNDAY 1
...
#define SATURDAY 7
but this is tedious and error-prone
- Instead, do
enum { SUNDAY=1, MONDAY, ..., SATURDAY };
which defines SUNDAY as 1, MONDAY as 2, TUESDAY as 3, and so on (as int constants known to the compiler, not as preprocessor #defines)
- By default, the first constant is given the value 0, the second 1, ...
- (Of course, you can’t use “...” in the actual statement)
Enumerated Types: 2
- Alternatively, you can create a named enumerated type
enum weekdays { SUNDAY=1, MONDAY, ..., SATURDAY };
or even use typedef to create a new type
typedef enum { SUNDAY=1, MONDAY, ..., SATURDAY } weekdays_T;
- Then, if desired, you can say
enum weekdays day;
or (respectively)
weekdays_T day;
...
day = THURSDAY;
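Written out in full (without the “...” shorthand), the typedef version might look like the following sketch. The is_weekend() helper is a hypothetical addition for illustration.

```c
/* All seven names are written out; SUNDAY is 1, so SATURDAY is 7. */
typedef enum {
    SUNDAY = 1, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY
} weekdays_T;

/* Hypothetical helper: returns 1 for Saturday or Sunday, else 0. */
int is_weekend(weekdays_T day)
{
    return day == SATURDAY || day == SUNDAY;
}
```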
Binary vs. ASCII (Text) Data
- So far we have used only text data for I/O
- Text advantages:
– human-readable (more or less)
– can use any text-based tools
– editors (emacs, vim, ..., all the way down to toy editors)
– browsing tools (less, more, most, ...)
– comparison tools (diff, diff3, xxdiff, ...)
– text manipulation (sed, awk, tr, sort, uniq, ...)
– portable (to non-EBCDIC computers, anyway)
- Text disadvantages:
– representation of numeric data (typically) uses more space
– converting numeric data from ASCII to in-memory (binary) format is time-consuming
Addressing of Multi-Byte Binary Data: “The Endian Wars”
- Suppose you have a 32-bit int on our computers
– it occupies four consecutive bytes of memory
– say, for example, locations 100, 101, 102, and 103
- Q: where does the least-significant byte (LSB) go?
- Two possibilities:
– most significant byte (MSB) in location 100, LSB in location 103
– this is known as big-endian: the big end comes first
– LSB in location 100, MSB in location 103
– this is known as little-endian: the little end comes first
- Reference: Gulliver’s Travels (1726) by Jonathan Swift (1667-1745)
- Don’t even ask about PDP-endian!
Big-Endian vs. Little-Endian: So What?
- Suppose you write binary data on one computer
– suppose you later try to read it on another computer
– if the endian-ness (also known as the byte sex) of the computers is different, you will read gibberish
- Intel i386 machines (and compatibles) are little-endian
- A number of other computer architectures (e.g., SPARC, m68k) are big-endian
- The internet is considered to be big-endian
– i.e., the default assumed format for binary data sent across the internet is big-endian
- So: if you want your data to be portable, either
– make it a Well Known Fact that the data format defines the data to be in little-endian format, or
– have “meta-data” with the data file that specifies the endian-ness
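As a rough sketch of how a program can cope with this, the functions below detect the host's byte order at run time and store a 32-bit value in a fixed (here big-endian) order using shifts, so the file format no longer depends on the machine. The function names are illustrative assumptions.

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 otherwise, by looking at
 * which byte of a known value is stored at the lowest address. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Store v into buf[0..3] most-significant byte first (big-endian),
 * regardless of the host's byte order. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored in big-endian order. */
uint32_t get_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Since put_be32() and get_be32() work on values rather than on memory layout, they give the same bytes on either kind of machine.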
Binary I/O Using stdio
- There are two stdio functions to do binary I/O
– fread() and fwrite()
- Note that on non-Unix-like systems, to do binary I/O you should fopen() your file with “rb” or “wb” (it is redundant but ok to do this on Unix-like systems too)
- The calling sequence of fread() is as follows:
size_t fread(void * ptr, size_t size, size_t nmemb, FILE * stream);
where
– ptr is a pointer to a block of memory large enough to hold the desired data
– size is the number of bytes used by each data item to be read in
– nmemb is the number of data items to read
– stream is the FILE * (not a low-level file descriptor) from which to read data
- fread() returns the number of items (NOT BYTES) successfully read
Binary I/O Using stdio: fwrite()
- The calling sequence of fwrite() is as follows:
size_t fwrite(const void * ptr, size_t size, size_t nmemb, FILE * stream);
– writes size * nmemb bytes of data to stream from a block of memory pointed to by ptr
- fwrite() returns the number of items (NOT BYTES) successfully written
- On x86-64 computers, using gcc, size_t is (indirectly) defined by
typedef long unsigned int size_t;
– you can discover this yourself by typing
gcc -E XYZ.c | grep 'typedef .* size_t'
where XYZ.c is any C program which includes stdio.h
– or (maybe!) by laboriously going through files in /usr/include
- The sizeof operator of C yields (something compatible with) a size_t
Binary I/O Using stdio: Examples
- Suppose you want to write n integers from an array arr:
ret = fwrite(arr, sizeof(int), n, outfile);
– after the call to fwrite(), ret will contain the number of ints successfully written
- Suppose you want to read some data into a structure st of type struct my_struct:
ret = fread(&st, sizeof(struct my_struct), 1, infile);
or maybe better (why better?)
ret = fread(&st, sizeof(st), 1, infile);
– after the call to fread(), ret will have the number of structs successfully read (0 or 1 in this case)
- Ugly-ism: to differentiate between EOF and error, you must use feof() and/or ferror()
– see the man pages for details
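Putting these calls together, the following sketch writes an array of ints to a temporary file and reads it back, checking the item counts and using ferror()/feof() to explain a short read. The function name roundtrip_ints() and the use of tmpfile() are choices made for this illustration only.

```c
#include <stdio.h>

/* Writes n ints from in[] to a temporary file, rewinds, and reads
 * them back into out[].  Returns the number of ints read back,
 * or 0 if the temporary file could not be created. */
size_t roundtrip_ints(const int * in, int * out, size_t n)
{
    FILE * fp = tmpfile();          /* binary update mode */
    size_t got = 0;

    if (fp == NULL)
        return 0;

    if (fwrite(in, sizeof(in[0]), n, fp) == n) {
        rewind(fp);
        got = fread(out, sizeof(out[0]), n, fp);
        if (got < n) {
            /* A short count alone does not say why; ask the stream. */
            if (ferror(fp))
                fprintf(stderr, "read error\n");
            else if (feof(fp))
                fprintf(stderr, "unexpected end of file\n");
        }
    }
    fclose(fp);
    return got;
}
```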
Random Access to File Data
- Often, you read or write a file from beginning to end
- Sometimes, however, you want to “jump around” in the file
– e.g., if you have a file with 1,000,000 student records, you probably don’t want to read all 1,000,000 if you know the one you want is at the end
- To both read and write a file while jumping around in it, you use “r+”, “w+” or “a+” in your fopen() call (“a” means “append”: writes always go to the end of the file, regardless of seeks)
– fseek() also works on a file opened only for reading
- The fseek() call is used to “jump around”:
int fseek(FILE * stream, long offset, int whence);
where
– offset is how far (in bytes) you want to jump, and
– whence is the starting location of your jump; it can be
– SEEK_SET (offset is from the start of the file),
– SEEK_CUR (offset is from the current location), or
– SEEK_END (offset is from the end of the file)
– note that offset can be negative, to go backwards
Random Access Example
- Suppose
– student_db_fd is a FILE * which has been opened in r+ mode
– the data in the file is a sequence of struct student_rec
– s_r is a variable of type struct student_rec
– we want to read the 5000th record into s_r
- We can use the following code to accomplish this:
ret = fseek(student_db_fd, 4999 * sizeof(s_r), SEEK_SET);
if (ret != 0)
{
    /* whine whine whine */
}
else
{
    ret = fread(&s_r, sizeof(s_r), 1, student_db_fd);
    if (ret != 1)
        ...
}
- Another example: prepare to read the very last record:
ret = fseek(student_db_fd, -(long)sizeof(s_r), SEEK_END);
– the cast matters because sizeof yields an unsigned size_t, so -1 * sizeof(s_r) would be computed as a huge unsigned number rather than a negative offset
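The two seeks above can be wrapped in small helper functions, as in this sketch. The layout of struct student_rec and the names read_record() and read_last_record() are hypothetical, chosen only to make the example self-contained.

```c
#include <stdio.h>

/* Hypothetical record layout, for illustration only. */
struct student_rec {
    long id;
    char name[32];
};

/* Reads record number n (counting from 0) from fp into *out.
 * Returns 0 on success, -1 if the seek or the read fails. */
int read_record(FILE * fp, long n, struct student_rec * out)
{
    if (fseek(fp, n * (long)sizeof(*out), SEEK_SET) != 0)
        return -1;
    if (fread(out, sizeof(*out), 1, fp) != 1)
        return -1;
    return 0;
}

/* Reads the last record by seeking backwards from the end.
 * The cast makes the negative offset a signed long. */
int read_last_record(FILE * fp, struct student_rec * out)
{
    if (fseek(fp, -(long)sizeof(*out), SEEK_END) != 0)
        return -1;
    return fread(out, sizeof(*out), 1, fp) == 1 ? 0 : -1;
}
```

With this numbering, the 5000th record of the slide is read_record(fp, 4999, &s_r).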
Related Random Access Functions
- Suppose you want to know where you are in the file right now:
long ftell(FILE * stream);
returns (on success!) the offset from the beginning of the file
– so if you want to save a “bookmark” into the file for wherever you are now, you can do something like
next_one_to_read = ftell(student_db_fd);
- If you want to return to the beginning of the file, you can use
rewind(student_db_fd);
which is equivalent to
fseek(student_db_fd, 0L, SEEK_SET);
- GEQ: how can a C program use these functions to find out how long a
file is (without reading the whole thing)?
I/O Without Stdio
- The stdio routines provide a high-level (?) interface implemented using lower-level functions
– i.e., stdio library functions use system calls to do the actual I/O
– recall(?): in most OSes, a user program can not do I/O... instead, the program asks the OS to do the I/O for it
- stdio buffers input and output (when possible) to minimize the number of times the OS is called
– function calls are expensive, system calls much more so!
- Aside: stdout is buffered, but stderr is not
– use fflush(stdout) to flush stdout’s buffer
- stdio is standard C, thus portable to other systems
– low-level I/O is more system-dependent
– we will look at Unix (and thus Linux, ...) I/O
Low Level I/O Functions: 1
- open() opens a file:
int open(const char * pathname, int flags);
int open(const char * pathname, int flags, mode_t mode);
– e.g., fd = open("/etc/passwd", O_RDONLY)
– fd is the “file descriptor”: an int, not a FILE *
– the first arg is a const char * specifying the pathname
– the second arg is a bitwise-or of a number of flags
– the optional(!) third arg is the file mode (“permissions”)
– returns a new file descriptor on success, −1 on failure
- close() closes a file:
int close(int fd);
– e.g., ret = close(fd);
– returns zero on success, −1 on failure
- When a program begins running, unless something is very, very b0rken, fd 0 is stdin, fd 1 is stdout and fd 2 is stderr
– check out fileno(stdin) or fileno(stdout)
Low Level I/O Functions: 2
- ssize_t read(int fd, void * buf, size_t count);
– ssize_t is an int on 32-bit Linux systems, and a long int on 64-bit systems
– count is a number of bytes to (try to) read
– buf is a pointer to a block of memory big enough to hold count bytes of data
– e.g., int i; ... ret = read(fd, &i, sizeof(i)); (attempts to) read an int (in binary) from file descriptor fd
- ssize_t write(int fd, const void * buf, size_t count);
– e.g., ret = write(1, "string\n", 7); outputs “string\n” to stdout†
- The pointer is a “generic” pointer: it can point to any data type
– so you can read/write chars, ints, floats, structs, ...
– no conversion between ASCII and binary is done
† Unless someone has been messing around with file descriptors
Low Level I/O Functions: Sample Program
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int main(void)
{
    char buf[BUF_SIZE];
    int fd;
    ssize_t ret, ret2;

    fd = open("/proc/cpuinfo", O_RDONLY);
    if (fd < 0)
    {
        perror("open");                 /* whine and exit */
        return EXIT_FAILURE;
    }
    ret = read(fd, buf, BUF_SIZE);      /* ret is # chars read */
    if (ret < 0)
    {
        perror("read");
        return EXIT_FAILURE;
    }
    ret2 = write(1, buf, (size_t)ret);
    if (ret2 != ret)
    {
        perror("write");
        return EXIT_FAILURE;
    }
    close(fd);  // sort of redundant here, but good style
    return EXIT_SUCCESS;
}
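The sample program transfers at most one buffer, and read() and write() are allowed to transfer fewer bytes than requested. A common way to handle both issues is to loop, as in the sketch below; write_all() and copy_fd() are hypothetical helper names, and retrying on EINTR is one reasonable policy rather than the only one.

```c
#include <unistd.h>
#include <errno.h>

/* Keeps calling write() until all n bytes are written.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void * buf, size_t n)
{
    const char * p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copies everything readable from fd `in` to fd `out`.
 * Returns 0 on success (end of file reached), -1 on error. */
int copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t r;

    while ((r = read(in, buf, sizeof(buf))) != 0) {
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out, buf, (size_t)r) != 0)
            return -1;
    }
    return 0;
}
```

A call such as copy_fd(fd, 1) would then print a file of any length to stdout.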
Bit Operations: 1
- C provides operations to act on single bits:
– x & y does a bitwise and of x and y
– x | y does a bitwise or of x and y
– x ^ y does a bitwise xor (exclusive or) of x and y
– ~y does a bitwise complement of y
- One use: packing multiple 1-bit “fields” into one int
– e.g., information returned from system calls
- Cryptography algorithms: moving pieces of data around
– real cryptography algorithms
– “poor-man’s” cryptography: a ^ (a ^ b) == b
– see memfrob() man page
- Data compression:
– some techniques substitute variable-length bit strings for (fixed-length) symbols
Bit Operations: 2
- a << n shifts the bits of a left n places
– it “fills” with 0’s at right end
- a >> n shifts the bits of a right n places
– for a signed a, gcc fills with (a >> 31) & 1 at left end (if sizeof(a) is 4); the C standard leaves the result for negative values implementation-defined
– for an unsigned a, it fills with 0’s at left end
– (a >> 31) & 1 is the leftmost (most significant) bit of a (if a is stored in 32 bits)
– the most significant bit of a signed int is the “sign bit”
– 0 if the int is ≥ 0, 1 if the int is < 0
- Iff there is no arithmetic overflow
– a << n is like multiplying a by 2^n, but faster
– a >> n is like dividing a by 2^n, but faster
– you may see this in some programs where people are concerned about writing fast code
Fun With Bit Operations: 1
- Note: some of these depend on the machine using 2’s-complement
arithmetic
- Some of these don’t work (or don’t work as you might expect) if applied
to negative numbers
- The following macro uses an expression which rounds x up to the next
multiple of n, for n a power of 2:
#define ROUNDUP(x,n) (((x) + (n) - 1) & ~((n) - 1))
- This checks whether the nth bit of a number a is set:
if (a & (1 << n)) ...
- Set / clear the nth bit of a:
a |= (1 << n); ... a &= ~(1 << n);
- Toggle the nth bit of a:
a ^= (1 << n);
- Clear the rightmost 1 bit of a:
a &= a - 1;
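The idioms on this slide can be packaged as small functions for experimentation, as in the sketch below. The function names are illustrative, and unsigned operands are an assumption made here to sidestep the caveats about negative numbers.

```c
/* Unsigned types are used so that the shifts and masks are well
 * defined for every bit position below the width of the type. */
#define ROUNDUP(x, n) (((x) + (n) - 1) & ~((n) - 1))   /* n a power of 2 */

int      bit_is_set(unsigned a, unsigned n) { return (a & (1u << n)) != 0; }
unsigned bit_set(unsigned a, unsigned n)    { return a | (1u << n); }
unsigned bit_clear(unsigned a, unsigned n)  { return a & ~(1u << n); }
unsigned bit_toggle(unsigned a, unsigned n) { return a ^ (1u << n); }

/* Clears the rightmost 1 bit, e.g. binary 10110 becomes 10100. */
unsigned clear_lowest_set(unsigned a)       { return a & (a - 1); }
```

For example, ROUNDUP(13u, 8u) is 16, and clear_lowest_set(22u) is 20.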
Fun With Bit Operations: 2
- Clear all bits except the rightmost 1 bit of a:
a &= -a;
- Set the rightmost 0 bit:
a |= a + 1;
- See if a and b have the same sign:
(a ^ b) >= 0
- Swap a and b without a temporary variable:
a ^= b; b ^= a; a ^= b;
- And many, many more;
http://graphics.stanford.edu/~seander/bithacks.html has lots
- Challenge 1: Write a C program to find the smallest of three integers, without using any of the comparison operators.
- Challenge 2: Write a C function which does the addition of two integers
without using the ’+’ operator; you can use only the bit operators.
Pointer Arithmetic
- There are certain arithmetic operations that make sense on pointers
– assume p1 and p2 are pointers
- Add an integer to a pointer: p1 + 2
– this is a pointer to the item (of type *p1) 2 units further down the block
– you should not go past the end of the block!
- Difference of two pointers p1 and p2
– assumption: p1 and p2 point to the same block
– p2 - p1 is how many items after p1 where p2 is found
[Figure: a block of items with p1 and p2 pointing into it, where p2 - p1 = 4]
- Multiplying a pointer by a constant or a pointer makes no sense
- Dividing a pointer by a constant or a pointer makes no sense
- Adding two pointers together makes no sense
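The two meaningful operations, adding an integer to a pointer and subtracting two pointers into the same block, are sketched below. The function names sum_block() and items_between() are made up for this example.

```c
#include <stddef.h>

/* Sums n ints by stepping a pointer through the block;
 * end points one past the last element and is never dereferenced. */
int sum_block(const int * p, size_t n)
{
    const int * end = p + n;
    int sum = 0;

    while (p < end)
        sum += *p++;
    return sum;
}

/* Number of items by which p2 lies after p1; both must point into
 * (or one past the end of) the same block. */
ptrdiff_t items_between(const int * p1, const int * p2)
{
    return p2 - p1;
}
```

Note that the difference is counted in items, not bytes: for an int array a, items_between(a, a + 4) is 4 whatever sizeof(int) is.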
Pointers to Pointers
- Given a pointer to a pointer to an int (say), two dereferences are required to get the int:
ipp → ip → i
- E.g., argv is a pointer to a block (“array”) of pointers
argv → argv[0] → progname
       argv[1] → arg1
       argv[2] → arg2
       argv[3] → arg3
- One common usage for a pointer to a pointer:
– suppose a “constructor” function should allocate a structure and return a pointer to the new structure
– it could return the pointer via the return value, or
– it could return the pointer via a parameter
– for this to work, the parameter would need to be declared as a pointer to a pointer
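Both constructor styles are sketched below for a hypothetical struct point. The names new_point() and make_point(), and the 0 / -1 status convention, are assumptions made for the example.

```c
#include <stdlib.h>

struct point { double x, y; };      /* hypothetical structure */

/* Style 1: return the new pointer as the return value
 * (NULL signals failure). */
struct point * new_point(double x, double y)
{
    struct point * p = malloc(sizeof(*p));
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Style 2: return the pointer through a parameter, leaving the
 * return value free for a status code (0 = ok, -1 = failure). */
int make_point(struct point ** out, double x, double y)
{
    struct point * p = new_point(x, y);
    if (p == NULL)
        return -1;
    *out = p;           /* writes into the caller's pointer variable */
    return 0;
}
```

A caller of the second style declares struct point * p; and passes &p, which is why the parameter has type struct point **.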
Generic Pointers: 1
- In some cases, you would like a pointer to be able to refer to any type of object
- For example, malloc() is declared to be
void * malloc(size_t size);
- A void pointer can be converted to any (more specific) object pointer without a cast:
char * s = malloc(42);
int * ip = malloc(sizeof(int));
– note that you don’t need to say
char * s = (char *)malloc(42);
although you may do so if you wish
- You can also say
void * p; ... p = ip;
- This is vaguely similar to the Object object in Java:
Object o;
String s = new String();
o = s; // valid for any type of object s
Generic Pointers: 2
- Not all of the usual pointer operations can be performed, or performed
meaningfully, on generic pointers:
int ia[] = {1, 2, 3, 4};
int * ip = ia;
void * p;
int j;
char c;

p = ip;
j = *p;           /* Not valid */
j = *(int *)p;    /* Perfectly fine */
c = *(char *)p;   /* Dirty trick to get first byte of *ip */
p++;              /* gcc allows it, but only adds 1 to p */
- GEQ: On a little-endian computer, is c equal to 0 or 1?
Generic Pointers: Sorting Anything
- Generic pointers can be used to allow functions to deal with pointers to any type of “objects”
- Consider the library sorting function qsort()
void qsort(void * base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));
– base points to the beginning of an nmemb-element array of “objects”, where the sizeof each “object” is size bytes
– compar() is a function which compares two “objects”
– see man qsort for the specifications
- With appropriate choice of compar(), qsort() can be used to sort any type of “object”
– the “objects” could be structs, ints, floats, strings, ...
– the comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.
– GEQ: does strcmp() do this?
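As an illustration of the comparison-function contract, here are two sketch comparators, one for an array of ints and one for an array of char * strings. The names compare_ints() and compare_strings() are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Compares two ints; the subtraction idiom (a - b) is avoided
 * because it can overflow. */
int compare_ints(const void * a, const void * b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* For an array of char * strings, qsort() hands the comparison
 * function pointers to the array elements, i.e. char **. */
int compare_strings(const void * a, const void * b)
{
    const char * s = *(const char * const *)a;
    const char * t = *(const char * const *)b;
    return strcmp(s, t);
}
```

A typical call is qsort(v, n, sizeof(v[0]), compare_ints); for an int array v of n elements.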
Generic Pointers: Sorting Points in 2-D Space: 1
- Suppose you represent points as a 2-D array of doubles, and you wish to sort points from top to bottom, and left to right for points with equal Y-coordinates
int compare_2d_pts(const void * pt1, const void * pt2)
{
    if (((double *)pt1)[1] > ((double *)pt2)[1])
        return -1;
    if (((double *)pt1)[1] < ((double *)pt2)[1])
        return 1;
    if (((double *)pt1)[0] < ((double *)pt2)[0])
        return -1;
    if (((double *)pt1)[0] == ((double *)pt2)[0])
        return 0;
    else
        return 1;
}
- You need to cast the args to double * before dereferencing them
(in this case the array indexing is the dereferencing)
Generic Pointers: Sorting Points in 2-D Space: 2
- Sample main() to test compare_2d_pts()
#include <stdio.h>
#include <stdlib.h>

/* Defined on the previous slide. */
int compare_2d_pts(const void * pt1, const void * pt2);

int main(void)
{
    double pts[][2] = {{2, 3}, {1, 4}, {3, 2}, {4, 1}, {2, 1},
                       {0, 0}, {-1, -1}, {-1, 2}, {3, -2}, {6, 6}};
    int n_pts = sizeof(pts) / sizeof(pts[0]);

    for (int i = 0; i < n_pts; i++)
        printf("[%.1f, %.1f]\n", pts[i][0], pts[i][1]);
    printf("\n");

    qsort(pts, n_pts, 2 * sizeof(double), compare_2d_pts);

    for (int i = 0; i < n_pts; i++)
        printf("[%.1f, %.1f]\n", pts[i][0], pts[i][1]);

    return EXIT_SUCCESS;
}
Generic Pointers: Accessing the n-th Element of a Block
- Consider qsort()
void qsort(void * base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
- Q: how does qsort() access a given element?
- A: it knows
– the address of the start of the block
– the size of each “object” in the block
– how to add and multiply
- To get the n-th element,
– compute the address:
(char *)base + n * size
– this is a pointer to the n-th “object” in the block starting at base
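The address computation above could be wrapped up as follows. This is only a sketch: nth_element() and swap_elements() are hypothetical helper names, and a real qsort() implementation may do things differently internally.

```c
#include <stddef.h>

/* Return a pointer to element n of a block of "objects", each size bytes. */
void * nth_element(void * base, size_t n, size_t size)
{
    /* Arithmetic on void * is not standard C, so step in units of
       char, which is exactly one byte. */
    return (char *)base + n * size;
}

/* Swap elements i and j of the block, one byte at a time.  This works
   without knowing the element type, only its size. */
void swap_elements(void * base, size_t i, size_t j, size_t size)
{
    unsigned char * p = nth_element(base, i, size);
    unsigned char * q = nth_element(base, j, size);

    for (size_t k = 0; k < size; k++)
    {
        unsigned char tmp = p[k];
        p[k] = q[k];
        q[k] = tmp;
    }
}
```

A caller that does know the type casts the result back, e.g. *(double *)nth_element(pts, 2, sizeof(double)).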
Portability The Hard Way: 1
- Careful coding will allow most C programs to compile and run on a wide variety of systems
– but dealing with differences between different operating systems and/or hardware platforms might require extra work
- Occasionally (but rarely) some #include files are named inconsistently
– e.g., string.h is (was for a while?) strings.h in some OSes
- More frequent (but getting better?): system calls (which are different
than standard C library functions) may need different #include files
- Solution (it is often nice to hide these in your own .h file):
#ifdef __gnu_linux__
#include <name-under-linux.h>
#define <some appropriate #define for Linux>
#elif defined(__FreeBSD__)
#include <name-under-FreeBSD.h>
#define <some appropriate #define for FreeBSD>
#else
...
#endif
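One concrete instance of this pattern, offered as a sketch rather than a recommendation: the byte-order helpers such as htobe32() live in <endian.h> on Linux but in <sys/endian.h> on FreeBSD. The macro PLATFORM_NAME and the function platform_name() are invented for this example, and the test uses the broader __linux__ macro instead of __gnu_linux__.

```c
/* Pick the right header for each platform and record which one we got. */
#if defined(__linux__)
#include <endian.h>             /* Linux location of htobe32() and friends */
#define PLATFORM_NAME "Linux"
#elif defined(__FreeBSD__)
#include <sys/endian.h>         /* FreeBSD location of the same functions */
#define PLATFORM_NAME "FreeBSD"
#else
#define PLATFORM_NAME "unknown" /* no special header chosen */
#endif

/* Report which branch of the conditional was compiled in. */
const char * platform_name(void)
{
    return PLATFORM_NAME;
}
```

Keeping the conditional in one private .h file means the rest of the program never needs its own #ifdef for this difference.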
Portability The Hard Way: 2
- Endian-problems: careful programming can often solve this
– library functions convert from local machine byte order to network byte order (and vice versa)
– uint32_t htonl(uint32_t hostlong);
– the htonl() function converts the unsigned 32-bit integer hostlong from host byte order to network byte order
– see also htons(), ntohl(), ntohs()
– don’t forget to use these when doing “network programming”!
- When doing binary I/O this endian-ness must be considered
– you can use htonl() and friends to convert
– must also consider that long may be a different size than int (e.g., on 64-bit processors)
- Calling a 32-bit quantity a “long” is archaic
– but not quite as bad as M$ calling a 32-bit number a DWORD (“double word”)
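A minimal sketch of using htonl() and ntohl() for binary I/O: the value is converted to network byte order before its bytes are stored, and converted back after they are read. The function names put_u32_network_order() and get_u32_network_order() are hypothetical.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store value into buf[0..3] in network (big-endian) byte order. */
void put_u32_network_order(unsigned char buf[4], uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof(net));
}

/* Read a 32-bit value stored in network byte order back into host order. */
uint32_t get_u32_network_order(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);
}
```

Using uint32_t instead of long or int sidesteps the question of how big those types are on a given machine. The buffer holds the same four bytes whether the host is little-endian or big-endian, so it can be written to a file or socket and read back on a different machine.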
Portability The Hard Way: 3
- Problem: data types whose size might change
– e.g., a long might be 32 bits or 64 bits on your laptop, depending on whether you are running in 32-bit mode or 64-bit mode
– thus (for example) an int64_t might be a long long or just a long
– but: printf() uses %ld for a long, but %lld for a long long
- Ugly(?) solution:
– #include <inttypes.h>
– to print a uint64_t variable ul64, use a statement like
printf("... %" PRIu64 " ...", ..., ul64, ...);
– recall: C automagically “glues together” adjacent string constants (this happens at compile time, right after preprocessing)
- Aside from PRInting macros, there are SCaNning macros (e.g., SCNu16)
- Aside from 64-bit values, there are macros for 8-, 16- and 32-bit values
- Aside from “u” (unsigned), there are macros for “d”, “i”, “o”, “x” and
(for printing only) “X”
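As a small sketch of both macro families, the following formats a uint64_t with PRIu64 and scans a uint16_t with SCNu16. The wrapper names format_u64() and parse_u16() are made up for the example.

```c
#include <inttypes.h>   /* PRIu64, SCNu16; also pulls in <stdint.h> */
#include <stdio.h>

/* Format value as decimal text into buf; returns what snprintf() returns. */
int format_u64(char * buf, size_t buf_size, uint64_t value)
{
    /* "%" PRIu64 expands to whichever conversion matches uint64_t on
       this system, so the same source line is correct everywhere. */
    return snprintf(buf, buf_size, "%" PRIu64, value);
}

/* Parse a decimal uint16_t from text; returns 1 on success, 0 otherwise. */
int parse_u16(const char * text, uint16_t * out)
{
    return sscanf(text, "%" SCNu16, out) == 1;
}
```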
Deep and Shallow Copying of Structures: 1
- Suppose you have
struct x
{
    int age;
    char name[101];
} x1;

struct v
{
    int val;
    struct x * x_var;
} v1, v2;

x1.age = 121;
strncpy(x1.name, "John. Q. Woodcutter", 100);
v1.val = 42;
v1.x_var = &x1;
- At this point, v1 and x1 are both completely “set up”.
– printf("%d:%d", v1.val, v1.x_var->age); would output “42:121”
- If I say
v2 = v1; v1.val = 99; v1.x_var->age = 0;
then
– printf("%d:%d", v1.val, v1.x_var->age); would output “99:0”
– printf("%d:%d", v2.val, v2.x_var->age); would output what?
Deep and Shallow Copying of Structures: 2
- Given the code from the previous slide,
printf("%d:%d", v2.val, v2.x_var->age);
would output “42:0”
- v2.val retains the value it received in the assignment “v2 = v1;”
- Although v2.x_var->age was not explicitly changed, it is nonetheless
changed
- Reason:
– v2.x_var points to the same item as v1.x_var
– that shared item was changed with the statement v1.x_var->age = 0;
- Assignment of structs in C produces a so-called “shallow copy”
– the struct data is copied, but copies of “pointees” are not made
- Duplicating the items pointed to, and things they point to, and so on
and so on, is known as a “deep copy”
- If you need a deep copy, in C you have to code all of it yourself
– how about Java?
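For the two structs used in this example, a hand-coded deep copy might look like the sketch below. deep_copy_v() is a hypothetical name, and the function assumes struct x itself contains no pointers (true here), so one level of duplication is enough.

```c
#include <stdlib.h>

struct x
{
    int age;
    char name[101];
};

struct v
{
    int val;
    struct x * x_var;
};

/*
 * Deep-copy *src into *dst: dst gets its own freshly allocated pointee.
 * Returns 0 on success, -1 if malloc() fails.
 * The caller must eventually free(dst->x_var).
 */
int deep_copy_v(struct v * dst, const struct v * src)
{
    *dst = *src;                    /* shallow copy of the members */
    if (src->x_var == NULL)
        return 0;                   /* nothing to duplicate */

    dst->x_var = malloc(sizeof(*dst->x_var));
    if (dst->x_var == NULL)
        return -1;

    *dst->x_var = *src->x_var;      /* copy the pointee's contents */
    return 0;
}
```

After deep_copy_v(&v2, &v1), the statement v1.x_var->age = 0; no longer affects v2.x_var->age, because the two pointers refer to different objects.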
Unions: Motivation
- Consider the array of structs
struct expression_tree_node
{
    int type;
    int int_value;
    float float_value;
    double double_value;
    char operation;
} nodes[10000];
- Suppose you never need to store more than one of int_value, float_value, double_value or operation at a given time
– type is used to indicate which one is currently being stored
- Suppose you want to minimize memory usage
– the above struct uses (in total) 24 (or 32) bytes (32-bit/64-bit)
– 24?? 32?? Eh?? GEQ: why is that?
– but at most 8 data bytes + 4 bytes for type are needed
- Unions provide a facility to avoid wasting this space
Unions: Example
- Replace the previous definition with
struct expression_tree_node
{
    int type;
    union
    {
        int int_value;
        float float_value;
        double double_value;
        char operation;
    } value;
} nodes[10000];
- The union only occupies as much space as its largest item (8 bytes in
this case)
- Access syntax: nodes[123].value.int_value;
- Dirty trick: you can look at the same bit pattern as a float, int, . . .
by storing a value into one union field and then accessing it via another field
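The sketch below shows one way the type field might be used to pick the live union member, plus the “dirty trick” applied to a float. The enum constants, node_as_double() and float_bits() are hypothetical names, and the comment about the bit pattern assumes float is a 32-bit IEEE 754 value, as it is on most current machines.

```c
#include <stdint.h>

/* Hypothetical tag values for the type field. */
enum node_type { NODE_INT, NODE_FLOAT, NODE_DOUBLE, NODE_OPERATION };

struct expression_tree_node
{
    int type;
    union
    {
        int int_value;
        float float_value;
        double double_value;
        char operation;
    } value;
};

/* Return the node's value as a double, using type to pick the member
   that was most recently stored. */
double node_as_double(const struct expression_tree_node * node)
{
    switch (node->type)
    {
    case NODE_INT:    return node->value.int_value;
    case NODE_FLOAT:  return node->value.float_value;
    case NODE_DOUBLE: return node->value.double_value;
    default:          return 0.0;   /* operations have no numeric value */
    }
}

/* The "dirty trick": store a float, then read the same bits back as a
   32-bit unsigned integer (1.0f gives 0x3f800000 under IEEE 754). */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

Reading a member other than the one last written is exactly what makes the trick “dirty”: the result depends on the machine’s representation of each type.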
Calling Other Programs from C
- Here are four ways to do this:
(1) int status = system("shell command");
    int exit_code = WEXITSTATUS(status); // if status != -1
– call sh to run “shell command” and get the exit status (which could either be the exit code or “signal” information)
(2) FILE * infile = popen("shell command", "r");
– call sh to run “shell command” and read any stdout output from that command via infile
(3) FILE * outfile = popen("shell command", "w");
– call sh to run “shell command” and send data to that command’s stdin via outfile
(4a) fork() + exec*()
– create a whole new process on your system
(4b) pipe() + fork() + exec*()
– create a whole new process on your system that your current process can “talk” to
– good COMP 3713 assignment question
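Methods (1) and (2) could be wrapped up roughly as follows. This is a sketch under the assumption of a POSIX system with sh available; run_command() and read_first_line() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for popen()/pclose() declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>             /* WIFEXITED(), WEXITSTATUS() */

/* Method (1): run command via sh and return its exit code, or -1 if it
   could not be run or did not exit normally (e.g., killed by a signal). */
int run_command(const char * command)
{
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Method (2): run command via sh and copy the first line of its stdout
   (without the newline) into line.  Returns 0 on success, -1 on failure. */
int read_first_line(const char * command, char * line, size_t line_size)
{
    FILE * infile = popen(command, "r");
    if (infile == NULL)
        return -1;

    int ok = fgets(line, (int)line_size, infile) != NULL;
    if (ok)
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */

    if (pclose(infile) == -1)               /* always pclose(), never fclose() */
        return -1;
    return ok ? 0 : -1;
}
```

For example, read_first_line("echo hello", buf, sizeof(buf)) leaves the string hello in buf. Both helpers hand the command string to sh, so it must never be built from untrusted input.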
Unix: Privileged and Unprivileged Users
- Users in Unix-type systems have a numeric user id (UID) and a numeric group id (GID) associated with each userid
– it is these numbers that determine permissions for accessing programs
– use the id program (or peruse /etc/passwd and /etc/group) to see your userid and which groups you belong to
– use ls -n to see the UID and GID of files
- In Unix-type systems, root (UID 0) is a privileged user
– root can read/write any files on the system†, regardless of the permission bits
- An “ordinary” user can only execute programs, read files, or write files according to the file and directory permission bits and their UID and GID
- Q: how can an unprivileged user do privileged things?
Executing A Program with Other Permissions: 1
- In order to do privileged operations (such as system administration
tasks) an ordinary user must gain “root permissions”
- This is done with the so-called setuid and setgid bits:
$ ls -l /usr/bin/sudo
-rws--x--x 1 root root 117840 Feb 10 2015 /usr/bin/sudo*
- Note the “rws”: the “s” means that when this program is executed, it takes on the permissions associated with the owner of the file, not the user executing it
– so the sudo program runs with root permissions!
- Similarly, an “s” in the group permissions’ execute bit means the program takes on the permissions associated with that group:
-rwsr-sr-x 1 daemon daemon 50456 Jul 28 2010 /usr/bin/at
Executing A Program with Other Permissions: 2
- You can give your own programs setuid/setgid permissions so that other
users executing your program get your permissions:
$ chmod 4755 myprog # myprog will get setuid bit set
- Giving a program setuid or setgid perms is a potential security risk and must be done with great care
– subtle bugs in your program could allow attackers to
  – delete files on your system
  – crash your system
  – run their own programs on your system
  – . . .
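A program can check whether it is currently running with someone else’s permissions by comparing its real and effective ids. The slides do not cover these calls; getuid(), geteuid(), getgid() and getegid() are standard POSIX functions, while has_borrowed_identity() is a name invented for this sketch.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the effective user or group id differs from the real one,
   which is what happens when a setuid or setgid program is run by a
   user other than the file's owner (or outside its group); else 0. */
int has_borrowed_identity(void)
{
    return getuid() != geteuid() || getgid() != getegid();
}
```

In an ordinary program started from your own shell this returns 0. A careful setuid program might use such a check to decide when extra caution is needed.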
Test Your Knowledge of C
- “C Puzzles” http://www.gowrikumar.com/c/index.php
– some explanations: http://codeitdown.com/c-puzzles-answered/
- “C Puzzles” https://chortle.ccsu.edu/CPuzzles/CPuzzlesMain.html
- Mostly C++ puzzles, but a few are in C: http://www.geeksforgeeks.org/category/c-puzzles/
- Interview C Puzzles: https://vasanthexperiments.wordpress.com/2011/08/31/interview-c-puzzles/
- Lots more: do a web search on “C puzzles”
- Some places to test your programming skills:
– http://www.programming-challenges.com
– https://uva.onlinejudge.org/
– https://leetcode.com/
– https://www.coderbyte.com/
– https://www.codewars.com/