Week 13 - Monday
SLIDE 1
SLIDE 2
What did we talk about last time?
- Bit fields
- Unions
SLIDE 3
SLIDE 4
SLIDE 5
Programs must be written for people to read and only incidentally for machines to execute.
Harold Abelson and Gerald Jay Sussman
Authors of Structure and Interpretation of Computer Programs
SLIDE 6
SLIDE 7
You just learned how to read and write files
- Why are we going to do it again?
There is a set of Unix/Linux system calls that do the same thing
Most of the higher level calls (fopen(), fprintf(), fgetc(), and even trusty printf()) are built on top of these low level I/O system calls
These give you direct access to the file system (including pipes)
They are often more efficient
You'll use the low-level file style for networking
All low level I/O is binary
SLIDE 8
To use low level I/O functions, include headers as follows:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
You won't need all of these for every program, but you might
as well throw them all in
SLIDE 9
High level file I/O uses a FILE* variable for referring to a file
Low level I/O uses an int value called a file descriptor
These are small, nonnegative integers
Each process has its own set of file descriptors
Even the standard I/O streams have descriptors

Stream   Descriptor   Defined Constant
stdin    0            STDIN_FILENO
stdout   1            STDOUT_FILENO
stderr   2            STDERR_FILENO
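Because the standard streams are ordinary descriptors, write() can be pointed at them directly. The sketch below assumes a helper named write_message(), which is made up for illustration and is not part of any library.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send a C string straight to a file descriptor,
   bypassing stdio. Returns whatever write() returns. */
ssize_t write_message(int fd, const char *text)
{
    return write(fd, text, strlen(text));
}
```

Calling write_message(STDOUT_FILENO, "hello\n") prints to standard output, and write_message(STDERR_FILENO, "oops\n") prints to standard error.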
SLIDE 10
To open a file for reading or writing, use the open() function
- There used to be a creat() function that was used to create new
files, but it's now obsolete
The open() function takes the file name, an int of flags that includes the access mode,
and an int for permissions that is only needed when the file may be created
It returns a file descriptor
int fd = open("input.dat", O_RDONLY);
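open() returns -1 when it fails, so the descriptor should be checked before use. A minimal sketch, assuming a helper named open_for_reading() (a name invented here):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only and report failures.
   Returns a file descriptor, or -1 if open() failed. */
int open_for_reading(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        perror(path);   /* prints the reason stored in errno */
    return fd;
}
```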
SLIDE 11
The main modes are
- O_RDONLY: Open the file for reading only
- O_WRONLY: Open the file for writing only
- O_RDWR: Open the file for both
There are many other optional flags that can be combined with the main modes
A few are
- O_CREAT: Create file if it doesn’t already exist
- O_DIRECTORY: Fail if pathname is not a directory
- O_TRUNC: Truncate existing file to zero length
- O_APPEND: Writes are always to the end of the file
These flags can be combined with the main modes (and each other) using bitwise OR
int fd = open("output.dat", O_WRONLY | O_CREAT | O_APPEND, 0644 ); // 0644: permissions for a newly created file
SLIDE 12
Because this is Linux, we can also specify the permissions for a file we create
The last value passed to open() can be any of the following permission flags bitwise ORed together
This argument is required whenever O_CREAT is used
- S_IRUSR: User read
- S_IWUSR: User write
- S_IXUSR: User execute
- S_IRGRP: Group read
- S_IWGRP: Group write
- S_IXGRP: Group execute
- S_IROTH: Other read
- S_IWOTH: Other write
- S_IXOTH: Other execute
int fd = open("output.dat", O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IRGRP );
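Putting the flags and permission bits together might look like the sketch below. The helper name open_for_append() and the particular permission choice are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file for appending, creating it if needed.
   A newly created file gets read/write for the user and read for the
   group, minus whatever bits the process umask removes.
   Returns a file descriptor, or -1 on failure. */
int open_for_append(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_APPEND;
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP;
    return open(path, flags, mode);
}
```

The mode only matters when the file is actually created. Opening an existing file leaves its permissions unchanged.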
SLIDE 13
Opening the file is actually the hardest part
Reading is straightforward with the read() function
Its arguments are
- The file descriptor
- A pointer to the memory to read into
- The number of bytes to read
Its return value is the number of bytes successfully read, which can be fewer than requested (0 means end of file, -1 means an error)
int fd = open("input.dat", O_RDONLY);
int buffer[100]; // Will be filled by read()
read( fd, buffer, sizeof(int)*100 );
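Since read() may return fewer bytes than requested, code that needs a full buffer usually loops. The following is a sketch under that assumption. read_fully() is an invented name, not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until count bytes have
   arrived, end of file is reached, or an error occurs.
   Returns the number of bytes actually read, or -1 on error. */
ssize_t read_fully(int fd, void *buffer, size_t count)
{
    char *p = buffer;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```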
SLIDE 14
Writing to a file is almost the same as reading
Arguments to the write() function are
- The file descriptor
- A pointer to the memory to write from
- The number of bytes to write
Its return value is the number of bytes successfully written, which can be fewer than requested (-1 means an error)
int fd = open("output.dat", O_WRONLY);
int buffer[100];
int i = 0;
for( i = 0; i < 100; i++ )
    buffer[i] = i + 1;
write( fd, buffer, sizeof(int)*100 );
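A short write is not an error, so robust code retries with the remaining bytes. A minimal sketch, with write_all() as an invented helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes
   have been accepted by the system.
   Returns count on success, or -1 on error. */
ssize_t write_all(int fd, const void *buffer, size_t count)
{
    const char *p = buffer;
    size_t total = 0;

    while (total < count) {
        ssize_t n = write(fd, p + total, count - total);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```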
SLIDE 15
To close a file descriptor, call the close() function
Like always, it's a good idea to close files when you're done with them
int fd = open("output.dat", O_WRONLY);
// Write some stuff
close( fd );
SLIDE 16
It's possible to seek with low level I/O using the lseek()
function
Its arguments are
- The file descriptor
- The offset
- Location to seek from: SEEK_SET, SEEK_CUR, or SEEK_END
int fd = open("input.dat", O_RDONLY);
lseek( fd, 100, SEEK_SET );
SLIDE 17
Use low level I/O to write a hex dump program
Print out the bytes in a program, 16 at a time, in hex, along with the current offset in the file, also in hex
Sample output:
0x000000 : 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
0x000010 : 02 00 03 00 01 00 00 00 c0 83 04 08 34 00 00 00
0x000020 : e8 23 00 00 00 00 00 00 34 00 20 00 06 00 28 00
0x000030 : 1d 00 1a 00 06 00 00 00 34 00 00 00 34 80 04 08
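One possible shape for the hex dump is sketched below. It is not the official solution. The names hex_dump() and print_row() are invented here, input uses read(), and output uses fprintf() purely for the hex formatting.

```c
#include <stdio.h>
#include <unistd.h>

/* Print one row: the offset in hex, then each byte as two hex digits. */
static void print_row(FILE *out, unsigned long offset,
                      const unsigned char *row, size_t length)
{
    fprintf(out, "0x%06lx :", offset);
    for (size_t i = 0; i < length; i++)
        fprintf(out, " %02x", row[i]);
    fputc('\n', out);
}

/* Hypothetical helper: dump everything readable from fd to out,
   16 bytes per row. Returns 0 on success or -1 if read() fails. */
int hex_dump(int fd, FILE *out)
{
    unsigned char row[16];
    unsigned long offset = 0;
    size_t filled = 0;
    ssize_t n;

    /* read() may return fewer bytes than asked for, so keep
       filling the row until it holds 16 bytes or input ends */
    while ((n = read(fd, row + filled, sizeof row - filled)) > 0) {
        filled += (size_t)n;
        if (filled == sizeof row) {
            print_row(out, offset, row, filled);
            offset += filled;
            filled = 0;
        }
    }
    if (n == -1)
        return -1;
    if (filled > 0)
        print_row(out, offset, row, filled);   /* final partial row */
    return 0;
}
```

A main() would open the file named on the command line with open() and call hex_dump(fd, stdout).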
SLIDE 18
A file descriptor is not necessarily the only descriptor for an open file
- Not even in the same process
It's possible to duplicate file descriptors
- Both descriptors then refer to the same open file and share its offset, so output written to either one ends up in the same place
- Input is similar
SLIDE 19
stderr usually prints to the screen, even if stdout is being redirected to a file
./program > output.txt
What if you want stderr to get printed to that file as well?
./program > output.txt 2>&1
You can also redirect only stderr to a file
./program 2> errors.log
SLIDE 20
If you want a new file descriptor number that refers to the same file as an already open descriptor, you can use the dup() function
int fd = dup(1); // Makes a copy of stdout
It's often useful to change an existing file descriptor to refer to another stream, which you can do with dup2()
dup2(1, 2); // Makes 2 (stderr) a copy of 1 (stdout)
Now all writes to stderr will go to stdout
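dup() and dup2() can be combined to redirect a descriptor temporarily and put it back afterward. This is a sketch under that assumption. redirect_descriptor() and restore_descriptor() are names invented for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: make target refer to the same open file as
   source. Returns a saved copy of the old target so it can be
   restored later, or -1 on failure. */
int redirect_descriptor(int source, int target)
{
    int saved = dup(target);
    if (saved == -1)
        return -1;
    if (dup2(source, target) == -1) {
        close(saved);
        return -1;
    }
    return saved;
}

/* Hypothetical helper: undo redirect_descriptor() by pointing target
   back at the saved copy. Returns 0 on success, -1 on failure. */
int restore_descriptor(int saved, int target)
{
    int result = dup2(saved, target);
    close(saved);
    return result == -1 ? -1 : 0;
}
```

Calling redirect_descriptor(STDOUT_FILENO, STDERR_FILENO) has the same effect inside a program as 2>&1 has in the shell.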
SLIDE 21
Reading from and writing to files on a hard drive is expensive
These operations are buffered by the operating system so that one big read or write happens instead of lots of little ones
- If another program reads a file you've just written to, it gets the new data from the system's buffer, even if the disk still holds the old contents
Even so, it is more efficient for your code to write larger
amounts of data in one pass
- Each system call has overhead
SLIDE 22
To avoid having too many system calls, stdio keeps a second buffer of its own inside your program
- This is an advantage of stdio functions rather than using low-level read() and write() directly
The default buffer size (BUFSIZ) is typically 8192 bytes on Linux
The setvbuf(), setbuf(), and setbuffer() functions let you specify your own buffer
SLIDE 23
Stdio output buffers are generally flushed (sent to the system) when they get full
When stdout goes to a terminal, it is also flushed at every newline ('\n')
- When debugging code that can crash, make sure you put a newline in your printf(), otherwise you might not see the output before the crash
- If stdout is redirected to a file, a newline is not enough, so call fflush() as well
There is an fflush() function that can flush stdio buffers
fflush(stdout); // Flushes stdout, but could be any FILE*
fflush(NULL); // Flushes all buffers
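The buffering controls above might be wrapped as in the sketch below. The helper names use_full_buffering() and debug_print() are assumptions, not library functions.

```c
#include <stdio.h>

/* Hypothetical helper: ask stdio to fully buffer a stream with a
   buffer of the given size, allocated by the library. Call it before
   any other operation on the stream. Returns 0 on success. */
int use_full_buffering(FILE *stream, size_t size)
{
    return setvbuf(stream, NULL, _IOFBF, size);
}

/* Hypothetical helper: print a debugging message and flush it right
   away so it is not lost in the stdio buffer if the program crashes. */
void debug_print(const char *message)
{
    fputs(message, stdout);
    fflush(stdout);
}
```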
SLIDE 24
SLIDE 25
You can build layers of I/O on top of other layers
- printf() is built on top of the low level write() call
The standard networking model is called the Open Systems
Interconnection Reference Model
- Also called the OSI model
- Or the 7 layer model
SLIDE 26
There are many different communication protocols
The OSI reference model is an idealized model of how different parts of communication can be abstracted into 7 layers
Imagine that each layer is talking to another parallel layer called a peer on another computer
Only the physical layer is a real connection between the two

Application
Presentation
Session
Transport
Network
Data Link
Physical
SLIDE 27
Not every layer is always used
Sometimes user errors are referred to as Layer 8 problems

Layer  Name          Mnemonic     Activity                                                          Example
7      Application   Away         User-level data                                                   HTTP
6      Presentation  Pretzels     Data appearance, some encryption                                  Unicode
5      Session       Salty        Sessions, sequencing, recovery                                    TLS
4      Transport     Throw        Flow control, end-to-end error detection                          TCP
3      Network       Not          Routing, blocking into packets                                    IP
2      Data Link     Dare         Data delivery, packets into frames, transmission error recovery   Ethernet
1      Physical      Programmers  Physical communication, bit transmission                          Electrons in copper
SLIDE 28
The physical layer is where the rubber meets the road
The actual protocols for exchanging bits as electronic signals happen at the physical layer
At this level are things like RJ45 jacks and rules for
interpreting voltages sent over copper
- Or light pulses over fiber
SLIDE 29
Ethernet is the most widely used example of the data link layer
Machines at this layer are identified by a 48-bit Media Access Control (MAC) address
The Address Resolution Protocol (ARP) lets one machine ask which MAC address belongs to a given IP address on the local network
- Try the arp -a or ip neigh command in Linux
Some routers allow a MAC address to be spoofed, but MAC addresses are intended to be unique and unchanging for a particular piece of hardware
SLIDE 30
The most common network layer protocol is Internet Protocol
(IP)
Each computer connected to the Internet should have a
unique IP address
- IPv4 is 32 bits written as four numbers from 0 – 255, separated by
dots
- IPv6 is 128 bits written as 8 groups of 4 hexadecimal digits
We can use traceroute to see the path of hosts leading to
some IP address
SLIDE 31
There are two popular possibilities for the transport layer
Transmission Control Protocol (TCP) provides reliability
- Sequence numbers for out of order packets
- Retransmission for packets that never arrive
User Datagram Protocol (UDP) is simpler
- Packets can arrive out of order or never show up
- Many online games use UDP because speed is more important
SLIDE 32
The session layer isn't a key part of the TCP/IP model
The secure sessions provided by TLS can be considered the session layer
SLIDE 33
The presentation layer is often optional
It specifies how the data should appear
This layer is responsible for character encoding (ASCII, UTF-8, etc.)
MIME types are sometimes considered presentation layer
issues
Encryption and decryption can happen here
SLIDE 34
The application layer is where the data is interpreted and used
HTTP is an example of an application layer protocol
A web browser takes the information delivered via HTTP and renders it
Code you write deals a great deal with the application layer
SLIDE 35
The goal of the OSI model is to make lower layers transparent to upper ones
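[Diagram: two hosts, each with the stack Application, Presentation, Session, Transport, Network, Data Link, Physical. Each layer wraps the data from the layer above in its own header.]
Application, Presentation, Session: Payload
Transport: UDP | Payload
Network: IP | UDP | Payload
Data Link: MAC | IP | UDP | Payload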
SLIDE 36
Seven layers is a lot to remember
Mnemonics have been developed to help

Layer         Mnemonics (one per column)
Application   All     All         A             Away
Presentation  Pros    People      Powered-Down  Pretzels
Session       Search  Seem        System        Salty
Transport     Top     To          Transmits     Throw
Network       Notch   Need        No            Not
Data Link     Donut   Data        Data          Dare
Physical      Places  Processing  Packets       Programmers
SLIDE 37
The OSI model is sort of a sham
- It was invented after the Internet was already in use
- You don't need all layers
- Some people think this categorization is not useful
Most network communication uses TCP/IP
We can view TCP/IP as four layers:

Layer        Action                        Responsibilities                           Protocol
Application  Prepare messages              User interaction                           HTTP, FTP, etc.
Transport    Convert messages to packets   Sequencing, reliability, error correction  TCP or UDP
Internet     Convert packets to datagrams  Flow control, routing                      IP
Physical     Transmit datagrams as bits    Data communication
SLIDE 38
A TCP/IP connection between two hosts (computers) is
defined by four things
- Source IP
- Source port
- Destination IP
- Destination port
One machine can be connected to many other machines, but
the port numbers keep it straight
SLIDE 39
Certain kinds of network communication are usually done on
specific ports
- 20 and 21: File Transfer Protocol (FTP)
- 22: Secure Shell (SSH)
- 23: Telnet
- 25: Simple Mail Transfer Protocol (SMTP)
- 53: Domain Name System (DNS) service
- 80: Hypertext Transfer Protocol (HTTP)
- 110: Post Office Protocol (POP3)
- 443: HTTP Secure (HTTPS)
SLIDE 40
Computers on the Internet have addresses, not names
Google.com is actually reached at an address such as 74.125.67.100
Google.com is called a domain name
The Domain Name System or DNS turns the name into an address
SLIDE 41
Old-style IP addresses are in this form:
- 74.125.67.100
4 numbers between 0 and 255, separated by dots
That’s a total of 256^4 = 2^32 = 4,294,967,296 addresses
But there are 7 billion people on earth…
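To see that a dotted address is just a 32-bit number, the standard inet_pton() function can parse it. The wrapper ipv4_to_uint32() below is a hypothetical helper written for this sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: convert dotted-quad text into a 32-bit number
   in host byte order. Returns 1 on success, 0 if the text is not a
   valid IPv4 address. */
int ipv4_to_uint32(const char *text, uint32_t *value)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return 0;
    *value = ntohl(addr.s_addr);   /* network order to host order */
    return 1;
}
```

For 74.125.67.100 the result is 0x4A7D4364, one byte per number: 74 = 0x4A, 125 = 0x7D, 67 = 0x43, 100 = 0x64.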
SLIDE 42
IPv6 addresses are the new IP addresses that are beginning to be used by modern hardware
- 8 groups of 4 hexadecimal digits each
- 2001:0db8:85a3:0000:0000:8a2e:0370:7334
- 1 hexadecimal digit has 16 possibilities
- How many different addresses is this?
- 16^32 = 2^128 ≈ 3.4×10^38 is enough to have 500 trillion addresses for every cell of every person’s body on Earth
- Will it be enough?!
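The same inet_pton() call handles IPv6 text, producing 16 bytes (128 bits). The sketch below assumes a helper named ipv6_parse(), which is invented here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: parse IPv6 text into its 16 raw bytes.
   Returns 1 on success, 0 if the text is not a valid IPv6 address. */
int ipv6_parse(const char *text, unsigned char bytes[16])
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return 0;
    memcpy(bytes, addr.s6_addr, 16);   /* 8 groups x 2 bytes each */
    return 1;
}
```

Each group of 4 hexadecimal digits fills 2 bytes, so 2001:0db8:... begins with the bytes 0x20 0x01 0x0d 0xb8.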
SLIDE 43
SLIDE 44
More on networking
Sockets
SLIDE 45