CS 230 - Spring 2020 1-1
CS 230 Introduction to Computers and Computer Systems
Lecture 6 Endianness and Characters
Byte
Convention: 8 bits = 1 byte
8 is a power of 2
two’s complement range -128 ... 127
unsigned binary range 0 ... 255
two hexadecimal digits
historical: useful range to represent characters and control codes, 8-bit circuit width
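As a quick check of the ranges above, this Python sketch reads the same 8-bit pattern as unsigned binary, as two's complement, and as two hexadecimal digits. The helper name `describe_byte` is made up for illustration.

```python
def describe_byte(value: int) -> dict:
    """Interpret one byte (0..255) as bits, unsigned, two's complement, and hex."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte holds exactly 8 bits")
    raw = bytes([value])
    return {
        "bits": format(value, "08b"),
        "unsigned": int.from_bytes(raw, "big", signed=False),
        "signed": int.from_bytes(raw, "big", signed=True),
        "hex": format(value, "02X"),
    }

# Smallest and largest patterns, plus the two values around the sign boundary.
for v in (0x00, 0x7F, 0x80, 0xFF):
    print(describe_byte(v))
```

The pattern 10000000 comes out as 128 unsigned but -128 in two's complement, which is where the two ranges diverge.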
Word
Increasing circuit width: 32 or 64 bits
Individual bytes are still accessible
what order are they in?
terminology from Gulliver’s Travels – Jonathan Swift, 1726
Little-endian: least-significant byte first
a small value has the same leading bytes in memory, regardless of word length
arithmetic can start right away, since carries propagate from the least-significant byte
Big-endian: most-significant byte first
“natural” way of writing numbers
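To make the two orders concrete, here is a small Python sketch using the built-in `int.to_bytes`. The value `0x0A0B0C0D` is an arbitrary example, not one from the lecture.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit example value

little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

print(little.hex(" "))  # 0d 0c 0b 0a  (least-significant byte first)
print(big.hex(" "))     # 0a 0b 0c 0d  (most-significant byte first)

# A small value stored little-endian starts with the same byte
# whether it occupies 2 or 4 bytes; big-endian does not.
small = 7
print(small.to_bytes(2, "little").hex(" "))  # 07 00
print(small.to_bytes(4, "little").hex(" "))  # 07 00 00 00
print(small.to_bytes(4, "big").hex(" "))     # 00 00 00 07
```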
Byte Order
Especially relevant for distributed systems
Different computers, different endianness?
In principle, similar challenge for bits
not relevant, since bits are usually not addressable
we’ll talk about being “addressable” later
in CS 230, bits are written in big-endian
the same order we’ve been doing so far
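One common way to handle machines with different endianness is for both sides to agree on a byte order for data in transit. The sketch below uses Python's standard `struct` module. The function names `encode_u32` and `decode_u32` are hypothetical, and picking big-endian as the agreed order is an assumption made for illustration.

```python
import struct
import sys

def encode_u32(value: int) -> bytes:
    """Pack an unsigned 32-bit integer big-endian, whatever the host uses."""
    return struct.pack(">I", value)

def decode_u32(data: bytes) -> int:
    """Unpack four big-endian bytes back into an integer."""
    return struct.unpack(">I", data)[0]

print("host byte order:", sys.byteorder)

wire = encode_u32(0x01FAB352)
print(wire.hex(" "))          # 01 fa b3 52
print(hex(decode_u32(wire)))  # 0x1fab352

# Reading the same bytes with the wrong assumption gives a different number.
print(hex(struct.unpack("<I", wire)[0]))  # 0x52b3fa01
```

Because the format strings name the byte order explicitly, the result does not depend on `sys.byteorder`.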
Endianness Example
Consider the big-endian 32-bit word 0x01FAB352
In what order do we send the bits to a little-endian computer?
Break it up into bytes: 32 / 8 = 4 bytes
0x01 0xFA 0xB3 0x52
Swap them to little-endian
0x52 0xB3 0xFA 0x01
Convert them to binary
First - 01010010 10110011 11111010 00000001 - Last
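The steps above (break into bytes, swap, convert to binary) can be sketched in Python as follows. The function name `bits_in_little_endian_order` is made up for this example.

```python
def bits_in_little_endian_order(word: int, width_bits: int = 32) -> str:
    """Return the bits, first to last, as sent to a little-endian receiver."""
    n_bytes = width_bits // 8
    # Break the word into bytes, most-significant first (the big-endian view).
    big_endian_bytes = [(word >> (8 * i)) & 0xFF for i in reversed(range(n_bytes))]
    # Swap to little-endian: reverse the bytes, keep the bits inside each byte.
    little_endian_bytes = big_endian_bytes[::-1]
    # Convert each byte to 8 binary digits.
    return " ".join(format(b, "08b") for b in little_endian_bytes)

print(bits_in_little_endian_order(0x01FAB352))
# 01010010 10110011 11111010 00000001
```

Only the byte order changes; the bits within each byte stay in the order used throughout the course.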
Endianness Try it Yourself
Consider the big-endian 32-bit word 423 (base 10)
In what order do we send the bits to a little-endian computer?
Convert it to binary (or you could do hexadecimal here)
00000000000000000000000110100111
Break it up into bytes: 32 / 8 = 4 bytes
00000000 00000000 00000001 10100111
Swap them to little-endian (and convert to binary if in hex)
First - 10100111 00000001 00000000 00000000 - Last
Characters
What about representing text with bits?
characters: a, b, 8, *, \, Q, etc.
Assign each character a number
but who decides which character gets which number?
some languages have many characters
ASCII Characters
ASCII – American Standard Code for Information Interchange
    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
00  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
10  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
20  SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
50  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
70  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
ASCII Example
Example: interpret 0x4672656506 as ASCII
Answer: Free[ACK]
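The lookup can be reproduced with a short Python sketch. The function name `interpret_ascii` and the bracket notation for non-printable codes are illustrative choices, and `CONTROL_NAMES` deliberately lists only a few control codes rather than the whole table.

```python
# Names for a few control codes; the ASCII table has the full list.
CONTROL_NAMES = {0x00: "NUL", 0x06: "ACK", 0x0A: "LF", 0x0D: "CR", 0x7F: "DEL"}

def interpret_ascii(hex_string: str) -> str:
    """Decode a hex string byte by byte, showing control codes in brackets."""
    digits = hex_string[2:] if hex_string.startswith("0x") else hex_string
    out = []
    for b in bytes.fromhex(digits):
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))  # printable ASCII character
        else:
            # Unlisted codes fall back to their two-digit hex value.
            out.append("[" + CONTROL_NAMES.get(b, format(b, "02X")) + "]")
    return "".join(out)

print(interpret_ascii("0x4672656506"))  # Free[ACK]
```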
ASCII Try it Yourself
Try it yourself: interpret 0x0077696E as ASCII
Answer: [NUL]win
Unicode
Unicode provides over 100,000 code points for characters, symbols, etc.
code point range: U+0000 ... U+10FFFF
range: 2^16 + 2^20 ≈ 1.1 million possible code points
UTF: Unicode Transformation Format
UTF-32
direct 4-byte encoding of code points
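A short Python sketch can confirm the size of the code point range and show what a direct 4-byte encoding looks like. The sample characters are arbitrary, and the big-endian codec `utf-32-be` is chosen here only so the bytes read in the usual order.

```python
# Number of code points in U+0000 .. U+10FFFF
total = 0x10FFFF + 1
print(total, total == 2**16 + 2**20)  # 1114112 True

# UTF-32 stores the code point number directly in 4 bytes.
for ch in ("A", "\u20ac"):  # LATIN CAPITAL LETTER A, EURO SIGN
    code_point = ord(ch)
    utf32 = ch.encode("utf-32-be")
    print(f"U+{code_point:04X}", utf32.hex(" "))
# U+0041 00 00 00 41
# U+20AC 00 00 20 ac
```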
Variable Length Encoding
UTF-16
2-byte encoding for most code points
sometimes a special prefix indicates a 4-byte code
UTF-8
similar principle: variable-length encoding, 1-4 bytes
1-byte encoding is compatible with ASCII
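To see the variable lengths, the following Python sketch encodes a few sample characters with the standard codecs and counts the bytes. The helper name `encoded_lengths` and the choice of samples are illustrative.

```python
# Samples: U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_lengths(ch: str) -> tuple:
    """Return byte counts for one character in UTF-8, UTF-16, and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)

# The 1-byte UTF-8 form is the same byte ASCII uses.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

For U+1F600 the UTF-16 bytes are d8 3d de 00: the first 2-byte unit falls in a reserved range and signals that a second unit follows.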
Data Representation
Interpretation is in the eye of the beholder
What does this represent?
01110111011010000111100100111111
01110111 01101000 01111001 00111111
7 7 6 8 7 9 3 F
77 68 79 3F
w h y ?
Or: 2,003,335,487 (base 10)
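Both readings of the bit string can be checked with a few lines of Python. This is only a sketch of the two interpretations shown here; it assumes big-endian byte order when turning the number into bytes.

```python
bits = "01110111011010000111100100111111"

as_int = int(bits, 2)            # read all 32 bits as one unsigned number
raw = as_int.to_bytes(4, "big")  # the same bits, split into 4 bytes

print(raw.hex(" ").upper())  # 77 68 79 3F
print(raw.decode("ascii"))   # why?
print(f"{as_int:,}")         # 2,003,335,487
```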
Big Integers
Word size currently 32 or 64 bits
Programming libraries offer big integer types
Complex data structures – more costly
operations in software, rather than hardware
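Python's built-in `int` is one example of a big integer type implemented in software. The sketch below contrasts it with what a 64-bit unsigned hardware add would keep. The names `WORD_BITS` and `MASK` are introduced for illustration, and the masking only simulates a fixed-width word.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1  # keeps only the low 64 bits

a = 2**63 + 5
b = 2**63 + 7

exact = a + b             # big integer: grows past the word size as needed
wrapped = (a + b) & MASK  # simulated 64-bit word: the carry out is lost

print(exact)               # 18446744073709551628
print(wrapped)             # 12
print(exact.bit_length())  # 65
```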
Data Interpretation
Bits have no inherent meaning
interpretation is in the eye of the beholder
must start from implicit agreement
ASCII, UTF, Floating Point, Two’s Complement, etc.
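As a final illustration, the same four bytes can be read under three of these agreements using Python's standard `struct` module. The byte pattern `BF800000` is an arbitrary example chosen for this sketch, and big-endian order is assumed throughout.

```python
import struct

raw = bytes.fromhex("BF800000")  # one fixed 32-bit pattern

as_unsigned = struct.unpack(">I", raw)[0]  # unsigned binary
as_signed = struct.unpack(">i", raw)[0]    # two's complement
as_float = struct.unpack(">f", raw)[0]     # IEEE 754 single precision

print(as_unsigned)  # 3212836864
print(as_signed)    # -1082130432
print(as_float)     # -1.0
```

Nothing in the bits says which reading is intended; the format string is where the agreement is written down.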