Lecture 6: Endianness and Characters (CS 230, Spring 2020)

SLIDE 1

CS 230 - Spring 2020 1-1

CS 230 – Introduction to Computers and Computer Systems Lecture 6 – Endianness and Characters

SLIDE 2

Byte

• Convention: 8 bits = 1 byte
  - 8 is a power of 2
  - two’s complement range: -128 ... 127
  - unsigned binary range: 0 ... 255
  - two hexadecimal digits
  - historical
    - useful range to represent characters and control codes
    - 8-bit circuit width
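As a small illustration of the two ranges above, the following Python sketch reads the same 8-bit pattern once as an unsigned value and once as a two's complement value. The helper name `as_signed_byte` is made up for this example and is not part of the lecture.

```python
def as_signed_byte(value: int) -> int:
    """Interpret an unsigned byte value (0..255) as 8-bit two's complement."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    # If the top bit (0x80) is set, the pattern stands for a negative number.
    return value - 0x100 if value & 0x80 else value


for raw in (0x00, 0x7F, 0x80, 0xFF):
    # Two hexadecimal digits are enough to write any byte.
    print(f"0x{raw:02X}  unsigned={raw:3d}  signed={as_signed_byte(raw):4d}")
```

The boundary patterns 0x7F and 0x80 are the interesting ones: they sit next to each other as unsigned values but at opposite ends of the two's complement range.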

SLIDE 3

Word

• Increasing circuit width: 32 or 64 bits
• Individual bytes are still accessible
  - what order are they in?
  - the terms come from Gulliver’s Travels (Jonathan Swift, 1726)
• Little-endian: least-significant byte first
  - a small number looks the same at the same address, regardless of word length
  - arithmetic can start right away with the least-significant byte
• Big-endian: most-significant byte first
  - the “natural” way of writing numbers
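The two byte orders can be seen directly with Python's built-in `int.to_bytes`. This is only a sketch; the values `0x0A0B0C0D` and `0x2A` are arbitrary choices for illustration.

```python
value = 0x0A0B0C0D

big = value.to_bytes(4, "big")        # most-significant byte first
little = value.to_bytes(4, "little")  # least-significant byte first

print(" ".join(f"{b:02x}" for b in big))     # 0a 0b 0c 0d
print(" ".join(f"{b:02x}" for b in little))  # 0d 0c 0b 0a

# A small number stored little-endian starts with the same byte
# no matter how wide the word is.
small = 0x2A
for width in (1, 2, 4, 8):
    stored = small.to_bytes(width, "little")
    print(width, " ".join(f"{b:02x}" for b in stored))
```

In the last loop the first byte is always `2a`, which is the "same number in memory, regardless of length" property; with big-endian storage the position of `2a` would move as the width grows.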

SLIDE 4

Byte Order

• Especially relevant for distributed systems
  - different computers, different endianness?
• In principle, a similar challenge exists for bits
  - not relevant in practice, since bits are usually not addressable
    - we’ll talk about being “addressable” later
  - in CS 230, bits are written in big-endian order
    - the same order we’ve been using so far
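One common way for two machines to avoid the problem is to agree on a byte order for everything sent over the wire. The sketch below uses Python's `struct` module, where the `!` prefix means network byte order (big-endian) and `<` means little-endian. The value `0x12345678` is an arbitrary example, and the "misread" case only simulates a receiver that skips the conversion.

```python
import struct
import sys

value = 0x12345678

# Sender: pack as an unsigned 32-bit integer in network (big-endian) order.
wire = struct.pack("!I", value)
print(wire.hex())  # 12345678

# Receiver that honours the agreement gets the original value back.
(received,) = struct.unpack("!I", wire)
print(hex(received))  # 0x12345678

# Receiver that wrongly assumes little-endian sees a different number.
(misread,) = struct.unpack("<I", wire)
print(hex(misread))  # 0x78563412

# The byte order of the machine running this script.
print(sys.byteorder)
```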

SLIDE 8

Endianness Example

• Consider the big-endian 32-bit word 0x01FAB352
• In what order do we send the bits to a little-endian computer?
• Break it up into bytes: 32 / 8 = 4 bytes
  - 0x01 0xFA 0xB3 0x52
• Swap them to little-endian
  - 0x52 0xB3 0xFA 0x01
• Convert them to binary
  - First - 01010010 10110011 11111010 00000001 - Last
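The three steps of this example (split into bytes, reverse the byte order, write each byte in binary) can be sketched in Python as follows. The function name `bits_for_little_endian` is hypothetical and chosen only for this illustration.

```python
def bits_for_little_endian(word: int, width_bytes: int = 4) -> str:
    """Return the bit string to send, least-significant byte first."""
    # Step 1: break the word up into bytes, most-significant first.
    big_endian_bytes = word.to_bytes(width_bytes, "big")
    # Step 2: reverse the byte order. Bits inside each byte keep their order.
    swapped = reversed(big_endian_bytes)
    # Step 3: write each byte as 8 binary digits.
    return " ".join(f"{byte:08b}" for byte in swapped)


print(bits_for_little_endian(0x01FAB352))
# 01010010 10110011 11111010 00000001
```

Note that only whole bytes move; the bits within each byte stay in big-endian order, matching the course convention.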

SLIDE 12

Endianness Try it Yourself

• Consider the big-endian 32-bit word 423 (decimal)
• In what order do we send the bits to a little-endian computer?
• Convert it to binary (or you could use hexadecimal here)
  - 00000000000000000000000110100111
• Break it up into bytes: 32 / 8 = 4 bytes
  - 00000000 00000000 00000001 10100111
• Swap them to little-endian (and convert to binary if in hex)
  - First - 10100111 00000001 00000000 00000000 - Last
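The same answer can be checked without any library help by shifting and masking: taking the lowest 8 bits first yields the bytes already in little-endian order. This is a sketch under that approach; `little_endian_bytes` is a made-up helper name.

```python
def little_endian_bytes(word: int, width_bytes: int = 4) -> list:
    """Extract bytes from least-significant to most-significant."""
    # Shifting right by 8*i moves byte i to the bottom; & 0xFF keeps only it.
    return [(word >> (8 * i)) & 0xFF for i in range(width_bytes)]


ordered = little_endian_bytes(423)
print("First -", " ".join(f"{b:08b}" for b in ordered), "- Last")
# First - 10100111 00000001 00000000 00000000 - Last
```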

SLIDE 13

Characters

• What about representing text with bits?
  - characters: a, b, 8, *, \, Q, etc.
• Assign each character a number
  - but who decides which character gets which number?
  - some languages have many characters

SLIDE 14

ASCII Characters

• ASCII – American Standard Code for Information Interchange

     0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
00  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
10  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
20  SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
50  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
70  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL

(Row label plus column label gives the hexadecimal code; SP is the space character.)
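Reading the table amounts to adding the row label to the column label. In Python the built-ins `chr` and `ord` perform the same mapping for ASCII characters, as this short sketch shows; `table_lookup` is a hypothetical helper written for this example.

```python
def table_lookup(row: int, column: int) -> str:
    """Return the character whose ASCII code is row + column."""
    code = row + column
    if not 0 <= code <= 0x7F:
        raise ValueError("ASCII codes run from 0x00 to 0x7F")
    return chr(code)


print(table_lookup(0x40, 0x1))       # A
print(hex(ord("a")))                 # 0x61
# Upper- and lower-case letters are exactly two table rows (0x20) apart.
print(ord("a") - ord("A"))           # 32
# The digit characters occupy 0x30 through 0x39.
print([hex(ord(d)) for d in "09"])   # ['0x30', '0x39']
```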

SLIDE 16

ASCII Example

• Example: interpret 0x4672656506 as ASCII
  - split into bytes: 46 72 65 65 06
• Answer: Free[ACK]

SLIDE 18

ASCII Try it Yourself

• Try it yourself: interpret 0x0077696E as ASCII
  - split into bytes: 00 77 69 6E
• Answer: [NUL]win
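Both ASCII exercises can be checked with a short decoder that takes two hexadecimal digits at a time and prints control codes by name in square brackets, as the slides do. This is a minimal sketch; the function name `interpret_ascii` and the bracket notation in code are assumptions made for this example.

```python
# Names of the 32 control codes 0x00..0x1F, in table order.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def interpret_ascii(hex_digits: str) -> str:
    """Decode a string of hex digits (without the 0x prefix) as ASCII."""
    pieces = []
    for byte in bytes.fromhex(hex_digits):
        if byte < 0x20:
            pieces.append(f"[{CONTROL_NAMES[byte]}]")
        elif byte == 0x7F:
            pieces.append("[DEL]")
        elif byte < 0x7F:
            pieces.append(chr(byte))
        else:
            raise ValueError(f"0x{byte:02X} is outside the ASCII range")
    return "".join(pieces)


print(interpret_ascii("4672656506"))  # Free[ACK]
print(interpret_ascii("0077696E"))    # [NUL]win
```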

SLIDE 19

Unicode

• Unicode provides over 100,000 code points
  - for characters, symbols, etc.
  - code point range: U+0000 ... U+10FFFF
  - range: 2^16 + 2^20 ~ 1 million possible code points
• UTF: Unicode Transformation Format
• UTF-32
  - direct 4-byte encoding of code points
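The size of the code point range and the "direct" nature of UTF-32 can both be checked in a few lines of Python. The euro sign (U+20AC) is an arbitrary example character, and `utf-32-be` is Python's codec name for big-endian UTF-32 without a byte order mark.

```python
# U+0000 through U+10FFFF inclusive.
CODE_POINT_COUNT = 0x10FFFF + 1
print(CODE_POINT_COUNT)                      # 1114112
print(CODE_POINT_COUNT == 2**16 + 2**20)     # True

euro = "\u20ac"
print(f"U+{ord(euro):04X}")                  # U+20AC

# UTF-32 stores the code point itself in 4 bytes.
encoded = euro.encode("utf-32-be")
print(encoded.hex())                         # 000020ac
print(int.from_bytes(encoded, "big") == ord(euro))  # True
```

Because UTF-32 is a multi-byte format, the byte-order question from earlier in the lecture applies to it as well; the sketch fixes big-endian explicitly.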

SLIDE 20

Variable Length Encoding

• UTF-16
  - 2-byte encoding for most code points
  - sometimes a special prefix indicates a 4-byte code
• UTF-8
  - similar principle: variable-length encoding
  - 1-4 bytes
  - 1-byte encoding compatible with ASCII
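To make the variable lengths concrete, the sketch below encodes four sample characters with Python's built-in `utf-8` and `utf-16-be` codecs and prints how many bytes each one takes. The sample characters are arbitrary picks, one for each UTF-8 length.

```python
samples = {
    "Latin capital A": "\u0041",
    "e with acute": "\u00e9",
    "euro sign": "\u20ac",
    "grinning face": "\U0001F600",
}

for name, ch in samples.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    print(f"U+{ord(ch):04X} {name}: "
          f"UTF-8 {len(utf8)} bytes, UTF-16 {len(utf16)} bytes")

# The 1-byte UTF-8 form is identical to the ASCII byte.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

The last sample lies above U+FFFF, so UTF-16 needs two 2-byte units for it, which is the 4-byte case mentioned on the slide.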

SLIDE 21

Data Representation

• Interpretation is in the eye of the beholder
• What does this represent?

01110111011010000111100100111111

SLIDE 22

Data Representation

01110111011010000111100100111111

01110111  01101000  01111001  00111111
7 7       6 8       7 9       3 F
77        68        79        3F
w         h         y         ?

• Or: 2,003,335,487 (decimal)
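Both readings of this bit string can be reproduced in Python. The sketch assumes the 32 bits are taken as four big-endian bytes, which is how the slide groups them.

```python
bits = "01110111011010000111100100111111"

# Turn the bit string into 4 raw bytes.
raw = int(bits, 2).to_bytes(4, "big")
print(raw.hex())                     # 7768793f

# Reading 1: four ASCII characters.
print(raw.decode("ascii"))           # why?

# Reading 2: one unsigned 32-bit integer.
print(int.from_bytes(raw, "big"))    # 2003335487
```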

SLIDE 23

Big Integers

• Word size is currently 32 or 64 bits
• Programming libraries offer big integer types
• Complex data structures make them more costly
  - operations run in software, rather than hardware
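Python's built-in `int` is one example of such a big integer type: it grows past the machine word size instead of overflowing. The sketch below shows this, along with a rough count of how many 64-bit words a value would need. The helper `words_needed` is made up for this illustration and says nothing about how any particular library stores its digits.

```python
import math

WORD_BITS = 64
word_max = 2**WORD_BITS - 1   # largest unsigned value in one 64-bit word


def words_needed(n: int) -> int:
    """Minimum number of 64-bit words that can hold the magnitude of n."""
    return max(1, (n.bit_length() + WORD_BITS - 1) // WORD_BITS)


one_past = word_max + 1       # no overflow: the integer simply gets bigger
print(one_past.bit_length())  # 65
print(words_needed(word_max), words_needed(one_past))  # 1 2

big = math.factorial(25)
print(big > word_max)         # True
print(words_needed(big))
```

Arithmetic on values like `big` is carried out by library code working across several words, which is why it costs more than a single hardware instruction.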

SLIDE 24

Data Interpretation

• Bits have no inherent meaning
  - interpretation is in the eye of the beholder
  - must start from implicit agreement
    - ASCII, UTF, Floating Point, Two’s Complement, etc.
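As a closing illustration, the sketch below reads one 4-byte pattern under three of the agreements listed above, using the format codes of Python's `struct` module (`>I` unsigned, `>i` two's complement, `>f` 32-bit floating point, all big-endian). The byte pattern `C0 49 0F DB` is an arbitrary choice for this example.

```python
import struct

raw = bytes.fromhex("C0490FDB")

(as_unsigned,) = struct.unpack(">I", raw)  # unsigned binary
(as_signed,) = struct.unpack(">i", raw)    # two's complement
(as_float,) = struct.unpack(">f", raw)     # IEEE 754 single precision

print(as_unsigned)
print(as_signed)
print(as_float)

# The two integer readings differ by exactly 2**32 because the top bit is set.
print(as_unsigned - as_signed == 2**32)    # True
```

The bytes never change; only the agreed interpretation does, and the floating-point reading comes out close to negative pi.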