CS108 Lecture 08: Computing with Text

String module operations
Character encoding/decoding

Aaron Stevens
4 February 2009


Overview/Questions

– How is text represented within a computer?
– How can we manipulate text in our programs?

Review of String Operations

Meaning          Operator/Operation
Iteration        for <var> in <string>
Length           len(<string>)
Slicing          <string>[<begin>:<end>]
Indexing         <string>[<expression>]
Repetition       *
Concatenation    +

Strings, Lists, and Sequences

The operations in the previous table are not just string operations. They apply to any <sequence>, which includes any list.

Show some list examples using a list of numbers… (as time permits)
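As one possible illustration of the point above, the sketch below applies each operation from the table to a list of numbers. It is written in Python 3 syntax (the slides use Python 2), and the list `numbers` is a made-up example rather than one from the lecture.

```python
# Sequence operations applied to a list instead of a string.
# `numbers` is a hypothetical example list.
numbers = [10, 20, 30, 40]

concatenated = numbers + [50, 60]   # concatenation
repeated = [0, 1] * 3               # repetition
second = numbers[1]                 # indexing (positions start at 0)
middle = numbers[1:3]               # slicing
count = len(numbers)                # length

total = 0
for n in numbers:                   # iteration
    total = total + n

print(concatenated, repeated, second, middle, count, total)
```

Each line works the same way if `numbers` is replaced by a string, which is the sense in which these are sequence operations.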

String Module Operations

Python provides a standard module, string, of useful string manipulation functions. Examples:

>>> import string
>>> text = "the cat in the hat"
>>> string.capitalize(text)
>>> string.capwords(text)
>>> string.upper(text)
>>> string.center(text,40)

Note: this string library is mostly deprecated... We’ll discuss an alternative next week.
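For readers on Python 3, where most of these module-level functions were removed, the same operations are available as methods on the string itself. The sketch below assumes Python 3 and is not necessarily the alternative presented later in the course; `string.capwords` is the one function here that remains in the module.

```python
import string

text = "the cat in the hat"

capitalized = text.capitalize()      # capitalizes only the first letter
title_case = string.capwords(text)   # capitalizes the first letter of each word
upper_case = text.upper()            # every letter in uppercase
centered = text.center(40)           # padded with spaces to a width of 40

print(capitalized)
print(title_case)
print(upper_case)
print(repr(centered))                # repr() makes the padding visible
```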

String Module Operations

More examples:

>>> text = "I love watching birds fly"
>>> string.replace(text, "birds","fish")
>>> text = "to be or not to be"
>>> string.count(text,"o")
>>> string.find(text,"be")
>>> string.split(text)
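The same four calls can be sketched with string methods, again assuming Python 3 rather than the Python 2 string module used on the slide. The variable names holding the results are illustrative only.

```python
text = "I love watching birds fly"
replaced = text.replace("birds", "fish")   # copy with every "birds" replaced

text = "to be or not to be"
o_count = text.count("o")      # how many times "o" appears
be_index = text.find("be")     # index of the first "be"
words = text.split()           # list of words, split on whitespace

print(replaced)
print(o_count, be_index, words)
```

Note that `replace` returns a new string; the original `text` is left unchanged.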

String Module Operations

Refer to table 4.2 on page 96 of Zelle for a summary of the Python string module. http://docs.python.org/lib/node42.html shows the complete list.

Function                         Meaning
capitalize(<str>)                Capitalize the first letter of <str>.
capwords(<str>)                  Capitalize the first letter of each word in <str>.
center(<str>, <width>)           Center <str> in <width> spaces.
count(<str>, <sub>)              Count occurrences of <sub> in <str>.
find(<str>, <sub>)               Find the index of the first occurrence of <sub> in <str>.
join(<list>)                     Concatenate a list of strings into a single string.
lower(<str>)                     Return a copy of <str> in lowercase.
replace(<str>, <old>, <new>)     Return a copy of <str>, replacing all occurrences of <old> with <new>.
split(<str> [, <delim>])         Split <str> into a list of words, using <delim> as the delimiter.
strip(<str>)                     Remove leading/trailing white space.
upper(<str>)                     Return a copy of <str> in uppercase.

String Module Example

Replacing all occurrences of a word.
Also: show splitting text into a list of words.
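The slide's own demonstration is not recorded, so the following is only one way such an example might look. It assumes Python 3, and the function name `replace_word` and the sample sentence are hypothetical. It combines both ideas: split the text into a list of words, swap the matching words, and join the list back together.

```python
def replace_word(text, old, new):
    """Return text with every whole word equal to `old` replaced by `new`."""
    words = text.split()          # split the text into a list of words
    result = []
    for word in words:
        if word == old:
            result.append(new)    # swap in the replacement word
        else:
            result.append(word)   # keep every other word as it is
    return " ".join(result)       # join the words back into one string

sentence = "the cat in the hat"
print(sentence.split())
print(replace_word(sentence, "the", "a"))
```

Working word by word means "there" is left alone when replacing "the", whereas a plain substring replace would change it. The trade-off is that runs of several spaces collapse to one.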

Character Encoding

Encoding: computers store text data by representing each character/symbol as a number and storing that number in binary.

ASCII (American Standard Code for Information Interchange) is the most common encoding scheme. Each symbol is assigned a unique number.
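To make the character-to-number-to-binary idea concrete, here is a short sketch assuming Python 3. The sample word "Cat" and the list name `rows` are illustrative choices, not taken from the slides.

```python
# For each character, record its numeric code and that code in binary.
rows = []
for ch in "Cat":
    code = ord(ch)                # the number that represents the character
    bits = format(code, "08b")    # the same number written as 8 binary digits
    rows.append((ch, code, bits))

for ch, code, bits in rows:
    print(ch, code, bits)
```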

The ASCII Character Set

ASCII stands for American Standard Code for Information Interchange.
ASCII originally used seven bits to represent each character, allowing for 128 unique characters.
Later, extended ASCII evolved so that all eight bits were used.
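The character counts follow from the number of bit patterns available: n bits give 2 to the power n distinct values. A quick check, assuming Python 3 (variable names are illustrative):

```python
seven_bit_codes = 2 ** 7     # distinct patterns with 7 bits
eight_bit_codes = 2 ** 8     # distinct patterns with 8 bits
print(seven_bit_codes, eight_bit_codes)

# A 7-bit ASCII code is always below 128, so when it is written
# with 8 bits the leftmost bit is 0.
a_bits = format(ord("A"), "08b")
print(a_bits)
```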

The ASCII Character Set (7 bits)
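The table shown on this slide is not reproduced in the text. As a rough substitute, the printable part of the 7-bit set can be listed with a few lines of Python 3. The layout of eight entries per row is an arbitrary choice for this sketch, not the layout of the original slide.

```python
# Build "code character" entries for the printable ASCII range.
# Codes 0-31 and 127 are control characters and are skipped here.
COLUMNS = 8   # hypothetical layout choice

entries = ["%3d %s" % (code, chr(code)) for code in range(32, 127)]
for start in range(0, len(entries), COLUMNS):
    print("   ".join(entries[start:start + COLUMNS]))
```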

The Extended ASCII Character Set

Character Encoding

Python can convert a character to its ASCII code using the built-in function ord(<character>). Example:

>>> ord("A")
>>> text = "The Cat in the Hat"
>>> for ch in text:
...     print ord(ch),
...
>>> print # blank line
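A Python 3 version of the same loop might look like the sketch below. Collecting the codes in a list (here named `codes`, an illustrative name) instead of printing them one at a time is a choice made for this example.

```python
text = "The Cat in the Hat"

codes = []
for ch in text:
    codes.append(ord(ch))   # numeric code of each character, in order

print(ord("A"))
print(*codes)               # all codes on one line, separated by spaces
```

Uppercase and lowercase letters have different codes, and the spaces between words have a code of their own.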

Character Encoding

Example:
– Collect some text from the user
– Print out the sequence of ASCII character codes

text = raw_input("Enter your text: ")
for ch in text:
    print ord(ch), ", ",
print # blank line
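In Python 3, `raw_input` is named `input` and `print` is a function. The sketch below assumes Python 3 and wraps the conversion in a hypothetical helper, `ascii_codes`, so it can run without waiting for keyboard input.

```python
def ascii_codes(text):
    """Return the character codes of text as one comma-separated string."""
    return ", ".join(str(ord(ch)) for ch in text)

# In an interactive program the text could come from
# input("Enter your text: "); a fixed sample is used here
# so the example runs unattended.
sample = "Hi there"
print(ascii_codes(sample))
```

Using `join` places a comma only between codes, so there is no trailing separator after the last one.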

Character Decoding

Python also has a built-in function chr(<num>), which converts a number to its corresponding ASCII character.

Example:

>>> num = 81
>>> print chr(num)
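The two built-ins are inverses of each other, which a short Python 3 sketch can confirm (the variable name `letter` is illustrative):

```python
num = 81
letter = chr(num)            # number -> character
print(letter)

round_trip = ord(letter)     # character -> number again
print(round_trip == num)     # ord() undoes chr()
```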

Character Decoding

What does the following message say?

71, 105, 97, 110, 116, 115, 32, 98, 121, 32, 84, 104, 114, 101, 101, 32, 80, 111, 105, 110, 116, 115

How would you figure it out?

Character Decoding

Example: decoding an ASCII message

– Treat the numbers as a list
– For each number in the list:

  • Treat the numeric symbols as a number
  • Convert the number to its ASCII character

>>> numbers = input("Enter a sequence of ASCII numbers: ")
>>> for n in numbers:
...     print chr(n),
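The Python 2 `input` used above evaluates what the user types, so a comma-separated entry arrives as a sequence of numbers. In Python 3, `input` returns plain text, so the splitting and number conversion have to be done explicitly. The sketch below assumes Python 3 and uses a hypothetical helper named `decode`; it follows the same steps as the outline above.

```python
def decode(message):
    """Decode a comma-separated string of ASCII codes into text."""
    characters = []
    for piece in message.split(","):     # treat the numbers as a list
        number = int(piece)              # treat the numeric symbols as a number
        characters.append(chr(number))   # convert the number to its character
    return "".join(characters)

# The message from the earlier slide, entered as one string.
codes = ("71, 105, 97, 110, 116, 115, 32, 98, 121, 32, 84, 104, 114, "
         "101, 101, 32, 80, 111, 105, 110, 116, 115")
print(decode(codes))
```

`int` ignores the space that follows each comma, so the codes can be typed exactly as they appear on the slide.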

Take-Away Points

– String module functions
– Character set, ASCII encoding
– Character encoding/decoding

Student To Dos

– HW 03: definite loop, due Tuesday 2/3
– Readings this week:

  • Zelle 4.1-4.3 (today, Wednesday)
  • Zelle 4.4-4.5 (Friday)