unicode.decode() leaking from your eyes like liquid pain

SLIDE 1

Łukasz Taczuk

unicode.decode()

“lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ike liquid pain”

SLIDE 2

SLIDE 3

The guessing game:

f.write(u'tralala: {}'.format('foo'.encode(u'utf8')))

SLIDE 4

Python 2 / Python 3

SLIDE 5

Python 2

SLIDE 6

Python 2
str = 'tralala'
unicode = u'tralala'
'tralala' is b'tralala'

SLIDE 7

“I always thought that text in utf-8 was exactly that: Unicode data!”

  - Janusz programowania (Polish, roughly "a programming Janusz")

What IS unicode in Python?

SLIDE 8

Python 2
str = [image]
unicode = [image]

SLIDE 9

unicode ⇔ str conversion

unicode (abstraction) -> unicode.encode(<encoding>) -> str (physical)
str (physical) -> str.decode(<encoding>) -> unicode (abstraction)

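Python 3 keeps the same two directions but renames the types: Python 2's unicode becomes str, and Python 2's str becomes bytes. A minimal round trip in Python 3, assuming UTF-8 as the encoding (the word 'żółw' is just sample data):

```python
# Python 3 names: str is the abstraction (code points), bytes the physical form.
text = 'żółw'                      # str: 4 code points
data = text.encode('utf-8')        # abstraction -> physical
restored = data.decode('utf-8')    # physical -> abstraction

print(len(text), len(data))        # 4 7
print(data)                        # b'\xc5\xbc\xc3\xb3\xc5\x82w'
assert restored == text
```

Every non-ASCII letter here takes two bytes in UTF-8, which is why the byte count differs from the character count.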
SLIDE 10

unicode ⇔ str conversion

encode decode

SLIDE 11

unicode ⇔ str conversion

decode encode

SLIDE 12

unicode ⇔ str conversion

unicode.decode(<encoding>)
str.encode(<encoding>)

SLIDE 13

Automatic type conversion (1)

f.write() - converts to str
yourlibrary.method() - converts to whatever it feels like :)

SLIDE 14

Automatic type conversion (2)

FOO.format(BAR) - automatically converts BAR to type(FOO)
FOO % BAR - does much the same, except that if either side is unicode, the result is unicode
Template('$bar').substitute(bar=BAR) - does the same as well

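Python 3 no longer performs these coercions, but they can be modelled there to see exactly where each quiz script that follows breaks. This is a sketch under stated assumptions: Python 2's default codec is 'ascii', Python 2 str is modelled as bytes and unicode as str, and py2_to_str, py2_format and ASDL are hypothetical names (py2_format only handles a single '{}' placeholder).

```python
# Hypothetical Python 3 model of Python 2's implicit coercions.
# Python 2 "str" -> bytes, Python 2 "unicode" -> str, default codec 'ascii'.

def py2_to_str(value):
    """f.write(): a unicode argument is encoded with 'ascii' first."""
    return value.encode('ascii') if isinstance(value, str) else value


def py2_format(template, arg):
    """FOO.format(BAR) with one '{}': BAR is converted to type(FOO) via 'ascii'."""
    if isinstance(template, bytes):
        if isinstance(arg, str):
            arg = arg.encode('ascii')           # unicode -> str
        return template.replace(b'{}', arg, 1)  # bytes has no .format() in Python 3
    if isinstance(arg, bytes):
        arg = arg.decode('ascii')               # str -> unicode
    return template.format(arg)


# In a utf-8 source file, the Python 2 literal 'asdł' holds these bytes:
ASDL = 'asdł'.encode('utf-8')

cases = {
    '1.py': lambda: py2_to_str(py2_format(b'tralala: {}', ASDL)),
    '2.py': lambda: py2_to_str(py2_format('tralala: {}', 'asdł')),
    '3.py': lambda: py2_to_str(py2_format('tralala: {}', ASDL)),
    '4.py': lambda: py2_to_str(py2_format(b'tralala: {}', 'asdł')),
}
for name, run in cases.items():
    try:
        print(name, 'OK!', run())
    except UnicodeError as e:
        print(name, type(e).__name__, 'at position', e.start)
```

The model predicts the same positions as the quiz tracebacks: 12 for 2.py, where the whole formatted line reaches f.write(), and 3 for 3.py and 4.py, where only the argument is converted.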
SLIDE 15

Automatic type conversion (3)

FOO.encode(<encoding>) - converts FOO to unicode FIRST (using the default 'ascii' codec)
FOO.decode(<encoding>) - converts FOO to str FIRST (using the default 'ascii' codec)

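The hidden first step can also be modelled in Python 3. py2_str_encode and py2_unicode_decode are hypothetical helpers, again assuming Python 2's default 'ascii' codec and modelling Python 2 str as bytes; they reproduce the errors raised by encode.py and decode.py later in the talk.

```python
# Hypothetical helpers: Python 2 "str" is modelled as bytes, "unicode" as str.

def py2_str_encode(data: bytes, encoding: str) -> bytes:
    """Python 2 str.encode(): decode with 'ascii' FIRST, then encode."""
    return data.decode('ascii').encode(encoding)


def py2_unicode_decode(text: str, encoding: str) -> str:
    """Python 2 unicode.decode(): encode with 'ascii' FIRST, then decode."""
    return text.encode('ascii').decode(encoding)


# Pure ASCII input survives the hidden step, so the mistake goes unnoticed...
print(py2_str_encode(b'asdasd', 'utf8'))      # b'asdasd'
print(py2_unicode_decode('asdasd', 'utf8'))   # asdasd

# ...until the first non-ASCII character arrives.
try:
    py2_str_encode('asdasdł'.encode('utf-8'), 'utf8')
except UnicodeDecodeError as e:
    print(e)  # 'ascii' codec can't decode byte 0xc5 in position 6: ordinal not in range(128)
```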
SLIDE 16

Quiz time!

SLIDE 17

1.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format('asdł'))

SLIDE 18

1.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format('asdł'))

OK!

SLIDE 19

2.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(u'tralala: {}'.format(u'asdł'))

SLIDE 20

2.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(u'tralala: {}'.format(u'asdł'))

Traceback (most recent call last):
  File "2.py", line 4, in <module>
    f.write(u'tralala: {}'.format(u'asdł'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 12: ordinal not in range(128)

SLIDE 21

3.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(u'tralala: {}'.format('asdł'))

SLIDE 22

3.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(u'tralala: {}'.format('asdł'))

Traceback (most recent call last):
  File "3.py", line 4, in <module>
    f.write(u'tralala: {}'.format('asdł'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 3: ordinal not in range(128)

SLIDE 23

4.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format(u'asdł'))

SLIDE 24

4.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format(u'asdł'))

Traceback (most recent call last):
  File "4.py", line 4, in <module>
    f.write('tralala: {}'.format(u'asdł'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 3: ordinal not in range(128)

SLIDE 25

5.py = 1.py with "wb"
6.py = 2.py with "wb"
7.py = 3.py with "wb"
8.py = 4.py with "wb"

SLIDE 26

5.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format('asdł'))

SLIDE 27

5.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format('asdł'))

OK!

SLIDE 28

6.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(u'tralala: {}'.format(u'asdł'))

SLIDE 29

6.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(u'tralala: {}'.format(u'asdł'))

Traceback (most recent call last):
  File "6.py", line 4, in <module>
    f.write(u'tralala: {}'.format(u'asdł'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 12: ordinal not in range(128)

SLIDE 30

7.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(u'tralala: {}'.format('asdł'))

SLIDE 31

7.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(u'tralala: {}'.format('asdł'))

Traceback (most recent call last):
  File "7.py", line 4, in <module>
    f.write(u'tralala: {}'.format('asdł'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 3: ordinal not in range(128)

SLIDE 32

8.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format(u'asdł'))

SLIDE 33

8.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format(u'asdł'))

Traceback (most recent call last):
  File "8.py", line 4, in <module>
    f.write('tralala: {}'.format(u'asdł'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 3: ordinal not in range(128)

SLIDE 34

encode / decode

SLIDE 35

encode.py

# -*- coding: utf-8 -*-

'asdasdł'.encode('utf8')

SLIDE 36

encode.py

# -*- coding: utf-8 -*-

'asdasdł'.encode('utf8')

Traceback (most recent call last):
  File "encode.py", line 3, in <module>
    'asdasdł'.encode('utf8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 6: ordinal not in range(128)

SLIDE 37

decode.py

# -*- coding: utf-8 -*-

u'asdasdł'.decode('utf8')

SLIDE 38

decode.py

# -*- coding: utf-8 -*-

u'asdasdł'.decode('utf8')

Traceback (most recent call last):
  File "decode.py", line 3, in <module>
    u'asdasdł'.decode('utf8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 6: ordinal not in range(128)

SLIDE 39

An alternative way of writing files to disk

# -*- coding: utf-8 -*-
import codecs

with codecs.open('export.csv', 'w', encoding="utf-8") as f:
    f.write(u'żółw')

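In Python 3 the built-in open() accepts encoding= itself (io.open() offers the same argument on Python 2.6 and later), so codecs.open() is no longer needed for this. A minimal Python 3 sketch; the temporary directory is only a stand-in for the real export location:

```python
import os
import tempfile

# Stand-in location for export.csv.
path = os.path.join(tempfile.mkdtemp(), 'export.csv')

with open(path, 'w', encoding='utf-8') as f:
    f.write('żółw')            # str in; encoded to UTF-8 on the way to disk

with open(path, 'rb') as f:
    raw = f.read()             # the bytes that actually landed on disk

print(raw)                     # b'\xc5\xbc\xc3\xb3\xc5\x82w'
```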
SLIDE 40

Python 3

SLIDE 41

Python 3
bytes = b'tralala'
str = 'tralala'
'tralala' is u'tralala'

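A few quick checks of the Python 3 literal types. They assume Python 3.3 or later, where the u'' prefix is accepted again (PEP 414):

```python
# Literal prefixes and the types they produce in Python 3.
assert type(b'tralala') is bytes
assert type('tralala') is str
assert type(u'tralala') is str      # u'' is a no-op prefix since Python 3.3 (PEP 414)
assert 'tralala' == u'tralala'

# Indexing shows the difference: bytes hold integers, str holds characters.
print(b'tralala'[0])   # 116
print('tralala'[0])    # t
```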
SLIDE 42

Python 3
bytes = [image]
str = [image]

SLIDE 43

Let’s do it all one more time! :)

SLIDE 44

1.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(b'tralala: {}'.format(b'asdł'))

SLIDE 45

1.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(b'tralala: {}'.format(b'asdł'))

  File "1.py", line 4
    f.write(b'tralala: {}'.format(b'asdł'))
                                  ^
SyntaxError: bytes can only contain ASCII literal characters.

SLIDE 46

2.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format('asdł'))

SLIDE 47

2.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format('asdł'))

OK!

SLIDE 48

3.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format(b'asdł'))

SLIDE 49

3.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write('tralala: {}'.format(b'asdł'))

  File "3.py", line 4
    f.write('tralala: {}'.format(b'asdł'))
                                 ^
SyntaxError: bytes can only contain ASCII literal characters.

SLIDE 50

4.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(b'tralala: {}'.format('asdł'))

SLIDE 51

4.py

# -*- coding: utf-8 -*-

with open('export.csv', 'w') as f:
    f.write(b'tralala: {}'.format('asdł'))

Traceback (most recent call last):
  File "4.py", line 4, in <module>
    f.write(b'tralala: {}'.format('asdł'))
AttributeError: 'bytes' object has no attribute 'format'

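bytes really has no .format() in Python 3, but Python 3.5 added %-formatting for bytes (PEP 461). It only accepts bytes-like operands, so the encoding step stays explicit. A small sketch, assuming UTF-8:

```python
# %-formatting on bytes (Python 3.5+): both operands must be bytes.
line = b'tralala: %s' % 'asdł'.encode('utf-8')
print(line)   # b'tralala: asd\xc5\x82'

# A str operand is rejected instead of being silently encoded.
try:
    b'tralala: %s' % 'asdł'
except TypeError as e:
    print('TypeError:', e)
```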
SLIDE 52

5.py = 1.py with "wb"
6.py = 2.py with "wb"
7.py = 3.py with "wb"
8.py = 4.py with "wb"

SLIDE 53

5.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(b'tralala: {}'.format(b'asdł'))

SLIDE 54

5.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(b'tralala: {}'.format(b'asdł'))

  File "5.py", line 4
    f.write(b'tralala: {}'.format(b'asdł'))
                                  ^
SyntaxError: bytes can only contain ASCII literal characters.

SLIDE 55

6.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format('asdł'))

SLIDE 56

6.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format('asdł'))

Traceback (most recent call last):
  File "6.py", line 4, in <module>
    f.write('tralala: {}'.format('asdł'))
TypeError: a bytes-like object is required, not 'str'

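The fix for 6.py is to encode explicitly before writing to a binary file. In this sketch an io.BytesIO buffer stands in for the file opened with 'wb', since it enforces the same bytes-only rule; UTF-8 is an assumed choice of encoding:

```python
import io

buf = io.BytesIO()  # in-memory stand-in for open('export.csv', 'wb')

try:
    buf.write('tralala: {}'.format('asdł'))  # str into a binary stream
except TypeError as e:
    print('TypeError:', e)

# Encode at the boundary, then write the bytes.
buf.write('tralala: {}'.format('asdł').encode('utf-8'))
print(buf.getvalue())  # b'tralala: asd\xc5\x82'
```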
SLIDE 57

7.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format(b'asdł'))

SLIDE 58

7.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write('tralala: {}'.format(b'asdł'))

  File "7.py", line 4
    f.write('tralala: {}'.format(b'asdł'))
                                 ^
SyntaxError: bytes can only contain ASCII literal characters.

SLIDE 59

8.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(b'tralala: {}'.format('asdł'))

SLIDE 60

8.py

# -*- coding: utf-8 -*-

with open('export.csv', 'wb') as f:
    f.write(b'tralala: {}'.format('asdł'))

Traceback (most recent call last):
  File "8.py", line 4, in <module>
    f.write(b'tralala: {}'.format('asdł'))
AttributeError: 'bytes' object has no attribute 'format'

SLIDE 61

encode / decode

SLIDE 62

encode3.py

# -*- coding: utf-8 -*-

b'asdasdł'.encode('utf8')

SLIDE 63

encode3.py

# -*- coding: utf-8 -*-

b'asdasdł'.encode('utf8')

  File "encode3.py", line 3
    b'asdasdł'.encode('utf8')
    ^
SyntaxError: bytes can only contain ASCII literal characters.

SLIDE 64

encode3-bis.py

# -*- coding: utf-8 -*-

b'asdasd'.encode('utf8')

SLIDE 65

encode3-bis.py

# -*- coding: utf-8 -*-

b'asdasd'.encode('utf8')

Traceback (most recent call last):
  File "encode3-bis.py", line 3, in <module>
    b'asdasd'.encode('utf8')
AttributeError: 'bytes' object has no attribute 'encode'

SLIDE 66

decode3.py

# -*- coding: utf-8 -*-

'asdasdł'.decode('utf8')

SLIDE 67

decode3.py

# -*- coding: utf-8 -*-

'asdasdł'.decode('utf8')

Traceback (most recent call last):
  File "decode3.py", line 3, in <module>
    'asdasdł'.decode('utf8')
AttributeError: 'str' object has no attribute 'decode'

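These AttributeErrors follow from Python 3 keeping only one direction on each type: str has encode() and bytes has decode(). A quick check:

```python
# Each type only has the method that leads to the other type.
print(hasattr(str, 'encode'), hasattr(str, 'decode'))      # True False
print(hasattr(bytes, 'decode'), hasattr(bytes, 'encode'))  # True False

# The remaining calls form a clean round trip.
word = 'asdasdł'
assert word.encode('utf8').decode('utf8') == word
```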
SLIDE 68

Python 3 advantages

- Does not convert strings to bytes implicitly
- Does not convert bytes to strings implicitly
- Simply does not contain the “dangerous” methods (bytes.encode(), str.decode())
- Catches some potential errors at parse time (e.g. non-ASCII bytes literals)

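Two of these advantages can be seen directly: mixing str and bytes fails loudly at run time instead of converting, and a non-ASCII bytes literal is rejected when the code is compiled, before anything runs. compile() is used here only so the SyntaxError can be caught inside one script:

```python
# No implicit conversion: mixing the two types is a TypeError.
try:
    'tralala: ' + b'asd'
except TypeError as e:
    print('TypeError:', e)

# Rejected at compile time, before any code runs.
try:
    compile("b'asdł'", '<quiz>', 'eval')
except SyntaxError as e:
    print('SyntaxError:', e.msg)
```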
SLIDE 69

Miscellaneous

https://github.com/overfl0/Bulletproof-Arma-Launcher/blob/next/src/utils/unicode_helpers.py
locale.getpreferredencoding()
sys.getfilesystemencoding()
print(...) -> sys.stdout.encoding

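The defaults listed above can be printed directly. The values depend on the operating system, the locale and the Python version (UTF-8 mode changes them too), so no particular output is assumed here; passing encoding= explicitly avoids depending on them at all.

```python
import locale
import sys

# Roughly what open() falls back on when no encoding= is passed.
print('open():', locale.getpreferredencoding(False))
# Used to encode and decode file names.
print('file names:', sys.getfilesystemencoding())
# Used by print(); may be missing if sys.stdout was replaced by another object.
print('print():', getattr(sys.stdout, 'encoding', None))
```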
SLIDE 70

Takeaway

SLIDE 71

Input:  outside world (files, network) -> bytes -> decode -> str -> your Python 3 application
Output: your Python 3 application -> str -> encode -> bytes -> outside world (files, network)

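This layout is often called the "unicode sandwich": decode at the input edge, work only with str inside, encode at the output edge. export_line is a hypothetical handler written in that style, assuming UTF-8 on both edges:

```python
def export_line(raw: bytes, encoding: str = 'utf-8') -> bytes:
    """Hypothetical handler: bytes in, bytes out, str in between."""
    text = raw.decode(encoding)          # decode at the input boundary
    line = 'tralala: {}'.format(text)    # inside the application: str only
    return line.encode(encoding)         # encode at the output boundary


print(export_line(b'asd\xc5\x82'))   # b'tralala: asd\xc5\x82'
```

Keeping the conversions at the two edges means the formatting code in the middle never has to ask which type it is holding.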
SLIDE 72

The guessing game again:

f.write(u'tralala: {}'.format('foo'.encode(u'utf8')))

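Under Python 3 (3.3 or later, so the u'' prefixes parse) the line raises nothing: 'foo'.encode(u'utf8') is bytes, and str.format() inserts the repr of those bytes. By the Python 2 rules from earlier, the same line should instead quietly produce tralala: foo, because 'foo' is pure ASCII. In this sketch an io.StringIO stands in for a file opened in text mode; starting the interpreter with -b should turn this kind of str(bytes) call into a BytesWarning (-bb into an error).

```python
import io

f = io.StringIO()  # stand-in for open('export.csv', 'w')
f.write(u'tralala: {}'.format('foo'.encode(u'utf8')))
print(f.getvalue())  # tralala: b'foo'
```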
SLIDE 73

See also:

https://medium.com/@andreacolangelo/strings-unicode-and-bytes-in-python-3-everything-you-always-wanted-to-know-27dc02ff2686
https://nedbatchelder.com/text/unipain.html

SLIDE 74

unicode('python2') is b'itch'

(the irony is that the above statement evaluates to False)

Thank you! Any questions?