
unicode.decode() leaking from your eyes like liquid pain



  1. unicode.decode() “lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ike liquid pain” Łukasz Taczuk

  2. Łukasz Taczuk

  3. The guessing game: f.write(u'tralala: {}'.format('foo'.encode(u'utf8')))

  4. Python 2 / Python 3

  5. Python 2

  6. Python 2
     str = 'tralala'
     unicode = u'tralala'
     'tralala' is b'tralala'

  7. What IS unicode in Python? “I always thought that text in utf-8 was exactly that: Unicode data!” - Janusz programowania

  8. Python 2 str = . unicode = .

  9. unicode ⇔ str conversion
     Abstraction → Physical: unicode.encode(<encoding>) → str
     Physical → Abstraction: str.decode(<encoding>) → unicode

  10. unicode ⇔ str conversion encode decode

  11. unicode ⇔ str conversion decode encode

  12. unicode ⇔ str conversion unicode.decode(<encoding>) str.encode(<encoding>)
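The two directions named on the conversion slides can be sketched as a round trip. This is only an illustration written for Python 3, where the deck's `unicode` type is called `str` and the deck's Python 2 `str` is called `bytes`; the sample word is borrowed from a later slide.

```python
# Python 3 sketch: text (abstraction) <-> bytes (physical representation).
text = 'żółw'                      # abstract text: a sequence of code points

data = text.encode('utf-8')        # abstraction -> physical bytes
restored = data.decode('utf-8')    # physical bytes -> abstraction

print(type(text).__name__, len(text))   # number of code points
print(type(data).__name__, len(data))   # number of bytes in UTF-8
print(restored == text)
```

The lengths differ because each of the three non-ASCII letters takes two bytes in UTF-8.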

  13. Automatic type conversion (1)
      f.write() - converts to str
      yourlibrary.method() - converts to whatever it feels like :)

  14. Automatic type conversion (2)
      FOO.format(BAR) - automatically converts to type(FOO)
      FOO % BAR - does the same
      Template('$bar').substitute(bar=BAR) - does the same as well

  15. Automatic type conversion (3)
      FOO.encode(<encoding>) - converts to unicode FIRST
      FOO.decode(<encoding>) - converts to str FIRST
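The "FIRST" step on the slide above can be emulated in Python 3, which never performs it on its own. The two helper functions are hypothetical names invented for this sketch, and the assumption is that Python 2 uses the default `ascii` codec for the hidden conversion, as the deck's tracebacks suggest.

```python
# Python 3 emulation of Python 2's hidden ASCII step (a sketch, not Python 2).

def py2_str_encode(raw: bytes, encoding: str) -> bytes:
    """Mimic Python 2 str.encode: decode as ASCII first, then encode."""
    return raw.decode('ascii').encode(encoding)


def py2_unicode_decode(text: str, encoding: str) -> str:
    """Mimic Python 2 unicode.decode: encode as ASCII first, then decode."""
    return text.encode('ascii').decode(encoding)


# Pure ASCII input survives the hidden step, so the bug stays invisible.
ascii_result = py2_str_encode(b'foo', 'utf8')
print(ascii_result)

# Non-ASCII input fails in the hidden step, with the "wrong-looking" error.
errors = []
for call, arg in ((py2_str_encode, 'asdł'.encode('utf8')),
                  (py2_unicode_decode, 'asdł')):
    try:
        call(arg, 'utf8')
    except UnicodeError as exc:
        errors.append((call.__name__, type(exc).__name__, exc.start))

for entry in errors:
    print(entry)
```

This is why an `.encode()` call can end in a decode error and a `.decode()` call in an encode error.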

  16. Quiz time!

  17. 1.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format('asdł'))

  18. 1.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format('asdł'))

      OK!

  19. 2.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(u'tralala: {}'.format(u'asdł'))

  20. 2.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(u'tralala: {}'.format(u'asdł'))

      Traceback (most recent call last):
        File "2.py", line 4, in <module>
          f.write(u'tralala: {}'.format(u'asdł'))
      UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 12: ordinal not in range(128)

  21. 3.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(u'tralala: {}'.format('asdł'))

  22. 3.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(u'tralala: {}'.format('asdł'))

      Traceback (most recent call last):
        File "3.py", line 4, in <module>
          f.write(u'tralala: {}'.format('asdł'))
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 3: ordinal not in range(128)

  23. 4.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format(u'asdł'))

  24. 4.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format(u'asdł'))

      Traceback (most recent call last):
        File "4.py", line 4, in <module>
          f.write('tralala: {}'.format(u'asdł'))
      UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 3: ordinal not in range(128)
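The error types and positions in the tracebacks for 2.py, 3.py and 4.py can be reproduced by spelling out the conversion each script presumably triggers. The sketch below is Python 3 and only emulates the Python 2 behaviour; `ascii_error` is a hypothetical helper, and the mapping of each script to a hidden conversion is an interpretation of the "Automatic type conversion" slides.

```python
# Python 3 emulation of the hidden ASCII conversions in 2.py, 3.py and 4.py.

def ascii_error(action):
    """Run `action`; return (error name, failing position) or None."""
    try:
        action()
    except UnicodeError as exc:
        return type(exc).__name__, exc.start
    return None


raw = 'asdł'.encode('utf-8')   # the bytes a Python 2 literal 'asdł' holds

results = {
    # 2.py: format() stays unicode; f.write() then encodes the whole line.
    '2.py': ascii_error(lambda: 'tralala: {}'.format('asdł').encode('ascii')),
    # 3.py: the unicode template decodes the byte-string argument.
    '3.py': ascii_error(lambda: raw.decode('ascii')),
    # 4.py: the byte-string template encodes the unicode argument.
    '4.py': ascii_error(lambda: 'asdł'.encode('ascii')),
}

for name in sorted(results):
    print(name, results[name])
```

Position 12 in 2.py counts from the start of the formatted line, while position 3 in the other two counts from the start of the argument alone.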

  25. 5.py = 1.py "wb"
      6.py = 2.py "wb"
      7.py = 3.py "wb"
      8.py = 4.py "wb"

  26. 5.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format('asdł'))

  27. 5.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format('asdł'))

      OK!

  28. 6.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(u'tralala: {}'.format(u'asdł'))

  29. 6.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(u'tralala: {}'.format(u'asdł'))

      Traceback (most recent call last):
        File "6.py", line 4, in <module>
          f.write(u'tralala: {}'.format(u'asdł'))
      UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 12: ordinal not in range(128)

  30. 7.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(u'tralala: {}'.format('asdł'))

  31. 7.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(u'tralala: {}'.format('asdł'))

      Traceback (most recent call last):
        File "7.py", line 4, in <module>
          f.write(u'tralala: {}'.format('asdł'))
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 3: ordinal not in range(128)

  32. 8.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format(u'asdł'))

  33. 8.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format(u'asdł'))

      Traceback (most recent call last):
        File "8.py", line 4, in <module>
          f.write('tralala: {}'.format(u'asdł'))
      UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 3: ordinal not in range(128)

  34. encode / decode

  35. encode.py
      # -*- coding: utf-8 -*-
      'asdasdł'.encode('utf8')

  36. encode.py
      # -*- coding: utf-8 -*-
      'asdasdł'.encode('utf8')

      Traceback (most recent call last):
        File "encode.py", line 3, in <module>
          'asdasdł'.encode('utf8')
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 6: ordinal not in range(128)

  37. decode.py
      # -*- coding: utf-8 -*-
      u'asdasdł'.decode('utf8')

  38. decode.py
      # -*- coding: utf-8 -*-
      u'asdasdł'.decode('utf8')

      Traceback (most recent call last):
        File "decode.py", line 3, in <module>
          u'asdasdł'.decode('utf8')
        File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
          return codecs.utf_8_decode(input, errors, True)
      UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 6: ordinal not in range(128)

  39. An alternative way of writing files to disk
      # -*- coding: utf-8 -*-
      import codecs
      with codecs.open('export.csv', 'w', encoding="utf-8") as f:
          f.write(u'żółw')
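A related option, not taken from the slides, is `io.open`, which accepts the same `encoding` argument and is the built-in `open()` on Python 3. The sketch below writes to a temporary directory (a choice made here so the example has no side effects) and reads the file back in binary mode to show which bytes were stored.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical location: a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'export.csv')

# The file object encodes text on write, so no manual .encode() is needed.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'żółw')

# Binary mode returns the raw UTF-8 bytes that reached the disk.
with io.open(path, 'rb') as f:
    written = f.read()

print(written == u'żółw'.encode('utf-8'))
```

Keeping text as text until the file boundary means the encoding is named exactly once, at the point where bytes are actually produced.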

  40. Python 3

  41. Python 3
      bytes = b'tralala'
      str = 'tralala'
      'tralala' is u'tralala'

  42. Python 3 bytes = . str = .
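The Python 3 naming on these slides can be checked directly in an interpreter. A minimal sketch, assuming Python 3.3 or newer so that the `u` prefix is accepted:

```python
# Python 3: an unprefixed literal is text; bytes need the b prefix.
text = 'tralala'
raw = b'tralala'

print(type(text).__name__, type(raw).__name__)
print(text == u'tralala')    # the u prefix is allowed and changes nothing
print(text == raw)           # text and bytes never compare equal
print(text[0], raw[0])       # indexing text gives a str, indexing bytes an int
```

Unlike Python 2, nothing here converts between the two types implicitly.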

  43. Let’s do it all one more time! :)

  44. 1.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(b'tralala: {}'.format(b'asdł'))

  45. 1.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(b'tralala: {}'.format(b'asdł'))

        File "1.py", line 4
          f.write(b'tralala: {}'.format(b'asdł'))
          ^
      SyntaxError: bytes can only contain ASCII literal characters.
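Because a Python 3 bytes literal may only contain ASCII characters, non-ASCII bytes have to be spelled out some other way. The sketch below shows two common options and then triggers the same SyntaxError through `compile()`, so the failing literal can be demonstrated without breaking the example itself; the `'<slide 45>'` filename is just a label chosen here.

```python
# Two ways to obtain the bytes that b'asdł' cannot express in Python 3.
escaped = b'asd\xc5\x82'            # write the UTF-8 bytes as escapes
encoded = 'asdł'.encode('utf-8')    # or encode a text literal explicitly
print(escaped == encoded)

# The error is raised while compiling, before any line of the script runs.
message = None
try:
    compile("b'asd\u0142'", '<slide 45>', 'eval')
except SyntaxError as exc:
    message = exc.msg
print(message)
```

Since this is a compile-time error, the script on the slide fails before `open()` is ever called.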

  46. 2.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format('asdł'))

  47. 2.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format('asdł'))

      OK!

  48. 3.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format(b'asdł'))

  49. 3.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write('tralala: {}'.format(b'asdł'))

        File "3.py", line 4
          f.write('tralala: {}'.format(b'asdł'))
          ^
      SyntaxError: bytes can only contain ASCII literal characters.

  50. 4.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(b'tralala: {}'.format('asdł'))

  51. 4.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'w') as f:
          f.write(b'tralala: {}'.format('asdł'))

      Traceback (most recent call last):
        File "4.py", line 4, in <module>
          f.write(b'tralala: {}'.format('asdł'))
      AttributeError: 'bytes' object has no attribute 'format'
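The AttributeError above reflects that Python 3 `bytes` has no `.format()` method. Two possible workarounds are sketched below; neither comes from the slides. The first relies on printf-style `%` formatting for bytes, which is available from Python 3.5, and the second formats as text and encodes once at the end.

```python
# bytes has no .format(), but % formatting works on bytes since Python 3.5.
payload = 'asdł'.encode('utf-8')          # the argument must already be bytes
line = b'tralala: %s' % payload
print(line)

# Alternative: do all formatting on text, then encode at the boundary.
same_line = 'tralala: {}'.format('asdł').encode('utf-8')
print(line == same_line)
```

The second form is usually easier to read, since only one place in the code has to know the encoding.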

  52. 5.py = 1.py "wb"
      6.py = 2.py "wb"
      7.py = 3.py "wb"
      8.py = 4.py "wb"

  53. 5.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(b'tralala: {}'.format(b'asdł'))

  54. 5.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write(b'tralala: {}'.format(b'asdł'))

        File "5.py", line 4
          f.write(b'tralala: {}'.format(b'asdł'))
          ^
      SyntaxError: bytes can only contain ASCII literal characters.

  55. 6.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format('asdł'))

  56. 6.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format('asdł'))

      Traceback (most recent call last):
        File "6.py", line 4, in <module>
          f.write('tralala: {}'.format('asdł'))
      TypeError: a bytes-like object is required, not 'str'
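The TypeError in 6.py shows that a Python 3 binary stream refuses text instead of guessing an encoding. The sketch below uses `io.BytesIO` as an in-memory stand-in for a file opened with `'wb'` (an assumption made to keep the example free of side effects) and then shows the explicit encode that makes the write succeed.

```python
import io

buf = io.BytesIO()   # in-memory stand-in for open('export.csv', 'wb')

# Writing text to a binary stream fails instead of converting silently.
error = None
try:
    buf.write('tralala: {}'.format('asdł'))
except TypeError as exc:
    error = str(exc)
print('TypeError:', error)

# Encoding explicitly produces bytes that the stream accepts.
buf.write('tralala: {}'.format('asdł').encode('utf-8'))
print(buf.getvalue())
```

The failure happens at the point of the mistake, in contrast to Python 2, where the same mismatch only surfaced once non-ASCII data showed up.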

  57. 7.py
      # -*- coding: utf-8 -*-
      with open('export.csv', 'wb') as f:
          f.write('tralala: {}'.format(b'asdł'))
