"why perl ❤ utf-8" also (bonus!) "why OmniGraffle is not a replacement for Powerpoint"

SLIDE 1

"why perl ❤ utf-8" also (bonus!) "why OmniGraffle is not a replacement for Powerpoint"

SLIDE 2

everyone

perl programmers who understand character sets
  • me. and mark, sometimes.
  • and nick.

SLIDE 3

10010110

byte

SLIDE 4

character

SLIDE 5

10010101 01101100 1101011 00110100

a sequence of bytes

SLIDE 6

a sequence of characters

SLIDE 7

10010101 01101100 1101011 001101

SLIDE 8

11100010 10011000 10000011 + "that's utf-8"
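
The point of this slide can be checked directly. The following is a minimal sketch, assuming the three bytes shown are meant to be read as one UTF-8 encoded character; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# The three bytes from the slide, written in binary.
my $bytes = pack 'C*', 0b11100010, 0b10011000, 0b10000011;

# Interpreting them as utf-8 turns three bytes into one character.
my $chars = Encode::decode('UTF-8', $bytes);

# prints "3 bytes, 1 character, U+2603"
printf "%d bytes, %d character, U+%04X\n",
    length $bytes, length $chars, ord $chars;
```

The bytes alone are just numbers; it is the added knowledge "that's utf-8" that makes them a character (here U+2603).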

SLIDE 9

Perl String (utf-8 flag on)
utf-8 byte sequence
latin-1 byte sequence
Perl String (utf-8 flag off)

  • Encode::encode("latin-1", $a)
  • utf8::upgrade($a) (in place)
  • utf8::downgrade($a) (in place)
  • Encode::encode("utf8", $a) OR Encode::_utf8_off($a) (in place)
  • Encode::encode("utf8", $a)
  • Encode::decode("utf8", $a) OR Encode::_utf8_on($a) (in place)
  • Encode::decode("utf8", $a)
  • Encode::encode("latin-1", $a)
  • Encode::decode("latin-1", $a)
  • Encode::decode("latin-1", $a)
  • Encode::from_to($a, "utf8", "latin-1") (in place)
  • Encode::from_to($a, "latin-1", "utf-8") (in place)

SLIDE 10

latin-1 byte sequence
bytes = code points = characters
everything Just Works

SLIDE 11

Perl String (utf-8 flag off)
bytes = code points = characters
everything Just Works

SLIDE 12

Perl String (utf-8 flag off)
latin-1 byte sequence

SLIDE 13

utf-8 byte sequence
This is a sequence of bytes

SLIDE 14

Perl String (utf-8 flag on)
This is a sequence of characters

SLIDE 15

Perl String (utf-8 flag on)
utf-8 byte sequence

  • Encode::_utf8_on($scalar)
  • Encode::_utf8_off($scalar)

SLIDE 16

Perl String (utf-8 flag on)
latin-1 byte sequence

  • Encode::_utf8_on($scalar)
  • Encode::_utf8_off($scalar)

SLIDE 17

Perl String (utf-8 flag on)
latin-1 byte sequence

  • Encode::_utf8_on($scalar)
  • Encode::_utf8_off($scalar)

segfault

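
Why forcing the flag onto latin-1 bytes is dangerous can be sketched as below. This does not reproduce a crash; it only shows, under the assumption that the input is the latin-1 bytes for "cafés", that the resulting scalar is internally inconsistent, while a strict decode refuses the same input. All variable names are illustrative.

```perl
use strict;
use warnings;
use Encode ();

my $latin1 = "caf\xE9s";        # latin-1 bytes; 0xE9 followed by "s" is not valid utf-8

# Flipping the flag converts nothing; it only relabels the existing bytes.
my $forced = $latin1;
Encode::_utf8_on($forced);
my $consistent = utf8::valid($forced) ? 1 : 0;

# A checked decode validates the bytes and dies instead of producing garbage.
my $copy    = $latin1;
my $decoded = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
my $refused = defined $decoded ? 0 : 1;

print "flag forced on, string consistent: $consistent\n";
print "strict decode refused the input: $refused\n";
```

Code that later walks the malformed scalar as if it were well-formed utf-8 is where things can go badly wrong.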
SLIDE 18

Perl String (utf-8 flag on)
Perl String (utf-8 flag off)
latin-1 byte sequence
utf-8 byte sequence

SLIDE 19

Perl String (utf-8 flag on)
utf-8 byte sequence
latin-1 byte sequence
Perl String (utf-8 flag off)

  • Encode::encode("latin-1", $a)
  • utf8::upgrade($a) (in place)
  • utf8::downgrade($a) (in place)
  • Encode::encode("utf8", $a) OR Encode::_utf8_off($a) (in place)
  • Encode::encode("utf8", $a)
  • Encode::decode("utf8", $a) OR Encode::_utf8_on($a) (in place)
  • Encode::decode("utf8", $a)
  • Encode::encode("latin-1", $a)
  • Encode::decode("latin-1", $a)
  • Encode::decode("latin-1", $a)
  • Encode::from_to($a, "utf8", "latin-1") (in place)
  • Encode::from_to($a, "latin-1", "utf-8") (in place)

SLIDE 20

Perl String (utf-8 flag on)
utf-8 byte sequence
latin-1 byte sequence
Perl String (utf-8 flag off)

  • Encode::encode("utf8", $a)
  • Encode::decode("utf8", $a)
  • Encode::encode("latin-1", $a)
  • Encode::decode("latin-1", $a)
  • Encode::from_to($a, "utf8", "latin-1") (in place)
  • Encode::from_to($a, "latin-1", "utf-8") (in place)

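
For the in-place byte-to-byte conversion, note that Encode::from_to takes the scalar first, then the source encoding, then the target encoding. A minimal sketch, assuming latin-1 input bytes (the canonical name "iso-8859-1" is used here; the variable names are made up for illustration):

```perl
use strict;
use warnings;
use Encode ();

my $octets = "caf\xE9";                            # 4 latin-1 bytes

# Convert bytes to bytes, in place: latin-1 in, utf-8 out.
Encode::from_to($octets, 'iso-8859-1', 'UTF-8');
my $as_utf8 = $octets;                             # keep a copy of the utf-8 bytes

# And back again, also in place.
Encode::from_to($octets, 'UTF-8', 'iso-8859-1');

# prints "utf-8: 5 bytes, latin-1: 4 bytes"
printf "utf-8: %d bytes, latin-1: %d bytes\n",
    length $as_utf8, length $octets;
```

At no point does this produce a character string; both ends are byte sequences.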
SLIDE 21

utf-8 byte sequence
latin-1 byte sequence
Perl String

  • Encode::encode("utf8", $a)
  • Encode::decode("utf8", $a)
  • Encode::encode("latin-1", $a)
  • Encode::decode("latin-1", $a)

SLIDE 22

$bytes = Encode::encode( 'encoding', $chars );
$chars = Encode::decode( 'encoding', $bytes );
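
A concrete round trip through these two calls might look like the sketch below. The sample string and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode ();

my $chars = "caf\x{e9}";                                   # 4 characters

# Characters out to bytes, in two different encodings.
my $utf8_bytes   = Encode::encode('UTF-8', $chars);        # e-acute becomes 2 bytes
my $latin1_bytes = Encode::encode('iso-8859-1', $chars);   # e-acute becomes 1 byte

# Bytes back in to characters.
my $back = Encode::decode('UTF-8', $utf8_bytes);

# prints "4 chars, 5 utf-8 bytes, 4 latin-1 bytes, round trip ok"
printf "%d chars, %d utf-8 bytes, %d latin-1 bytes, round trip %s\n",
    length $chars, length $utf8_bytes, length $latin1_bytes,
    ($back eq $chars ? 'ok' : 'failed');
```

The rule of thumb this suggests: decode on the way in, encode on the way out, and work with characters in between.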

SLIDE 23

use Devel::Peek

SLIDE 24

XS

not very nice
SLIDE 25

SV = PV(0x8131020) at 0x811d234
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK)
  PV = 0x812a9c8 "\351"\0
  CUR = 1
  LEN = 2

the character the bytes

é

SLIDE 26

SV = PV(0x811d470) at 0x8127c38
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x8122ee8 "\303\251"\0 [UTF8 "\x{e9}"]
  CUR = 2
  LEN = 3

the character the bytes

é é
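
Dumps like the two above can be produced with something along these lines. This is a sketch under the assumption that the second dump came from an upgraded copy of the same one-character string; addresses and some flags will differ from the slides.

```perl
use strict;
use warnings;
use Devel::Peek ();

my $plain    = "\xE9";     # one character, stored as one byte, flag off
my $upgraded = $plain;
utf8::upgrade($upgraded);  # same character, stored as two bytes, flag on

my $same_string = $plain eq $upgraded      ? 1 : 0;
my $flag_before = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded) ? 1 : 0;

# prints "equal: 1, flag: 0 -> 1, length: 1 and 1"
print "equal: $same_string, flag: $flag_before -> $flag_after, ",
      "length: ", length($plain), " and ", length($upgraded), "\n";

# Dump writes to STDERR; compare the FLAGS and PV lines of the two dumps.
Devel::Peek::Dump($plain);
Devel::Peek::Dump($upgraded);
```

As far as Perl code is concerned the two scalars hold the same string; only the internal storage differs.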

SLIDE 27

é

DBD::mysql

SLIDE 28

2 approaches

right: Encode::encode, Encode::decode
fast: Encode::_utf8_on
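
Reading the slide as "right" meaning Encode::encode / Encode::decode and "fast" meaning Encode::_utf8_on, the two approaches can be compared as in the sketch below. The value in $from_db is a hypothetical stand-in for what a database driver might hand back (utf-8 bytes with the flag off); no real driver is involved.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical driver output: utf-8 bytes, flag off.
my $from_db = "caf\xC3\xA9";

# "right": decode validates the bytes and returns a character string.
my $right = Encode::decode('UTF-8', $from_db);

# "fast": relabel a copy in place, trusting that the bytes really are valid utf-8.
my $fast = $from_db;
Encode::_utf8_on($fast);

# prints "right: 4 chars, fast: 4 chars, same: yes"
printf "right: %d chars, fast: %d chars, same: %s\n",
    length $right, length $fast, ($right eq $fast ? 'yes' : 'no');
```

The results agree only because the input happens to be valid utf-8; the fast route performs no check at all, which is the trade-off being made.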

SLIDE 29

the real correct approach

DBD::Pg

SLIDE 30

XML

SLIDE 31

XML::LibXML nice perl strings XML

SLIDE 32

XML::LibXML nice perl strings XML garbage

SLIDE 33

use java

there are very expensive courses you can go to

SLIDE 34