Ruby-us Hagrid: Writing Harry Potter with Ruby (alexpeattie.com/hp)



slide-1
SLIDE 1

Ruby-us Hagrid

Writing Harry Potter with Ruby @alexpeattie alexpeattie.com/hp

slide-2
SLIDE 2

Why should we do it?
What can we achieve?
How can we do it?

Writing Harry Potter with Ruby

slide-3
SLIDE 3

Why should we do it?

slide-4
SLIDE 4

Category A

The “Potheads”

Category B

The “Notters”

“Ouch, my heart” “Is that Yoda?”
slide-5
SLIDE 5

What can we achieve?

slide-6
SLIDE 6

(Spoiler!)

slide-7
SLIDE 7

Neville, Seamus and Dean were muttering but did not speak when Harry had told Fudge mere weeks ago that Malfoy was crying, actually crying tears, streaming down the sides of their heads. “They revealed a spell to make your bludger” said Harry, anger rising once more.

slide-8
SLIDE 8

How can we do it?

slide-9
SLIDE 9

“They revealed a spell to make your bludger” said Harry, anger rising once more.

Key idea 1: Tell the story word by word
Key idea 2: Let’s take inspiration from our phones

slide-10
SLIDE 10

https://alexpeattie.com/assets/images/talks/hp/predictive.mp4

slide-11
SLIDE 11

After “birthday”, I’ve used the word:

  • “party” 30 times
  • “cake” 20 times
  • “wishes” 10 times
slide-12
SLIDE 12

After “golden”, J.K. used the word:

  • “egg” 13 times
  • “snitch” 11 times
  • “plates” 10 times

The word “golden” appears in the Harry Potter books 226 times.

slide-13
SLIDE 13

After “golden”, J.K. used the word:

  • “egg” 13 times
  • “snitch” 11 times
  • “plates” 10 times

The word “golden” appears in the Harry Potter books 226 times.

Head Continuations

slide-14
SLIDE 14

Step 1: Learn
Step 2: Generate

Key idea 3

slide-15
SLIDE 15

golden → egg 13, snitch 11, plates 10, light 9, ⋮, liquid 1
goldfish → out 1, any 1, bowls 1, above 1
golf → balls 2
⋮

21,814 words

slide-16
SLIDE 16

{
  :golden => { :egg => 13, :snitch => 11, :plates => 10, :light => 9, :liquid => 1 },
  :goldfish => { :out => 1, :any => 1, :of => 1, :bowls => 1 },
  :golf => { :balls => 2 }
}
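A nested hash like this can be queried directly for the most likely continuations of a word. The sketch below reuses the counts from the slide; `top_continuations` is a hypothetical helper name, not something from the talk.

```ruby
# Counts copied from the slide above.
stats = {
  :golden => { :egg => 13, :snitch => 11, :plates => 10, :light => 9, :liquid => 1 },
  :goldfish => { :out => 1, :any => 1, :of => 1, :bowls => 1 },
  :golf => { :balls => 2 }
}

# Hypothetical helper: the `limit` most frequent continuations of `head`,
# as [word, count] pairs, most frequent first. Unknown heads give [].
def top_continuations(stats, head, limit = 3)
  stats.fetch(head, {}).max_by(limit) { |_word, count| count }
end

top = top_continuations(stats, :golden)
top.each { |word, count| puts "#{word}: #{count}" }
```

With these counts the three entries printed for `:golden` are egg, snitch and plates, matching the list on the earlier slide.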

slide-17
SLIDE 17

alexpeattie.com/hp

slide-18
SLIDE 18

def tokenize(text)
  text.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

"Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal"

[:mr, :and, :mrs, :dursley, :of, :number, :four, :privet, :drive, :were, :proud, :to, :say, :that, :they, :were, :perfectly, :normal]
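A quick way to see what this tokenizer keeps and discards is to run it on a couple of short sentences. The sample sentences below are made up for illustration; the function is the one from the slide.

```ruby
def tokenize(text)
  text.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

# Punctuation and quotation marks are dropped entirely.
quoted = tokenize("“Oh no,” said Harry.")
p quoted        # => [:oh, :no, :said, :harry]

# Anything that is not a-z splits a word, so apostrophes cut contractions
# and possessives into fragments.
contraction = tokenize("Harry didn't see Ron's owl.")
p contraction   # => [:harry, :didn, :t, :see, :ron, :s, :owl]
```

The fragments `:t` and `:s` are a side effect of splitting on every non-letter; that is a reasonable trade-off for a short demo, but it does mean the model treats them as words in their own right.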

slide-19
SLIDE 19

text = tokenize "The cat sat on the mat. The cat was happy."
stats = {}

text.each_cons(2) do |head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

slide-20
SLIDE 20

text = tokenize "The cat sat on the mat. The cat was happy."
stats = {}

text.each_cons(2) do |head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

[:the, :cat]  # head, continuation

{ :the => { :cat => 1 } }

slide-21
SLIDE 21

text = tokenize "The cat sat on the mat. The cat was happy."
stats = {}

text.each_cons(2) do |head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

[:cat, :sat]  # head, continuation

{ :the => { :cat => 1 }, :cat => { :sat => 1 } }

slide-22
SLIDE 22

text = tokenize "The cat sat on the mat. The cat was happy."
stats = {}

text.each_cons(2) do |head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

{
  :the => { :cat => 2, :mat => 1 },
  :cat => { :sat => 1, :was => 1 },
  :sat => { :on => 1 },
  :on => { :the => 1 },
  :mat => { :the => 1 },
  :was => { :happy => 1 }
}
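The learning step can be wrapped in a function and checked against the table above. This is a minimal sketch: `learn` is a hypothetical name, and the body is the same `each_cons(2)` loop from the slides.

```ruby
def tokenize(text)
  text.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

# Hypothetical wrapper around the loop from the slides: counts, for every
# word (head), how often each other word (continuation) directly follows it.
def learn(tokens)
  stats = {}
  tokens.each_cons(2) do |head, continuation|
    stats[head] ||= Hash.new(0)
    stats[head][continuation] += 1
  end
  stats
end

stats = learn(tokenize("The cat sat on the mat. The cat was happy."))

stats.each do |head, continuations|
  puts "#{head}: " + continuations.map { |word, count| "#{word} x#{count}" }.join(", ")
end
```

One detail worth noticing: the final word of the text (`:happy` here) never appears as a head, because nothing follows it, so a generator that lands on it has no continuation to pick.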

slide-23
SLIDE 23

Step 1: Learn ✅
Step 2: Generate

slide-24
SLIDE 24

Greedy algorithm

slide-25
SLIDE 25

Pick most frequent continuation

slide-26
SLIDE 26

Pick most frequent continuation

slide-27
SLIDE 27

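def pick_next_word_greedily(head)
  # Note: `stats` has to be reachable from inside the method (a plain local
  # variable is not); the complete program later passes it in as an argument.
  continuations = stats[head]
  chosen_word, count = continuations.max_by { |word, count| count }
  return chosen_word
end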

slide-28
SLIDE 28

story = [stats.keys.sample] # start with a random word from corpus

1.upto(50) do # 50 word story
  story << pick_next_word_greedily(story.last)
end

puts story.join(" ")

slide-29
SLIDE 29

Drumroll….

slide-30
SLIDE 30

“Oh no” said Harry. A few seconds later they were all the door and the door and the door and the door and the door.

slide-31
SLIDE 31

Take two….

slide-32
SLIDE 32

Surreptitiously, several of the door and the door and the door and the door and the door and the door and the door.

slide-33
SLIDE 33
slide-34
SLIDE 34

several → of → the → door → and → (back to “the”)

slide-35
SLIDE 35

conference enchantingly nasty little more than ever since he was a few seconds later they were all the door and…
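The loop is a direct consequence of the greedy rule: the choice depends only on the current word, so as soon as a word repeats, everything after it repeats too. The sketch below shows this on a tiny hand-counted table (the counts correspond to the made-up text "the door and the door and the window"); passing `stats` as a second argument is a small change from the slide version.

```ruby
# Hand-counted continuations for the made-up text
# "the door and the door and the window".
stats = {
  :the => { :door => 2, :window => 1 },
  :door => { :and => 2 },
  :and => { :the => 2 }
}

# Same rule as on the slide: always take the most frequent continuation.
def pick_next_word_greedily(head, stats)
  chosen_word, _count = stats[head].max_by { |_word, count| count }
  chosen_word
end

story = [:the]
6.times { story << pick_next_word_greedily(story.last, stats) }
puts story.join(" ")   # => the door and the door and the
```

The less frequent continuation `:window` can never be reached, no matter how long the story runs.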

slide-36
SLIDE 36

Greedy algorithm

slide-37
SLIDE 37

Let’s get random

Uniform random algorithm

slide-38
SLIDE 38

Pick randomly w/ equal probability

slide-39
SLIDE 39

Pick randomly w/ equal probability

⅓ ⅓ ⅓

slide-40
SLIDE 40

Pick randomly w/ equal probability

egg 1/117
snitch 1/117
plates 1/117
light 1/117
⋮ (112 more)
liquid 1/117

slide-41
SLIDE 41

def pick_random_next_word(head)
  continuations = stats[head]
  return continuations.keys.sample
end

slide-42
SLIDE 42

Debris from boys or accompany him bodily from Ron, yell the waters. Harry laughing together soon father would then bleated the smelly cloud.

slide-43
SLIDE 43

What’s the problem?

slide-44
SLIDE 44

After “house”:

  • “elf” 102 times (~1/200 chance)
  • “prices” 1 time (~1/200 chance)

slide-45
SLIDE 45

Let’s get (a bit less) random

Weighted random algorithm

slide-46
SLIDE 46

After “house” (734 times):

  • “elf” 102 times (~1/200 chance)
  • “prices” 1 time (~1/200 chance)

slide-47
SLIDE 47

After “house” (734 times):

  • “elf” 102 times (~1/7 chance)
  • “prices” 1 time (~1/700 chance)
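The weighted chances are just each count divided by the total. The short calculation below assumes that the figure 734 is the total number of times “house” is followed by some word; the variable names are illustrative.

```ruby
house_total = 734   # assumption: total occurrences of "house" with a continuation
elf_count = 102
prices_count = 1

elf_p = elf_count.fdiv(house_total)
prices_p = prices_count.fdiv(house_total)

puts format("elf:    %.3f (about 1 in %.0f)", elf_p, 1 / elf_p)
puts format("prices: %.4f (about 1 in %.0f)", prices_p, 1 / prices_p)
```

Under that assumption “elf” comes out at roughly 1 in 7 and “prices” at 1 in 734, which the slide rounds to ~1/700.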

slide-48
SLIDE 48

Pick randomly w/ weighted probabilities

½ ⅓ ⅙

slide-49
SLIDE 49

def pick_next_word_weighted_randomly(head)
  continuations = stats[head]
  continuations.flat_map { |word, count| [word] * count }.sample
end
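To see the difference between the two random strategies, both can be run many times on one small table and the observed frequencies compared. This is a sketch under stated assumptions: the counts 3, 2 and 1 are made up to reproduce the ½, ⅓, ⅙ split, and the method names, the seeded `Random` and the extra arguments are additions for repeatability, not part of the slides.

```ruby
# Made-up counts: 3 + 2 + 1 = 6, so the weighted chances are 1/2, 1/3, 1/6.
stats = { :the => { :door => 3, :cat => 2, :owl => 1 } }
rng = Random.new(42)   # seeded so repeated runs behave the same

# Weighted: each word appears in the pool once per occurrence.
def pick_weighted(head, stats, rng)
  stats[head].flat_map { |word, count| [word] * count }.sample(random: rng)
end

# Uniform: every distinct continuation is equally likely.
def pick_uniform(head, stats, rng)
  stats[head].keys.sample(random: rng)
end

trials = 60_000
weighted = Hash.new(0)
uniform = Hash.new(0)
trials.times do
  weighted[pick_weighted(:the, stats, rng)] += 1
  uniform[pick_uniform(:the, stats, rng)] += 1
end

[:door, :cat, :owl].each do |word|
  puts format("%-5s weighted %.3f  uniform %.3f",
              word, weighted[word].fdiv(trials), uniform[word].fdiv(trials))
end
```

The weighted column should settle near 0.5, 0.333 and 0.167, while the uniform column stays near 0.333 for every word regardless of its count. Note that the `flat_map` pool holds one entry per occurrence, which is simple but grows with the counts.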

slide-50
SLIDE 50

Springing forward as though they had a bite of the hippogriff, he staggered blindly retorting Harry some pumpkin tart.

slide-51
SLIDE 51

One last big idea…

slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54

Key idea 4: Improve output by looking at more than just 1 previous word

slide-55
SLIDE 55

{
  :golden => { :egg => 12, :snitch => 11, :plates => 10, :light => 9, :liquid => 1 },
  :goldfish => { :out => 1, :any => 1, :of => 1, :bowls => 1 },
  :golf => { :balls => 2 }
}

Two words

bi·gram

two word

slide-56
SLIDE 56

{
  [:golden, :egg] => { :harry => 1, :very => 1, :and => 2, :which => 1, :upstairs => 1, :does => 1, :he => 2, :said => 1, :still => 1, :fell => 1 },
  [:golden, :snitch] => { :and => 1, :had => 1, :said => 1, :it => 1, :a => 1, :with => 1, :was => 1, :where => 1, :worked => 1 }
}

321,727 entries

tri·gram

three word

Three words

slide-57
SLIDE 57

stats = {}
n = 3

corpus.each_cons(n) do |*head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

Added splat

slide-58
SLIDE 58

stats = {}
n = 3

corpus.each_cons(n) do |*head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

[[:the, :cat], :sat]  # head, continuation

{ [:the, :cat] => { :sat => 1 } }
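The splat version can be tried on the cat sentence from earlier to see what the keys look like for different values of `n`. As before, `learn` is a hypothetical wrapper name; the loop inside is the one from the slide.

```ruby
def tokenize(text)
  text.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

# Hypothetical wrapper: the first n - 1 words of each window become the head
# (always an array, because of the splat), the last word is the continuation.
def learn(tokens, n)
  stats = {}
  tokens.each_cons(n) do |*head, continuation|
    stats[head] ||= Hash.new(0)
    stats[head][continuation] += 1
  end
  stats
end

tokens = tokenize("The cat sat on the mat. The cat was happy.")

trigrams = learn(tokens, 3)
p trigrams[[:the, :cat]]   # counts: sat 1, was 1

bigrams = learn(tokens, 2)
p bigrams[[:the]]          # counts: cat 2, mat 1
```

With the splat in place the head is an array even when `n` is 2, so lookups use `[:the]` rather than `:the`. That is why the generator has to pass `story.last(n - 1)`, which returns an array, instead of `story.last`.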

slide-59
SLIDE 59

Normally when Dudley found his voice barely louder than before. “Dementors” said Dumbledore steadily, he however found all this mess is utterly worthless. Harry looked at him, put Slughorn into
his bag more securely on to bigger and bigger until their blackness swallowed Harry whole and started emptying his drawers. — trigram model

slide-60
SLIDE 60

Neville, Seamus and Dean were muttering but did not speak when Harry had told Fudge mere weeks ago that Malfoy was crying, actually crying tears, streaming down the sides of their heads. “They revealed a spell to make your bludger” said Harry, anger rising once more. — 4-gram model

slide-61
SLIDE 61

def tokenize(sentence)
  sentence.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

def pick_next_word_weighted_randomly(head, stats)
  continuations = stats[head]
  continuations.flat_map { |word, count| [word] * count }.sample
end

text = tokenize(IO.read('hp.txt'))

stats = {}
n = 3

text.each_cons(n) do |*head, continuation|
  stats[head] ||= Hash.new(0)
  stats[head][continuation] += 1
end

story = stats.keys.sample.dup # dup: appending to the key itself would corrupt the hash
1.upto(50) do
  story << pick_next_word_weighted_randomly(story.last(n - 1), stats)
end
puts story.join(" ")

20 lines
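For experimenting without the book text, the same program can be run on a short inline string. The version below is a sketch with a few additions that are not in the talk: the stand-in text, the `learn` and `generate` wrappers, a seeded `Random`, and a guard that stops the story when a head has no recorded continuation (which can happen at the very end of the text).

```ruby
def tokenize(text)
  text.downcase.split(/[^a-z]+/).reject(&:empty?).map(&:to_sym)
end

def learn(tokens, n)
  stats = {}
  tokens.each_cons(n) do |*head, continuation|
    stats[head] ||= Hash.new(0)
    stats[head][continuation] += 1
  end
  stats
end

# Returns nil when `head` was never seen with a continuation (a dead end).
def pick_next_word_weighted_randomly(head, stats, rng)
  continuations = stats[head]
  return nil if continuations.nil?
  continuations.flat_map { |word, count| [word] * count }.sample(random: rng)
end

def generate(stats, n, length, rng)
  story = stats.keys.sample(random: rng).dup   # copy, so no hash key is mutated
  length.times do
    word = pick_next_word_weighted_randomly(story.last(n - 1), stats, rng)
    break if word.nil?                         # stop early at a dead end
    story << word
  end
  story
end

# Stand-in for IO.read('hp.txt'), so the example runs without the book.
text = "The cat sat on the mat. The cat was happy. The cat sat on the sofa."
n = 3
tokens = tokenize(text)
stats = learn(tokens, n)

story = generate(stats, n, 20, Random.new(7))
puts story.join(" ")
```

Every run of three consecutive words in the output occurs somewhere in the source text, which is the property that makes higher values of `n` read more naturally. With a text this small the output mostly reproduces the original sentences.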

slide-62
SLIDE 62

Key idea 1: Tell the story word by word
Key idea 2: Let’s take inspiration from our phones
Key idea 3: Learn (stats about words and continuations), and generate (with weighted random algorithm)
Key idea 4: Improve output by looking at more than just 1 previous word

alexpeattie.com/hp