Text Denoising with Go(lang) | Bedcon 2015 | Dennis Kluge | Charité Berlin

SLIDE 1

Text Denoising
 with Go(lang)

Bedcon 2015 – Dennis Kluge – Charité Berlin

SLIDE 2

dataflex-science.de

SLIDE 3

Berlin Expert Days 2015 | Dennis Kluge | Slide 3

SLIDE 4

SLIDE 5

NLP

Natural Language Processing

SLIDE 6

?

SLIDE 7

Die @bedcon ist die großartigste #Konferenz des Jahres. 😎
 http://bedcon.org

(German sample tweet: "The @bedcon is the greatest #conference of the year.")

SLIDE 8

Die bedcon ist die großartigste Konferenz des Jahres
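The step from the raw tweet to this cleaned line can be sketched roughly as follows. This is a minimal sketch, not the speaker's code: the function name `stripNoise` and the rule "drop URL tokens, then keep only letters and digits" are assumptions chosen to reproduce the line above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNoise is a hypothetical helper: it drops URL tokens and removes every
// rune that is neither a letter nor a digit ('@', '#', punctuation, emoji).
func stripNoise(tweet string) string {
	var words []string
	// Fields splits on any run of whitespace, including the line break.
	for _, token := range strings.Fields(tweet) {
		if strings.HasPrefix(token, "http://") || strings.HasPrefix(token, "https://") {
			continue // assumption: a token with a URL scheme is noise
		}
		cleaned := strings.Map(func(r rune) rune {
			if unicode.IsLetter(r) || unicode.IsDigit(r) {
				return r
			}
			return -1 // a negative value tells strings.Map to drop the rune
		}, token)
		if cleaned != "" {
			words = append(words, cleaned)
		}
	}
	return strings.Join(words, " ")
}

func main() {
	raw := "Die @bedcon ist die großartigste #Konferenz des Jahres. 😎 \n http://bedcon.org"
	fmt.Println(stripNoise(raw))
}
```

Working on runes rather than bytes keeps multi-byte letters such as "ß" intact while the emoji, a symbol and not a letter, is dropped.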

SLIDE 9

die bedcon ist die großartigste konferenz des jahres

SLIDE 10

[“di”, “ie”, “e_”, “_b”, “be”, “ed”…]
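A character-bigram split like the one above could be produced along these lines. This is a sketch under assumptions: the name `bigrams` is hypothetical, and replacing spaces with "_" is inferred from the "e_" and "_b" entries in the list.

```go
package main

import (
	"fmt"
	"strings"
)

// bigrams lowercases the text, replaces spaces with '_' and returns all
// overlapping windows of two runes.
func bigrams(text string) []string {
	normalized := strings.ReplaceAll(strings.ToLower(text), " ", "_")
	runes := []rune(normalized) // index by rune, not by byte, so 'ß' stays whole
	var result []string
	for i := 0; i+1 < len(runes); i++ {
		result = append(result, string(runes[i:i+2]))
	}
	return result
}

func main() {
	fmt.Println(bigrams("Die bedcon ist die großartigste Konferenz des Jahres"))
}
```

Slicing the string by byte index instead would cut "ß" in half and yield invalid UTF-8, which is why the sketch converts to `[]rune` first.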

SLIDE 11
SLIDE 12

GO(LANG)

  • 2009: first released … 2012: version 1.0
  • compiled, strongly typed, imperative, structured
  • optimized for concurrency
  • garbage collection
  • C-inspired syntax
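To illustrate the concurrency point, here is a small sketch that lowercases several tweets in parallel goroutines. It is an illustration only: the function name `lowerAll` and the choice of `sync.WaitGroup` are assumptions, not something shown in the talk.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// lowerAll lowercases every tweet in its own goroutine and keeps input order.
func lowerAll(tweets []string) []string {
	out := make([]string, len(tweets))
	var wg sync.WaitGroup
	for i, tweet := range tweets {
		wg.Add(1)
		go func(i int, tweet string) {
			defer wg.Done()
			out[i] = strings.ToLower(tweet) // each goroutine writes only its own slot
		}(i, tweet)
	}
	wg.Wait() // block until every goroutine has finished
	return out
}

func main() {
	fmt.Println(lowerAll([]string{"Die Bedcon", "GO(LANG)"}))
}
```

Because every goroutine writes to a distinct index, no mutex is needed here; shared counters or maps would need one.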

SLIDE 13

STRINGS

  • Go source code is always UTF-8.
  • A string holds arbitrary bytes.
  • A string literal, absent byte-level escapes, always holds valid UTF-8 sequences.

  • Those sequences represent Unicode code points, called runes.
  • No guarantee is made in Go that characters in strings are normalized.
  • https://blog.golang.org/strings
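The points above can be made concrete with a short sketch using only the standard library. The helper name `byteAndRuneLen` is hypothetical; the sample word is taken from the tweet used earlier.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen returns the length of s in bytes and in runes (code points).
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	word := "groß" // 'ß' takes two bytes in UTF-8
	nBytes, nRunes := byteAndRuneLen(word)
	fmt.Println(nBytes, nRunes)

	// Ranging over a string decodes runes; i is the byte offset of each rune.
	for i, r := range word {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()

	// Two spellings of "é": one precomposed code point versus 'e' followed by
	// a combining acute accent. Go compares bytes and does not normalize.
	composed := "\u00e9"
	decomposed := "e\u0301"
	fmt.Println(composed == decomposed)
}
```

For text denoising this means that visually identical tweets can produce different bigrams unless the input is normalized first, a step the standard library does not perform on its own.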

SLIDE 14

SLIDE 15

Boolean, Numeric,
 String, Array, Slice,
 Interface, Map, Channel

SLIDE 16

SLIDE 17

Type Inference
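As a small illustration of type inference, the sketch below declares values with `:=` and prints the types the compiler picked. The function name `inferredTypes` and the sample values are assumptions made for this example.

```go
package main

import "fmt"

// inferredTypes declares values with := and reports the inferred types.
func inferredTypes() []string {
	count := 3                    // untyped integer constant defaults to int
	ratio := 0.5                  // untyped float constant defaults to float64
	text := "bedcon"              // string
	letter := 'ß'                 // rune, an alias for int32
	grams := []string{"di", "ie"} // []string, taken from the composite literal
	return []string{
		fmt.Sprintf("%T", count),
		fmt.Sprintf("%T", ratio),
		fmt.Sprintf("%T", text),
		fmt.Sprintf("%T", letter),
		fmt.Sprintf("%T", grams),
	}
}

func main() {
	fmt.Println(inferredTypes())
}
```

The variables stay statically typed; inference only saves writing the type out at the declaration.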

SLIDE 18

SLIDE 19

SLIDE 20

Package

SLIDE 21

SLIDE 22

An uppercase initial letter makes an identifier public (exported)
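A sketch of the naming rule, assuming two hypothetical functions `Clean` and `lower` plus a helper `isExported` that mirrors the rule by checking the first rune of a name. Within a single `main` package the difference has no visible effect; it matters once the code lives in a package that others import.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Clean starts with an uppercase letter, so it would be visible to other
// packages if this code lived in an importable package.
func Clean(text string) string {
	return lower(strings.TrimSpace(text))
}

// lower starts with a lowercase letter and stays private to its package.
func lower(text string) string {
	return strings.ToLower(text)
}

// isExported mirrors the rule: an identifier is exported when its first
// rune is an uppercase letter.
func isExported(name string) bool {
	first, _ := utf8.DecodeRuneInString(name)
	return unicode.IsUpper(first)
}

func main() {
	fmt.Println(Clean("  Die Bedcon "))
	fmt.Println(isExported("Clean"), isExported("lower"))
}
```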

SLIDE 23

SLIDE 24

C-Style

SLIDE 25

SLIDE 26

SLIDE 27

MATCHING URLS

  • mathiasbynens.be/demo/url-regex
  • stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url
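A minimal sketch of URL matching with Go's `regexp` package follows. The pattern is a deliberately loose assumption (scheme followed by non-whitespace) and is not one of the expressions from the pages listed above; the names `urlPattern`, `findURLs` and `removeURLs` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern matches "http://" or "https://" (case-insensitive) followed by
// everything up to the next whitespace. Go's regexp uses RE2 syntax, which
// has no lookahead or backreferences, so stricter patterns written for
// other engines may need rewriting.
var urlPattern = regexp.MustCompile(`(?i)\bhttps?://\S+`)

// findURLs returns every match in the text.
func findURLs(text string) []string {
	return urlPattern.FindAllString(text, -1)
}

// removeURLs deletes every match and leaves the rest of the text untouched.
func removeURLs(text string) string {
	return urlPattern.ReplaceAllString(text, "")
}

func main() {
	tweet := "Die @bedcon ist die großartigste #Konferenz des Jahres. 😎 http://bedcon.org"
	fmt.Println(findURLs(tweet))
	fmt.Println(removeURLs(tweet))
}
```

A pattern this loose also swallows trailing punctuation such as a period directly after the URL, which is acceptable when the match is only going to be thrown away.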

SLIDE 28

SLIDE 29

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

github.com/horstmumpitz/bedcon2015

SLIDE 34
SLIDE 35

BAG OF BIGRAMS

Columns: Bigram, Tweet 1, Tweet 2, Tweet 3
(only two counts per row survive in the source)

  di: 1, 2
  ie: 5, 4
  e_: 3, 9
  …
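A bag-of-bigrams table of this shape, one row per bigram and one column per tweet, could be built as sketched below. The function names and the sorted vocabulary order are assumptions for illustration; the input strings are expected to be already denoised, with spaces replaced by "_".

```go
package main

import (
	"fmt"
	"sort"
)

// countBigrams counts the overlapping two-rune windows of one text.
func countBigrams(text string) map[string]int {
	runes := []rune(text)
	counts := make(map[string]int)
	for i := 0; i+1 < len(runes); i++ {
		counts[string(runes[i:i+2])]++
	}
	return counts
}

// bagOfBigrams returns the sorted vocabulary and a table with one row per
// bigram and one column per tweet.
func bagOfBigrams(tweets []string) ([]string, [][]int) {
	perTweet := make([]map[string]int, len(tweets))
	seen := make(map[string]bool)
	for i, tweet := range tweets {
		perTweet[i] = countBigrams(tweet)
		for gram := range perTweet[i] {
			seen[gram] = true
		}
	}
	vocab := make([]string, 0, len(seen))
	for gram := range seen {
		vocab = append(vocab, gram)
	}
	sort.Strings(vocab) // map iteration order is random, so sort for stable rows
	table := make([][]int, len(vocab))
	for row, gram := range vocab {
		table[row] = make([]int, len(tweets))
		for col := range tweets {
			table[row][col] = perTweet[col][gram] // a missing key reads as 0
		}
	}
	return vocab, table
}

func main() {
	vocab, table := bagOfBigrams([]string{"die_bedcon", "die_konferenz"})
	for row, gram := range vocab {
		fmt.Println(gram, table[row])
	}
}
```

Each column is then a fixed-length count vector for one tweet, which is the form most downstream classifiers or similarity measures expect.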

SLIDE 36

SLIDE 37

dennis.kluge@charite.de
@HorstMumpitz

Thank you