Learning text representations from character-level data
Grzegorz Chrupała
Department of Communication and Information Sciences Tilburg University
CLIN 2013
Chrupała (UvT) Text representations CLIN 2013 1 / 19
Text representations
Word representations
◮ Brown or HMM word classes
◮ Collobert and Weston distributed representations
◮ LDA-type soft classes
Tasks
◮ Chunking and named entity recognition
◮ Parsing
◮ Semantic relation labeling
◮ English
◮ Code block (Java, Python...)
◮ Inline code
◮ ...
Figure: recurrent network with input/output units and hidden units, unrolled over time steps t-1, t and t+1.
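The diagram shows a recurrent network unrolled over time. As a minimal sketch, assuming an Elman-style simple recurrent network with sigmoid hidden units and a softmax output over the character alphabet, one time step reads a character, combines it with the hidden state from step t-1, and produces a distribution over the next character. The class name, alphabet, sizes and random initialization are hypothetical, and training is omitted entirely.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRecurrentNetwork:
    """Assumed Elman-style SRN: h_t = sigmoid(U x_t + W h_{t-1}), y_t = softmax(V h_t)."""

    def __init__(self, alphabet: str, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.index = {c: i for i, c in enumerate(alphabet)}
        self.hidden_size = hidden_size
        n = len(alphabet)

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.U = matrix(hidden_size, n)            # input units -> hidden units
        self.W = matrix(hidden_size, hidden_size)  # hidden at t-1 -> hidden at t
        self.V = matrix(n, hidden_size)            # hidden units -> output units

    def step(self, char: str, h_prev: List[float]) -> Tuple[List[float], List[float]]:
        x = self.index[char]  # one-hot input: only column x of U contributes
        h = [
            sigmoid(self.U[j][x] + sum(w * hk for w, hk in zip(self.W[j], h_prev)))
            for j in range(self.hidden_size)
        ]
        # Output layer: probability distribution over the next character
        y = softmax([sum(v * hj for v, hj in zip(row, h)) for row in self.V])
        return h, y

    def run(self, text: str) -> List[List[float]]:
        """Return the hidden-state vector recorded after reading each character."""
        h = [0.0] * self.hidden_size
        states = []
        for char in text:
            h, _ = self.step(char, h)
            states.append(h)
        return states


net = SimpleRecurrentNetwork(alphabet="ab ", hidden_size=4)
states = net.run("ab a")  # one 4-dimensional hidden vector per character
```

In a real setup the weights would be learned on unlabeled text (for example with backpropagation through time); here they are random, so the states only illustrate how information flows from t-1 to t.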
(· marks a space, ¶ a newline)

writing·a·.NET·applicati
·any·links·with·informati
d·to·test·a·IP·verificati
enerate·each·IP·combinati
·files.·I·have·presentati

$n1.'.'.$n2.'.'.$n3++.'.'
t;';¶········echo·$n1.'.'
·····echo·$n1.'.'.$n2.'.'
·····echo·$n1.'.'.$n2.'.'

p":·{"last_share":·130738
c":·{"last_share":·130744
p":·{"last_share":·130744
:·{"last_share":·13073896
:·{"last_share":·13074418

able·has·integer·values·a
5.·For·all·these·values·I
lots·of·private·methods·a
me·across·any·resources·e
an·add·more·connections·s
I·only·make·event·glds. so,·on·the·cell·proceedclicks·like·completed,·with·color? ····st·potention, ‘column’]HeaderException=ID·=·new·Put="True"·MetadataTemplate, ·grwTrowerRow="SELECTEMBRow"·on? All·clearBeanLockCollection="#7293df3335b-E9"·/> ············<Image:DataKey="BackgroundCollectionC2UTID"·
◮ Run on labeled train and test data
◮ Record hidden unit activations at each character position
◮ Use as extra features for a CRF
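The three steps can be sketched as turning each character of a labeled text into one feature dictionary. This assumes `encode` is any function returning one hidden-activation vector per character (for instance, the forward pass of a trained recurrent network). `character_features`, `toy_encoder` and the feature names (`char`, `prev_char`, `next_char`, `h0`, `h1`, ...) are hypothetical, and the resulting dicts would still have to be handed to whatever CRF toolkit is used.

```python
from typing import Callable, Dict, List, Sequence

Activations = Sequence[float]


def character_features(
    text: str, encode: Callable[[str], Sequence[Activations]]
) -> List[Dict[str, object]]:
    """One feature dict per character: simple character features plus hidden activations."""
    states = encode(text)
    if len(states) != len(text):
        raise ValueError("encoder must return one activation vector per character")
    rows = []
    for i, (char, h) in enumerate(zip(text, states)):
        feats: Dict[str, object] = {
            "char": char,
            "prev_char": text[i - 1] if i > 0 else "<s>",
            "next_char": text[i + 1] if i + 1 < len(text) else "</s>",
        }
        # Extra features: the recorded activation of every hidden unit at this position
        for j, a in enumerate(h):
            feats[f"h{j}"] = a
        rows.append(feats)
    return rows


def toy_encoder(text: str) -> List[List[float]]:
    # Hypothetical stand-in for a trained network: two "units" per character
    return [[1.0 if c.isalpha() else 0.0, 1.0 if c == " " else 0.0] for c in text]


rows = character_features("a b", toy_encoder)
```

Keeping the encoder as a parameter separates the unsupervised part (producing activations) from the supervised part (the CRF), so the same labeled data can be featurized with or without the extra columns for a baseline comparison.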
◮ For each of the 10 most active units:
    ⋆ Is the activation > 0.5?
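One plausible reading of this feature template, sketched under that assumption: at a given character, rank the hidden units by activation, keep the 10 highest, and emit one binary feature per kept unit recording whether its activation exceeds 0.5. The function name and the `unitJ` feature keys are hypothetical.

```python
from typing import Dict, Sequence


def top_unit_features(
    activations: Sequence[float], k: int = 10, threshold: float = 0.5
) -> Dict[str, bool]:
    """Binary features for the k most active units: is each activation > threshold?"""
    # Indices of the k largest activations (stable sort keeps index order on ties)
    ranked = sorted(range(len(activations)), key=lambda j: activations[j], reverse=True)[:k]
    # Key on the unit index so the feature records *which* unit was among the top k
    return {f"unit{j}": activations[j] > threshold for j in ranked}
```

For example, with activations `[0.1, 0.9, 0.6, 0.3]` and `k=3`, units 1 and 2 yield `True` and unit 3 yields `False`. Reducing real-valued activations to a few sparse indicators is a common way to feed them to a CRF that works with binary features.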
Labeled data
◮ Train: 1.2 – 10 million characters
◮ Test: 2 million characters
Unlabeled data
◮ 465 million characters
Figure: F1 (y-axis, ticks 63 to 69) against size of the labeled training set in millions of characters (x-axis, ticks 4 to 10). A second copy of the plot adds a curve labeled "Baseline".