Introduction to Machine Learning NPFL 054

SLIDE 1

Introduction to Machine Learning

NPFL 054

http://ufal.mff.cuni.cz/course/npfl054

Barbora Hladká, Martin Holub

{Hladka | Holub} @ufal.mff.cuni.cz

Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

NPFL054, 2015 Hladká & Holub & Lukšová Demo 1, page 1/12

SLIDE 2

Demo 1 Verb Patterns Classification

Purpose of the demo task: to illustrate several aspects of working with gold standard data in a supervised machine learning task, in particular

  • Manual annotation and basic data analysis
  • Gold Standard data distribution
  • Inter-annotator agreement
  • Confusion matrices
  • Error analysis

SLIDE 3

Demo 1 Verb Patterns Classification

Annotation experiment 2015

SLIDE 4

Gold Standard – data distributions

[Figure: two Gold Standard histograms. "Cry GS histogram" over the labels u, x, 1, 4, 7; "Enlarge GS histogram" over the labels u, 1, 2, 3, 4.]

SLIDE 5

Manual annotation

  • Annotated data
      • 6 x 10 sentences with CRY
      • 6 x 10 sentences with ENLARGE
      • 4 groups A, B, C, D
      • the same data set annotated by each group
  • We analyse
      • which group is closer to the Gold Standard
      • the inter-annotator agreement between groups
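
The first question, which group is closer to the Gold Standard, can be answered by counting the sentences on which a group's label matches the gold label. The following is a minimal sketch of that idea, not the course's own script; the function name `agreement_with_gold` and the toy labels are assumptions made for illustration and are not the annotation data.

```python
def agreement_with_gold(labels, gold):
    """Return (number of agreements, fraction of agreements) against the gold labels."""
    if len(labels) != len(gold):
        raise ValueError("label sequences must have the same length")
    hits = sum(1 for a, g in zip(labels, gold) if a == g)
    return hits, hits / len(gold)


# Hypothetical toy data (NOT the course data): pattern labels for 5 sentences.
gold = ["1", "4", "u", "1", "7"]
group_a = ["1", "4", "1", "1", "x"]
group_b = ["1", "1", "u", "1", "7"]

hits_a, rate_a = agreement_with_gold(group_a, gold)
hits_b, rate_b = agreement_with_gold(group_b, gold)

# The group with the highest agreement rate is the one "closer" to the gold standard.
rates = {"A": rate_a, "B": rate_b}
closest = max(rates, key=rates.get)
print(hits_a, rate_a, hits_b, rate_b, closest)
```

Raw agreement of this kind does not correct for agreement expected by chance, which is why chance-corrected measures such as kappa are used for the second question.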

SLIDE 6

A, B, C, D distributions - CRY

[Figure: four histograms, "Cry A", "Cry B", "Cry C", "Cry D", each over the labels u, x, 1, 4, 7.]

SLIDE 7

A, B, C, D distributions - ENLARGE

[Figure: four histograms, "Enlarge A", "Enlarge B", "Enlarge C", "Enlarge D", each over the labels u, 1, 2, 3, 4.]

SLIDE 8

Confusion matrices - CRY

group   agreement (%)   disagreement (%)
A       41 (68.3)       19 (31.7)
B       40 (66.7)       20 (33.3)
C       40 (66.7)       20 (33.3)
D       45 (75.0)       15 (25.0)
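
An agreement count like the ones in this table is the sum of the diagonal of a confusion matrix between a group's labels and the gold labels. The sketch below shows one way to build such a matrix with the standard library. The label set is the one shown on the CRY histogram axes; the function name `confusion_matrix` and the toy annotations are hypothetical and are not the course data.

```python
from collections import Counter


def confusion_matrix(annotated, gold, labels):
    """Rows = gold label, columns = annotator label, cell = number of sentences."""
    counts = Counter(zip(gold, annotated))
    return [[counts[(g, a)] for a in labels] for g in labels]


labels = ["u", "x", "1", "4", "7"]  # CRY label set as shown on the histogram axes

# Hypothetical toy annotations (NOT the course data).
gold = ["1", "1", "4", "7", "u", "1", "4", "x"]
annotated = ["1", "4", "4", "7", "1", "1", "4", "x"]

matrix = confusion_matrix(annotated, gold, labels)

# Agreements lie on the diagonal; everything off the diagonal is a disagreement.
agreements = sum(matrix[i][i] for i in range(len(labels)))
disagreements = len(gold) - agreements

for label, row in zip(labels, matrix):
    print(label, row)
print("agreements:", agreements, "disagreements:", disagreements)
```

The off-diagonal cells are the starting point for error analysis, since they show which pairs of patterns are confused with each other.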

SLIDE 9

Confusion matrices - ENLARGE

group   agreement (%)   disagreement (%)
A       38 (63.3)       22 (36.7)
B       28 (46.7)       32 (53.3)
C       28 (46.7)       32 (53.3)
D       36 (60.0)       24 (40.0)
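
The percentages in the CRY and ENLARGE tables follow from the agreement counts and the 60 sentences (6 x 10) annotated per verb. The short sketch below recomputes them; the counts are copied from the two tables, while the names `SENTENCES_PER_VERB` and `percentages` are introduced here only for illustration.

```python
SENTENCES_PER_VERB = 60  # 6 x 10 sentences per verb, per the "Manual annotation" slide

# Agreement counts copied from the CRY and ENLARGE tables.
agreements = {
    "cry": {"A": 41, "B": 40, "C": 40, "D": 45},
    "enlarge": {"A": 38, "B": 28, "C": 28, "D": 36},
}


def percentages(agree, total=SENTENCES_PER_VERB):
    """Return (agreement %, disagreement %), each rounded to one decimal place."""
    disagree = total - agree
    return round(100 * agree / total, 1), round(100 * disagree / total, 1)


table = {
    verb: {group: percentages(count) for group, count in groups.items()}
    for verb, groups in agreements.items()
}

for verb, groups in table.items():
    for group, (agree_pct, disagree_pct) in groups.items():
        print(verb, group, agree_pct, disagree_pct)
```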

SLIDE 10

Inter-annotator agreement (IAA)

groups    verb      Cohen’s Kappa   Fleiss’s Kappa
A-B       cry       0.355
A-C       cry       0.276
A-D       cry       0.406
B-C       cry       0.366
B-D       cry       0.407
C-D       cry       0.327
A-B-C-D   cry       –               0.353
A-B       enlarge   0.306
A-C       enlarge   0.413
A-D       enlarge   0.296
B-C       enlarge   0.222
B-D       enlarge   0.324
C-D       enlarge   0.365
A-B-C-D   enlarge   –               0.319
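
Both coefficients have the form (observed agreement - chance agreement) / (1 - chance agreement). Cohen's kappa compares two annotators, and Fleiss's kappa generalises the idea to more than two. The sketch below implements the textbook definitions in plain Python. It is not the script used to produce the table above, and the function names and toy labels are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(items):
    """Fleiss's kappa; `items` holds one list of labels per item, one label per rater."""
    n_items = len(items)
    n_raters = len(items[0])
    totals = Counter()
    p_bar = 0.0
    for item in items:
        counts = Counter(item)
        totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data (NOT the course data).
pair_kappa = cohen_kappa(["1", "1", "4", "4"], ["1", "4", "4", "4"])
group_kappa = fleiss_kappa([
    ["1", "1", "1", "1"],
    ["4", "4", "4", "4"],
    ["1", "1", "4", "4"],
    ["1", "4", "4", "4"],
])
print(pair_kappa, group_kappa)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the values in the table indicate agreement only moderately above chance level.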

SLIDE 11

A + B + C + D vs. GS – CRY

  • Number of agreements: 171 (71.3 %)
  • Number of disagreements: 69 (28.7 %)

SLIDE 12

A + B + C + D vs. GS – ENLARGE

  • Number of agreements: 138 (57.5 %)
  • Number of disagreements: 102 (42.5 %)
