Sentence Level Text Analysis, Vojtěch Kovář, Natural Language Processing Centre



SLIDE 1

Sentence Level Text Analysis

Vojtěch Kovář

Natural Language Processing Centre Faculty of Informatics, Masaryk University Botanická 68a, 602 00 Brno xkovar3@fi.muni.cz

Workshop of the Natural Language Processing Centre 28 May 2013

Vojtěch Kovář NLP Centre FI MU Brno Sentence Level Text Analysis

SLIDE 2

Simon spoke about sex with Britney Spears

SLIDE 3

Zkolaboval katastr nemovitostí, lidé musejí přespávat v parcích
("The land registry has collapsed, people have to sleep in parks")

predicate: Zkolaboval (collapsed)
who/what: katastr nemovitostí (the land registry)
who/what: lidé (people)
predicate: musejí přespávat (have to sleep)
where: v parcích (in parks)

source: www.infobaden.cz

SLIDE 4

Sentence level (syntactic) analysis

Natural language syntax describes relationships among words
Automatic syntactic analysis
  revealing inter-word relationships on various levels
  detection of noun (prepositional, verb, ...) phrases, clauses
  finding relationships (dependencies) among the units

| Simon | spoke | about sex | with Britney Spears |
| Simon | spoke | about sex with Britney Spears |

SLIDE 5

Syntactic trees

Simon spoke about sex with Britney Spears
Word tags: Simon <N>, spoke <V>, about <PREP>, sex <N>, with <PREP>, Britney <N>, Spears <N>
Phrase nodes: <SENTENCE>, <VP>, <PP>, <NP>, <PP>, <NP>

SLIDE 6

Syntactic trees

Simon spoke about sex with Britney Spears
Word tags: Simon <N>, spoke <V>, about <PREP>, sex <N>, with <PREP>, Britney <N>, Spears <N>
Phrase nodes: <SENTENCE>, <VP>, <PP>, <PP>, <NP>
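The two trees differ in where "with Britney Spears" attaches. A minimal sketch of both readings as nested tuples follows; the exact tree shapes are an assumption reconstructed from the node labels on the slides, and the helper names `leaves` and `phrases` are hypothetical.

```python
# Parse trees as nested tuples: (label, child, ...); leaves are plain strings.
# Both trees are illustrative assumptions, not the output of a real parser.
NP_ATTACHMENT = (
    "SENTENCE",
    ("N", "Simon"),
    ("VP",
        ("V", "spoke"),
        ("PP", ("PREP", "about"),
            ("NP", ("N", "sex"),
                ("PP", ("PREP", "with"),
                    ("NP", ("N", "Britney"), ("N", "Spears")))))),
)
VP_ATTACHMENT = (
    "SENTENCE",
    ("N", "Simon"),
    ("VP",
        ("V", "spoke"),
        ("PP", ("PREP", "about"), ("N", "sex")),
        ("PP", ("PREP", "with"),
            ("NP", ("N", "Britney"), ("N", "Spears")))),
)


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def phrases(tree, label):
    """Return the text of every subtree carrying the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(leaves(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found


print(phrases(NP_ATTACHMENT, "PP"))  # ['about sex with Britney Spears', 'with Britney Spears']
print(phrases(VP_ATTACHMENT, "PP"))  # ['about sex', 'with Britney Spears']
```

Both trees yield the same word sequence; only the prepositional phrases they contain differ, which is exactly the ambiguity the sentence illustrates.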

SLIDE 7

Syntactic trees

Simon/subject spoke about/pp sex/prep-object with/pp Britney/prep-object Spears/attr

SLIDE 8

Syntactic trees

Simon/subject spoke about/pp sex/prep-object with/pp Britney/prep-object Spears/attr

SLIDE 9

Why are we doing this?

Syntactic units are carriers of meaning
  “in the city”
  meaning of “in”, “the” is unclear, complicated
  meaning of “in the city” is simply where
Words are sometimes not enough
  red brick house vs. brick house red vs. red house brick
  Honey, give me love vs. Love, give me honey
Starting point for intelligent natural language applications
  extraction of facts & question answering
  logical analysis
  punctuation detection & grammar checking
  natural text generation
  authorship detection
  machine translation
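The "words are sometimes not enough" point can be sketched in a few lines: a bag-of-words comparison cannot tell the example pairs apart, while a comparison that respects word order can. The function names below are hypothetical and the tokenisation is deliberately naive.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and keep only word characters (naive tokeniser)."""
    return re.findall(r"\w+", text.lower())


def same_bag_of_words(a, b):
    """True if both texts contain the same words, ignoring their order."""
    return Counter(tokens(a)) == Counter(tokens(b))


def same_word_order(a, b):
    """True only if the words also appear in the same order."""
    return tokens(a) == tokens(b)


pair = ("Honey, give me love", "Love, give me honey")
print(same_bag_of_words(*pair))  # True
print(same_word_order(*pair))    # False
```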

SLIDE 10

Example: Extraction of facts

Zkolaboval katastr nemovitostí, lidé musejí přespávat v parcích
("The land registry has collapsed, people have to sleep in parks")

predicate: Zkolaboval (collapsed)
who/what: katastr nemovitostí (the land registry)
who/what: lidé (people)
predicate: musejí přespávat (have to sleep)
where: v parcích (in parks)

source: www.infobaden.cz

text → syntactic analysis → clauses, phrases → phrase classification → facts
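The last step of this pipeline, turning classified phrases into facts, could look roughly like the sketch below. It assumes the earlier stages have already produced (role, phrase) pairs per clause; the data layout and the `to_fact` helper are hypothetical and are not taken from the NLP Centre tools.

```python
# Each clause is a list of (role, phrase) pairs, as a phrase classifier
# might emit them. The layout is an assumption made for illustration.
clauses = [
    [("predicate", "Zkolaboval"), ("who/what", "katastr nemovitostí")],
    [("who/what", "lidé"), ("predicate", "musejí přespávat"), ("where", "v parcích")],
]

ROLES = ("who/what", "predicate", "where")


def to_fact(clause, roles=ROLES):
    """Turn one classified clause into a fact record; absent roles become None."""
    found = dict(clause)
    return {role: found.get(role) for role in roles}


facts = [to_fact(clause) for clause in clauses]
for fact in facts:
    print(fact)
```

Each clause yields one record, so the headline produces two facts: one about the land registry and one about people sleeping in parks.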

SLIDE 11

Example: Logical analysis

Žádný mobilní agent není statický.
("No mobile agent is static.")

text → syntactic analysis → trees → tree conversion → formulae

Word tags: Žádný <ADJ>, mobilní <ADJ>, agent <N>, není <V>, statický <ADJ>
Phrase nodes: <SENTENCE>, <VP>, <NP>

λw1 λt2 [Not, [True_{w1 t2}, λw3 λt4 (∃i5)([statický_{w3 t4} i5] [[mobilní, agent]_{w3 t4}, i5])]] ... π

¬∃x(mobilni(x) ∧ agent(x) ∧ staticky(x))
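Once a sentence has been converted to a formula, the formula can be checked against a model. The sketch below evaluates the first-order formula above over a small finite domain; the domain and the sets standing for the predicates are invented for illustration.

```python
def no_mobile_agent_is_static(domain, mobile, agent, static):
    """Evaluate ¬∃x(mobile(x) ∧ agent(x) ∧ static(x)) over a finite domain."""
    return not any(x in mobile and x in agent and x in static for x in domain)


# Hypothetical toy model: three individuals, each predicate given as a set.
domain = {"a1", "a2", "a3"}
mobile = {"a1", "a2"}
agent = {"a1", "a2", "a3"}

print(no_mobile_agent_is_static(domain, mobile, agent, static={"a3"}))        # True
print(no_mobile_agent_is_static(domain, mobile, agent, static={"a2", "a3"}))  # False
```

In the second call the individual `a2` is mobile, an agent and static at once, so the sentence is false in that model.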

SLIDE 12

Example: Grammar checking

Let’s eat grandma!
  syntactic analysis
  detection of non-probable constructions
  → grandma is not a usual object of eating
  → correction suggestion
Let’s eat, grandma!
  life saved :)
Similarly with other grammar phenomena
  “This is worth try” → “This is worth trying”
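A toy version of the "non-probable construction" check is sketched below. It stands in for real syntactic analysis with two hand-made word lists, both of which are assumptions: usual objects of "eat" and words that typically address a person. A real checker would obtain the verb-object relation from a parser and the plausibility from corpus data.

```python
# Hypothetical lexical knowledge, hard-coded for the sake of the example.
USUAL_OBJECTS = {"eat": {"pizza", "soup", "bread"}}
ADDRESSEES = {"grandma", "grandpa", "kids"}


def suggest_comma(sentence):
    """Insert a comma when a verb is followed by an unlikely object that is a likely addressee."""
    body = sentence.rstrip("!.?")
    ending = sentence[len(body):]
    words = body.split()
    for i in range(len(words) - 1):
        verb, nxt = words[i].lower(), words[i + 1].lower()
        if verb in USUAL_OBJECTS and nxt not in USUAL_OBJECTS[verb] and nxt in ADDRESSEES:
            fixed = words[:i] + [words[i] + ","] + words[i + 1:]
            return " ".join(fixed) + ending
    return sentence


print(suggest_comma("Let's eat grandma!"))  # Let's eat, grandma!
print(suggest_comma("Let's eat pizza!"))    # Let's eat pizza!
```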

SLIDE 13

How to analyze natural language syntax?

Prerequisites
  word level analysis (part of speech, gender, number)
  named entity recognition
  lexical semantic information (e.g. “pregnant” goes with women only)
Named entity recognition
  determine that e.g. “prof. Václav Šplíchal” is a person
  can be viewed as a sub-task of syntactic analysis

SLIDE 14

How to analyze natural language syntax?

Statistical methods
  people annotate a corpus
  statistical methods learn rules from the corpus
  universal across languages (to some extent)
  annotation is expensive
  hard to customize for different applications
  data are usually not big enough
Rule-based methods
  specialists develop a set of rules (“grammar”)
  not universal, depends on specialists
  grammar can become hard to maintain
  easy to customize for different applications
Hybrids
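To make the rule-based option concrete, here is a minimal sketch of a hand-written "grammar" with two rules over part-of-speech tags: a run of nouns forms an NP, and a preposition followed by a run of nouns forms a PP. The rules and the `chunk` function are hypothetical and far simpler than any production grammar.

```python
def chunk(tagged):
    """Group (word, tag) pairs into phrases using two hand-written rules."""
    chunks, i = [], 0
    while i < len(tagged):
        start = i
        label = tagged[i][1]
        if label == "PREP":  # a preposition may open a PP
            i += 1
        j = i
        while j < len(tagged) and tagged[j][1] == "N":  # consume a run of nouns
            j += 1
        if j > i:  # at least one noun was consumed: NP or PP
            label = "PP" if tagged[start][1] == "PREP" else "NP"
            i = j
        elif i == start:  # no rule applies: keep the token as its own chunk
            i += 1
        chunks.append((label, " ".join(word for word, _ in tagged[start:i])))
    return chunks


sentence = [("Simon", "N"), ("spoke", "V"), ("about", "PREP"), ("sex", "N"),
            ("with", "PREP"), ("Britney", "N"), ("Spears", "N")]
print(chunk(sentence))
# [('NP', 'Simon'), ('V', 'spoke'), ('PP', 'about sex'), ('PP', 'with Britney Spears')]
```

Adapting such a grammar to another application means editing the rules directly, which is the customizability the slide refers to; the cost is that every rule has to be written and maintained by hand.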

SLIDE 15

Syntactic analysers in the NLP Centre

Synt
  C++, fast (0.07 s/sentence)
  based on a large meta-grammar
SET
  Python, slower but easily adaptable
  based on a set of patterns
Both
  rule-based backbone with statistical extensions
  grammars for Czech, English and Slovak
  accuracy 85–90 % on journal texts
Word Sketches
  very fast shallow syntax for large corpora
  31 languages
See you in demo :)
