SLIDE 12 23
Text Mining Part II
- Text mining = knowledge discovery (in text)?
- Task:
Discover or derive new information from large document collections
– find patterns across datasets/documents
– separate signal from noise
– statistical (and linguistic) approach
– Concept extraction
– Ontology construction
– TOC construction
– Clustering
– Text categorization
– Subtechniques: information extraction, text analysis
[Figure: repeated copies of the news story below, shown as a document collection, with the label "Knowledge"]
David Brown, University for Industry visits the OU
John Dominque, Wed, 15 Oct 1997. David Brown, the Chairman of the University for Industry Design and Implementation Advisory Group and Chairman of Motorola, visited the OU as part of a fact finding exercise, prior to drafting his initial 100 Days Report to HM Government. David was accompanied by Jeanette Pugh, Josh Hillman and Nick Pearce.
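One common statistical way to separate signal from noise is TF-IDF term weighting, sketched below. The slide does not name a specific weighting scheme, so TF-IDF is an assumption here, and the function name `tf_idf` and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy collection, not taken from the course material.
docs = [
    "the chairman visited the university".split(),
    "the university published the report".split(),
    "the chairman drafted the report".split(),
]
weights = tf_idf(docs)
# Highest-weighted term per document.
top_terms = [max(w, key=w.get) for w in weights]
```

A term that occurs in every document ("the") gets weight 0, while a term that occurs in only one document gets the highest weight in that document.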
TDT4215 - Introduction
24
Text Mining Example 2
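- Document collection from X
- What is the content?
– Helsestasjon (child health clinic), helseorganisasjon (health organisation), journalsystemet (the patient record system), kvalitetsrådgiverprogrammet (the quality adviser programme), miljørettet (environment-oriented), Journalopplysninger (record information), sped (infant), helsekortet (the health card), skolehelsetenesta (the school health service), journalforskriften (the patient records regulation), passord (password), kravspesifikasjon (requirements specification)
- Terms used together in text
– Journalforskriften (patient records regulation): Datatilsyn (Data Inspectorate), riksarkivar (National Archivist), oppbevaring (storage), pasientjournaler (patient records), Retting (correction), journalopplysninger (record information), sletting (deletion), Personregisterloven (Personal Data Registers Act), journal (record)
– Mental retardasjon (mental retardation): Syndrom (syndrome), cerebral (cerebral), alkoholforbruk (alcohol consumption), mor (mother), hørsel (hearing), ben (leg/bone), Misdannelse (malformation), leveår (year of life), forekomst (prevalence)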
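Lists of "terms used together in text" can be produced by counting document-level co-occurrence, as in this minimal sketch. The slide does not say how its lists were computed, so this method is an assumption. The function name `cooccurrences` is hypothetical, and the three toy documents are made up, borrowing only a few terms from the slide.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrences(docs):
    """For each term, count how often every other term shares a document with it."""
    table = defaultdict(Counter)
    for doc in docs:
        # Each unordered pair of distinct terms is counted once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table

# Hypothetical toy documents; only the terms themselves come from the slide.
docs = [
    ["journalforskriften", "oppbevaring", "sletting"],
    ["journalforskriften", "sletting", "retting"],
    ["syndrom", "hørsel", "forekomst"],
]
table = cooccurrences(docs)
# Term that most often appears together with "journalforskriften".
related = table["journalforskriften"].most_common(1)
```

On a real collection the raw counts would usually be normalised, for example with pointwise mutual information, so that very frequent terms do not dominate every list.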