Semantic Tagging Using Topic Models Exploiting Wikipedia Category Network - PowerPoint PPT Presentation

Nitesh Prakash, Duncan Rule, Boh Young Suh
Introduction
Goal: tag web articles with the most probable Wikipedia categories
- What is the article “about” in terms of categories?
- Helpful for information access and retrieval
Model Overview (sOntoLDA)
Modify LDA to suit the problem’s needs
- Pre-define topics as Wikipedia categories
- Use prior knowledge to improve the topic-word distribution
- Wikipedia articles are labeled with categories
- Represent the prior knowledge with a matrix λ

LDA: φ ~ Dir(β)    sOntoLDA: φ ~ Dir(β × λ)
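As a rough illustration of the modified prior (the symbols φ, β, λ are standard LDA notation reconstructed here, and the weights below are made up), scaling the Dirichlet hyperparameters element-wise by the prior matrix pushes a category’s word distribution toward the words the prior supports:

```python
import random

def sample_dirichlet(alphas, rng):
    # Draw from Dir(alphas) via the standard normalized-Gamma construction.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
beta = 0.5                      # symmetric hyperparameter, as in plain LDA
lam = [5.0, 5.0, 0.1, 0.1]      # hypothetical prior weights over a 4-word vocabulary

plain = sample_dirichlet([beta] * 4, rng)               # LDA:      φ ~ Dir(β)
biased = sample_dirichlet([beta * l for l in lam], rng)  # sOntoLDA: φ ~ Dir(β × λ)
```

Words with larger λ entries get larger pseudo-counts, so they tend to receive more of the probability mass in the sampled distribution.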
Building the Prior Matrix (λ)
How do we represent prior word-topic knowledge?
- Start with tf-idf matrix
- Each “document” is the set of Wiki articles tagged with a given category
- Add subcategories down to a specific level ℓ
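The construction above can be sketched as follows; the three-category toy corpus and the plain tf-idf weighting are illustrative assumptions, not the exact preprocessing used here:

```python
import math
from collections import Counter

# Hypothetical mini-corpus: each "document" concatenates the Wikipedia
# articles tagged with one category (subcategories down to level l would
# be merged into the same pseudo-document).
category_docs = {
    "Dentistry":    "tooth enamel dentist tooth cavity".split(),
    "Hygiene":      "soap wash brush tooth clean".split(),
    "Chiropractic": "spine adjustment joint manipulation".split(),
}

def tfidf_prior(category_docs):
    """Return prior[category][word] = tf-idf of word in that category's pseudo-document."""
    n_docs = len(category_docs)
    df = Counter()                      # document frequency of each word
    for words in category_docs.values():
        df.update(set(words))
    prior = {}
    for cat, words in category_docs.items():
        tf = Counter(words)
        prior[cat] = {w: (tf[w] / len(words)) * math.log(n_docs / df[w])
                      for w in tf}
    return prior

prior = tfidf_prior(category_docs)
```

Words unique to one category get a high idf and so a strong prior weight there; words shared by every category get zero weight under this simple scheme.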
Inference using Gibbs Sampling
- Given the generative sOntoLDA model and the priors, we need to reverse the process and infer the latent categories from the observed documents
- The posterior’s denominator cannot be computed directly: it contains C^n terms, where C is the number of categories and n is the number of words in the vocabulary
- Collapsed Gibbs sampling uses Markov chain Monte Carlo to converge to a posterior distribution over categories c, conditioned on the observed words w and the hyperparameters α and β
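A minimal collapsed Gibbs sampler for an LDA-style model whose topic-word hyperparameter is scaled by a prior matrix might look like the sketch below. The update rule follows standard collapsed LDA; the function name, toy data, and defaults are assumptions, not the authors’ implementation:

```python
import random

def collapsed_gibbs(docs, n_cats, vocab_size, lam, alpha=0.1, beta=0.01,
                    n_iter=200, seed=0):
    """Collapsed Gibbs sampling where the topic-word prior is scaled
    element-wise by lam[c][w] (a sketch of the sOntoLDA idea)."""
    rng = random.Random(seed)
    # Count tables: document-category, category-word, category totals.
    ndc = [[0] * n_cats for _ in docs]
    ncw = [[0] * vocab_size for _ in range(n_cats)]
    nc = [0] * n_cats
    # Random initial category assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            c = rng.randrange(n_cats)
            zs.append(c)
            ndc[d][c] += 1; ncw[c][w] += 1; nc[c] += 1
        z.append(zs)
    beta_sum = [sum(beta * lam[c][w] for w in range(vocab_size))
                for c in range(n_cats)]
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                c = z[d][i]               # remove the current assignment
                ndc[d][c] -= 1; ncw[c][w] -= 1; nc[c] -= 1
                # Full conditional: (n_dc + α) * (n_cw + β·λ) / (n_c + Σ β·λ)
                weights = [(ndc[d][k] + alpha) *
                           (ncw[k][w] + beta * lam[k][w]) / (nc[k] + beta_sum[k])
                           for k in range(n_cats)]
                c = rng.choices(range(n_cats), weights=weights)[0]
                z[d][i] = c
                ndc[d][c] += 1; ncw[c][w] += 1; nc[c] += 1
    return ndc, ncw

# Two toy "documents" over a 4-word vocabulary; lam biases category 0
# toward words 0-1 and category 1 toward words 2-3.
docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
lam = [[5.0, 5.0, 0.1, 0.1], [0.1, 0.1, 5.0, 5.0]]
ndc, ncw = collapsed_gibbs(docs, n_cats=2, vocab_size=4, lam=lam)
```

Setting every λ entry to 1 recovers plain collapsed LDA, which is the sense in which the prior matrix is a drop-in modification.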
Inference using Gibbs Sampling
- Probability of a category given a document
- Probability of a word given a category
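Using the standard collapsed-LDA estimators (with the topic-word hyperparameter scaled by the prior matrix λ, as assumed above), these two distributions can be read off the Gibbs count tables $n_{d,c}$ and $n_{c,w}$:

```latex
\hat{\theta}_{d,c} = \frac{n_{d,c} + \alpha}{\sum_{c'} \left( n_{d,c'} + \alpha \right)}
\qquad
\hat{\varphi}_{c,w} = \frac{n_{c,w} + \beta\,\lambda_{c,w}}{\sum_{w'} \left( n_{c,w'} + \beta\,\lambda_{c,w'} \right)}
```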
Tagging Example
- Structure of, and relationships between, Wikipedia categories as represented by SKOS properties
- Subcategories and supercategories
- Consider super-categories in addition to exact matches
- Categories assigned to an article on “tooth brushing” and the related category hierarchy
Category                           Probability
Oral Hygiene                       0.1533
Dentistry                          0.0478
Self Care                          0.0403
Personal Hygiene Products          0.0302
Chiropractic Treatment Techniques  0.0227

(Surrounding hierarchy shown in the figure: Health, Health Care, Health Sciences, Dentistry Branches, Hygiene, Personal Hygiene, Tooth Brushing)
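The “super-categories in addition to exact match” idea can be sketched as a bounded walk up the category graph; the parent links below are a hypothetical slice, not the real Wikipedia data:

```python
# Hypothetical parent links (a tiny slice of a category graph).
parents = {
    "Oral Hygiene": ["Hygiene", "Dentistry"],
    "Dentistry": ["Health Sciences"],
    "Hygiene": ["Health"],
    "Health Sciences": ["Health"],
}

def hierarchical_match(predicted, gold, max_up=2):
    """True if a predicted category equals a gold category or one of its
    ancestors within max_up levels up the graph."""
    frontier = set(gold)
    for _ in range(max_up + 1):
        if frontier & set(predicted):
            return True
        frontier = {p for c in frontier for p in parents.get(c, [])}
    return False

print(hierarchical_match(["Dentistry"], ["Oral Hygiene"]))  # True: super-category match
```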
Experiments
1. How well does the model predict the categories of a collection of Wikipedia articles?
2. Assign Wikipedia tags to Reuters news articles and compare the top-k topics
Preprocessing
Final Topic Graph
- 1,353 categories
- 30,300 articles
- Vocabulary size 99,665
Evaluation metric
Precision@k and Mean Average Precision (MAP)
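Both metrics are standard; a minimal sketch (the ranking and gold set below are made-up examples, not results from the experiments):

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted categories that are relevant."""
    return sum(1 for c in predicted[:k] if c in relevant) / k

def mean_average_precision(rankings):
    """rankings: list of (predicted ranking, set of relevant categories)."""
    aps = []
    for predicted, relevant in rankings:
        hits, score = 0, 0.0
        for i, c in enumerate(predicted, start=1):
            if c in relevant:
                hits += 1
                score += hits / i        # precision at each relevant rank
        aps.append(score / len(relevant) if relevant else 0.0)
    return sum(aps) / len(aps)

ranked = ["Oral Hygiene", "Self Care", "Dentistry"]
gold = {"Oral Hygiene", "Dentistry"}
print(precision_at_k(ranked, gold, 2))              # 0.5
print(mean_average_precision([(ranked, gold)]))     # ≈ 0.833
```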
Tagging Wikipedia Articles: Results
Evaluation on Reuters news (2,914 articles)
- Applied the “hierarchical match” method used for the Wikipedia dataset
- Removed words not defined in the prior matrix (λ)
Real-world document set
Example of topic and word distribution
Conclusions
- Prior knowledge from Wikipedia’s hierarchical ontology can be successfully exploited for semantic tagging of documents

Future work
- Expand to other topics
- Explore richer topic models
- Incorporate hierarchical structure of categories