PROBABILISTIC MODELS FOR STRUCTURED DATA: Course Project



SLIDE 1

PROBABILISTIC MODELS FOR STRUCTURED DATA

Course Project

Instructor: Yizhou Sun

yzsun@cs.ucla.edu

January 14, 2020

SLIDE 2

Overview

  • Goal: design a probabilistic graphical model to solve real-world problems, and write a report that could be submitted to a venue for publication
  • Teamwork
    • 3-4 people per group
  • Milestones
    • Team formation due date: Week 2 (1pt as participation)
    • Proposal due date: Week 5 (5pt)
    • Presentation due date: 3/12/2020 in class (20pt)
    • Final report due date: 3/13/2020 (15pt)
  • What to submit: project report and code

SLIDE 3

Report Guideline

  • Format: no more than 8 pages, using the ACM SIG template (https://www.acm.org/publications/proceedings-template-16dec2016), with the following sections:
    • 1. Title with group information (group # and name, group member names)
    • 2. Abstract
    • 3. Introduction of the overall goal and background
    • 4. Problem definition and formalization
    • 5. Methods description (detailed steps)
    • 6. Experiment design and evaluation
    • 7. Related work
    • 8. Conclusion
    • 9. References

SLIDE 4

Breakdown Points

  • 1. Is the problem formalization reasonable?
  • 2. Is the solution solid and reasonable?
  • 3. Is there a comparison with alternative approaches, with reasonable evaluation?
  • 4. Report writing quality

SLIDE 5

Problem 1: Paper Classification in Directed Citation Network

  • Cora Dataset:
    • http://www.cs.umass.edu/~mccallum/code-data.html
    • Cora.zip
  • Label: Each paper is associated with a research topic
    • There is a hierarchy structure in the dataset; please use the top level of the hierarchy as labels
  • Feature: Each paper has words extracted from its title

SLIDE 6
  • Task:
    • Design a probabilistic graphical model that leverages the citation links to classify papers into research topics
  • Questions to address:
    • How can the asymmetry of the citation relation be incorporated into the potential function design?
      • Design an asymmetric potential function and implement it correctly
    • Will taking the asymmetry into account improve the classification accuracy?
      • Compare with a solution that simply ignores the asymmetry
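
One possible way to make the asymmetry question concrete is sketched below. It is only an illustration under stated assumptions, not a prescribed solution: the edge log-potential is a K x K table `W` (a hypothetical name) indexed by (topic of citing paper, topic of cited paper), and the "ignore asymmetry" baseline averages the two directions. All function names and parameter values are made up for the example.

```python
def edge_log_potential(W, y_citing, y_cited, symmetric=False):
    """Log-potential of one directed citation edge (citing -> cited).

    W is a K x K table of parameters (hypothetical). In the asymmetric
    model W[a][b] and W[b][a] may differ, so "topic a cites topic b" and
    "topic b cites topic a" can score differently. The symmetric baseline
    averages both directions, which ignores edge direction.
    """
    if symmetric:
        return 0.5 * (W[y_citing][y_cited] + W[y_cited][y_citing])
    return W[y_citing][y_cited]


def joint_log_score(W, edges, labels, symmetric=False):
    """Unnormalized log-score of a labeling: sum over directed edges."""
    return sum(
        edge_log_potential(W, labels[u], labels[v], symmetric)
        for u, v in edges
    )


# Toy example with K = 2 topics and made-up parameters.
W = [[1.0, 0.2],
     [0.8, 1.0]]
edges = [(0, 1), (2, 1)]   # paper 0 cites paper 1, paper 2 cites paper 1
labels = [0, 1, 1]         # topic assigned to each paper

asym = joint_log_score(W, edges, labels)                  # 0.2 + 1.0
sym = joint_log_score(W, edges, labels, symmetric=True)   # 0.5 + 1.0
```

In this sketch, reversing an edge changes the asymmetric score but leaves the symmetric one unchanged, which is the comparison the questions above ask for. Node potentials from title words and the inference procedure are deliberately left out.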

SLIDE 7
  • Evaluation:
    • Hide p% of the labels as the test set, and use the remaining labels for training
    • Vary p to see its impact on the classification accuracy
    • Evaluation metric for multi-label classification
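
The evaluation protocol above could be scripted roughly as follows. This is a minimal sketch, assuming node labels live in a dictionary and that the model is wrapped in a caller-supplied `fit_predict(train, test)` function; that function, the majority-class baseline, and the toy labels are all hypothetical stand-ins.

```python
import random
from collections import Counter


def split_labels(node_ids, p, seed=0):
    """Hide p percent of the labeled nodes as the test set."""
    rng = random.Random(seed)
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * p / 100)
    return shuffled[n_test:], shuffled[:n_test]   # train, test


def accuracy(y_true, y_pred):
    """Fraction of test nodes whose prediction matches the hidden label."""
    correct = sum(t == q for t, q in zip(y_true, y_pred))
    return correct / len(y_true)


def sweep(node_ids, labels, fit_predict, percents=(10, 30, 50, 70, 90)):
    """Vary p and record the test accuracy for each setting."""
    results = {}
    for p in percents:
        train, test = split_labels(node_ids, p)
        pred = fit_predict(train, test)   # hypothetical: {node: label} for test nodes
        results[p] = accuracy([labels[n] for n in test],
                              [pred[n] for n in test])
    return results


# Toy data and a majority-class baseline standing in for the real model.
labels = {n: ("A" if n < 7 else "B") for n in range(10)}


def majority_baseline(train, test):
    top = Counter(labels[n] for n in train).most_common(1)[0][0]
    return {n: top for n in test}


results = sweep(list(labels), labels, majority_baseline, percents=(30, 50))
```

Fixing the random seed keeps the splits reproducible across runs; averaging over several seeds per value of p would give a more stable curve.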

SLIDE 8

Problem 2: Node Classification in Heterogeneous Bibliographic Network

  • Dataset
    • four_area.zip
  • Label: authors and venues are associated with one of the four research areas, i.e., DB, DM, ML, IR
    • Label information can be found in DBLP_four_area.zip
  • Feature: Only papers are associated with text information

SLIDE 9
  • Task:
    • Design a probabilistic graphical model to classify all the objects in the network into four categories
  • Questions to address:
    • How to leverage different types of links in the network?
      • Design different types of potential functions for different types of links by assuming different parameters
    • Will taking the type information of links into account improve the performance?
      • Compare with a solution that treats all the links equally
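
As a rough illustration of type-specific potentials, the sketch below keeps one K x K parameter table per link type and, for the "treat all links equally" baseline, ties them to a single averaged table. The link type names, node identifiers, and parameter values are hypothetical and chosen only for the example.

```python
def average_tables(tables):
    """Element-wise average of several K x K tables."""
    k = len(tables[0])
    return [[sum(t[i][j] for t in tables) / len(tables) for j in range(k)]
            for i in range(k)]


def typed_log_score(params, edges, labels, tie_types=False):
    """Sum of log-potentials over typed edges.

    params maps a link type to its own K x K table. With tie_types=True,
    every edge uses the average of all tables, so link types are ignored.
    """
    shared = average_tables(list(params.values())) if tie_types else None
    total = 0.0
    for u, v, link_type in edges:
        table = shared if tie_types else params[link_type]
        total += table[labels[u]][labels[v]]
    return total


# Toy example with K = 2 areas and two hypothetical link types.
params = {
    "author-paper": [[1.0, 0.0], [0.0, 1.0]],
    "paper-venue": [[2.0, 0.5], [0.5, 2.0]],
}
edges = [("a1", "p1", "author-paper"), ("p1", "v1", "paper-venue")]
labels = {"a1": 0, "p1": 0, "v1": 1}

typed = typed_log_score(params, edges, labels)                  # 1.0 + 0.5
tied = typed_log_score(params, edges, labels, tie_types=True)   # 1.5 + 0.25
```

Keeping both variants behind one flag makes the requested comparison a matter of rerunning the same pipeline with `tie_types` switched.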

SLIDE 10
  • Evaluation:
    • Hide p% of the labels as the test set, and use the remaining labels for training
    • Vary p to see its impact on the classification accuracy
    • Evaluation metric for multi-label classification
    • Evaluation when multiple types of nodes exist
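
When several node types carry labels, one option is to report accuracy per node type and then average across types, so that a numerous type does not dominate the result. The sketch below assumes this per-type breakdown is the intended reading of the last bullet; the node identifiers and predictions are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_type(node_type, y_true, y_pred):
    """Return per-type accuracy and its unweighted (macro) average."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for node, truth in y_true.items():
        t = node_type[node]
        totals[t] += 1
        hits[t] += int(y_pred[node] == truth)
    per_type = {t: hits[t] / totals[t] for t in totals}
    macro = sum(per_type.values()) / len(per_type)
    return per_type, macro


# Toy test set: two authors and one venue with hidden labels.
node_type = {"a1": "author", "a2": "author", "v1": "venue"}
y_true = {"a1": "DB", "a2": "ML", "v1": "DM"}
y_pred = {"a1": "DB", "a2": "IR", "v1": "DM"}

per_type, macro = accuracy_by_type(node_type, y_true, y_pred)
```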

SLIDE 11

Project 3: Polarity Detection for Twitter Users

  • Dataset: Crawl Twitter users who follow political figures, along with their following, retweet, and reply behaviors, as well as their tweets
  • Task: Design a probabilistic graphical model to classify all the users into two polarities

SLIDE 12

Project 4: Knowledge Completion for Knowledge Graphs via Higher-Order Dependency Modeling

  • Datasets: Knowledge Graphs, such as YAGO, Freebase, and NELL
  • Task: Design a probabilistic graphical model that can leverage higher-order dependency to solve knowledge graph completion tasks
    • i.e., <h, r, ?>
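
To illustrate what a query of the form <h, r, ?> asks for, the sketch below enumerates candidate tail entities that are not already known answers and ranks them with a caller-supplied scoring function. The scoring function is a placeholder for whatever the designed model computes; the triples, entity names, and scores are invented for the example.

```python
def candidate_tails(triples, entities, h, r):
    """Entities that could answer <h, r, ?>, excluding already-known tails."""
    known = {t for (hh, rr, t) in triples if hh == h and rr == r}
    return [e for e in entities if e not in known]


def rank_tails(score, triples, entities, h, r):
    """Rank candidate tails by score(h, r, t), highest first."""
    cands = candidate_tails(triples, entities, h, r)
    return sorted(cands, key=lambda t: score(h, r, t), reverse=True)


# Toy knowledge graph with made-up facts.
triples = [("alice", "works_at", "ucla"), ("bob", "works_at", "mit")]
entities = ["alice", "bob", "ucla", "mit"]

# Placeholder scores standing in for a learned model.
toy_scores = {"mit": 0.9, "bob": 0.1, "alice": 0.0}
ranking = rank_tails(lambda h, r, t: toy_scores.get(t, 0.0),
                     triples, entities, "alice", "works_at")
```

Filtering out known tails before ranking mirrors the common "filtered" evaluation setting, so that true facts already in the graph are not counted as wrong answers.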

SLIDE 13

Project 5: Construct CS Taxonomy from Wiki

  • Dataset: Wikipedia
  • Task: construct a taxonomy for terms related to computer science
    • E.g., root node: “computer science”


https://www.researchgate.net/figure/Computer-Science-Taxonomy_fig1_260318181

SLIDE 14

Project 6: NER for Wiki Pages in CS

  • Dataset: Wikipedia
  • Task: Perform named entity recognition (NER) on the text of wiki pages
  • Categories: concept (e.g., machine learning, deep learning); algorithm (e.g., CNN); application (e.g., self-driving car); dataset (e.g., ImageNet), etc.
