Exploring Social Tagging Graph for Web Object Classification

Data e Web Mining – academic year 2009/2010
Exploring Social Tagging Graph for Web Object Classification
Zhijun Yin, Rui Li, Qiaozhu Mei, Jiawei Han
Giulia Mialich – 825102, Luca Rossi – 825038

Stats
• [...] recently announced 300 million users (Sept. 2009)
• Social media has overtaken porn as the #1 activity on the Web
• Share of companies using [...] as a primary tool to find employees: 80%
• The #2 largest search engine in the world is [...]
• There are over 200,000,000 blogs and 54% post content or tweet daily
• 25% of Americans said they watched a short video on their phone in the past month
• More than 1.5 million pieces of content (web links, news stories, blog posts, notes, photos, etc.) are shared on Facebook daily
Giulia Mialich - Luca Rossi

Web Objects
• Products
• Video
• Photo
• Research Papers
10^3 - 10^9

Web Objects
Need to classify web objects into semantic categories:
• Discover interesting patterns from web objects
• Index and organize web objects efficiently
• Browse and search web objects conveniently

Web Objects Classification
A challenging task because of the specific characteristics of the data.
• LACK OF FEATURES: limited text description and difficulty in extracting content features from video/image
• LACK OF INTERCONNECTIONS: “Michael Jordan”
• LACK OF LABELS: difficulty in creating a large training set

Web Objects Classification
• Limited text description
• Isolated settings of web objects
• Labeled examples in some domains
Social tags: heterogeneous objects on the Web are tagged by users, with keywords freely chosen from their own vocabulary
• Rich semantic feature space
• Overcome the difficulties of web object classification

Social Tags
• LACK OF FEATURES: users provide enriched semantic features for web object classification
• LACK OF INTERCONNECTIONS: new link structure of web objects
• LACK OF LABELS: heterogeneous types of web objects are connected through common tags

Related work
This is the first work to explore social tag data for web object classification.
Investigated for a long time:
• WEB PAGE CLASSIFICATION: textual feature based, hyperlink, HTML & metadata, query log
• MULTIMEDIA OBJECT CLASSIFICATION: text features, contextual information

Related work
The authors propose a general theoretical framework for explicitly modeling tagging behaviors and the web object classification problem.
Social tags can benefit:
• web search
• information retrieval
• semantic web
• web page clustering
• user interest mining

[2007] S. Bao, G.-R. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Optimizing web search using social annotations. In WWW.
[2007] Bao et al. observe that social annotations can benefit web search in two respects:
1. Annotations are good summaries of the corresponding web pages [Amazon’s homepage: shopping, books, amazon, music, store]. Similar or closely related annotations are usually given to the same web pages ⇒ SocialSimRank (SSR)
2. The count of annotations indicates the popularity of web pages from the users’ point of view ⇒ SocialPageRank (SPR)

[2009] Yin et al.
An innovative exploration of social tags for web object classification. They propose an iterative algorithm which solves the problem efficiently, significantly outperforming state-of-the-art methods that do not use tags as bridges.

Social Tagging Graph
Figure legend: labeled objects, unlabeled objects
Simplifying assumption: 2 types of objects, S and T. S objects are already labeled

Social Tagging Graph
• $C = \{c_1, c_2, \ldots, c_k\}$ is the set of all the categories
• $G = (V, E)$ is the social tagging graph, where $V$ is the set of all the objects (plus tags) and $E$ is the set of edges between an object and its tags
• The vertices are split into four sets:
  $V_S$: the objects of type S (all labeled)
  $V_T^l$: the labeled objects of type T
  $V_T^u$: the unlabeled objects of type T
  $V_{tag}$: the tags
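The graph defined above can be stored as a weighted adjacency map. The following is only a minimal sketch: the function name `build_tagging_graph`, the `(object, tag, count)` input format and the toy identifiers are assumptions made for illustration, and the edge weight is taken to be the number of times a tag was applied to an object.

```python
from collections import defaultdict


def build_tagging_graph(taggings):
    """Build a symmetric adjacency map from (object_id, tag, count) triples.

    Returns a dict: vertex -> {neighbor: weight}. Tag vertices are stored as
    ("tag", name) tuples so they cannot collide with object identifiers.
    """
    neighbors = defaultdict(dict)
    for obj, tag, count in taggings:
        tag_vertex = ("tag", tag)
        # Accumulate the count if the same (object, tag) pair appears twice.
        weight = neighbors[obj].get(tag_vertex, 0) + count
        neighbors[obj][tag_vertex] = weight
        neighbors[tag_vertex][obj] = weight
    return dict(neighbors)


# Hypothetical toy data: a product and a video share the tag "basketball".
taggings = [
    ("product:1", "basketball", 3),
    ("video:7", "basketball", 2),
    ("video:7", "nba", 1),
]
graph = build_tagging_graph(taggings)
```

Because edges only connect an object to a tag, two objects of different types become neighbors of the same tag vertex, which is what lets a tag act as a bridge between them.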

Intuitions
Web users are likely to select similar tags for objects belonging to the same semantic category, independent of the object type ⇒ tags can be used as a “bridge” to semantically connect objects

Intuitions
The labels assigned by the classifier should be consistent. This consistency can be captured by the following 3 properties:
1. The category assignment of a vertex in $V_S$ or $V_T^l$ should not deviate much from its original label, as long as we trust the initial labeling
2. The category assignment of a vertex in $V_T^u$ should take into account any prior knowledge
3. The category assignment of any vertex of the graph should be as consistent as possible with its neighbors’ labels

The Optimization Framework
• $f_u$: a $k$-dimensional vector that represents the class distribution of vertex $u \in V$, where $k$ is the number of categories. $f_u[i]$ represents the probability that $u$ belongs to category $i$, s.t. $\sum_{i=1}^{k} f_u[i] = 1$. We denote $\{f_u\}_{u \in V}$ as $f$.
• $f_u^*$: the optimal solution of $f_u$
• $\hat{f}_u$: for $u \in V_S \cup V_T^l$, $\hat{f}_u$ is the class distribution estimated from the original category labels of vertex $u$. For $u \in V_T^u$, $\hat{f}_u$ is the class distribution estimated from some prior knowledge of the unlabeled object $u$ (e.g., the label assignments by a domain classifier).
• $w_{uv}$: a weight of the importance of edge $(u, v)$. Given an object $u$ and its associated tag $v$, $w_{uv}$ is the frequency with which $v$ is used to tag $u$.

The Optimization Framework
$$O(f) = \alpha \sum_{u \in V_S} \|f_u - \hat{f}_u\|^2 + \beta \sum_{u \in V_T^l} \|f_u - \hat{f}_u\|^2 + \gamma \sum_{u \in V_T^u} \|f_u - \hat{f}_u\|^2 + \sum_{(u,v) \in E} w_{uv} \|f_u - f_v\|^2$$

The Optimization Framework
1. $\sum_{u \in V_S} \|f_u - \hat{f}_u\|^2$ means that the category of a vertex in $V_S$ should not deviate much from its original label(s).
2. $\sum_{u \in V_T^l} \|f_u - \hat{f}_u\|^2$ means that the category of a vertex in $V_T^l$ should keep close to its initial label(s).
3. $\sum_{u \in V_T^u} \|f_u - \hat{f}_u\|^2$ means that the category of a vertex in $V_T^u$ should keep close to the prior knowledge, if any.
4. $\sum_{(u,v) \in E} w_{uv} \|f_u - f_v\|^2$ makes sure that the class distributions of the vertices are smooth over the whole graph, i.e., the class distribution of a vertex is consistent with its neighbors.
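To make the four terms concrete, the objective can be evaluated directly on a small graph. This is a sketch under stated assumptions: the function name `objective`, the dictionary-based data layout and the group keys `"S"`, `"T_l"`, `"T_u"` are hypothetical choices for illustration, not notation from the slides.

```python
import numpy as np


def objective(f, f_hat, groups, edges, alpha, beta, gamma):
    """Evaluate O(f) as the sum of three fitting terms and one smoothness term.

    f, f_hat: dict vertex -> class distribution (array of length k).
    groups:   dict with keys "S", "T_l", "T_u" -> list of object vertices.
    edges:    dict (object, tag) -> weight w_uv.
    """
    fit = 0.0
    for coeff, key in ((alpha, "S"), (beta, "T_l"), (gamma, "T_u")):
        # Squared distance between current and initial distributions.
        fit += coeff * sum(np.sum((f[u] - f_hat[u]) ** 2) for u in groups[key])
    # Weighted squared distance across every object-tag edge.
    smooth = sum(w * np.sum((f[u] - f[v]) ** 2) for (u, v), w in edges.items())
    return fit + smooth


# Hypothetical toy instance with k = 2 categories and a single tag "g".
f_hat = {"s": np.array([1.0, 0.0]), "t": np.array([0.5, 0.5])}
f = {"s": np.array([1.0, 0.0]), "t": np.array([0.5, 0.5]),
     "g": np.array([0.5, 0.5])}
groups = {"S": ["s"], "T_l": [], "T_u": ["t"]}
edges = {("s", "g"): 2.0, ("t", "g"): 1.0}

value = objective(f, f_hat, groups, edges, alpha=1.0, beta=1.0, gamma=1.0)
```

In this toy instance every object sits exactly on its initial distribution, so the three fitting terms vanish and the whole value comes from the disagreement between "s" and the tag "g" along their edge.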

The Optimization Framework
Our target is to find $f^* = \arg\min_f O(f)$.
Based on this class distribution we can state that, given an object $o$, its class $c$ is
$$c = \arg\max_{c' \in C} \frac{P(o \mid c')}{P(o)} = \arg\max_{c' \in C} \frac{P(c' \mid o)}{P(c')}$$
which, for an object $u$, becomes
$$c = \arg\max_{1 \le i \le k} \frac{f_u^*[i]}{\sum_{u' \in V_T^l \cup V_T^u} f_{u'}^*[i]}$$
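The decision rule above divides each class score by the total score that class receives over all type T objects, so a class that dominates everywhere does not automatically win. A minimal sketch follows, assuming that $f^*$ is already available; the function name `assign_classes`, the zero-based category indices and the toy distributions are hypothetical.

```python
import numpy as np


def assign_classes(f_star, target_vertices):
    """Pick, for each type T object, the class maximizing f*_u[i] / sum_u' f*_u'[i].

    Assumes every class has non-zero total mass over target_vertices.
    Returned categories are zero-based indices.
    """
    F = np.array([f_star[u] for u in target_vertices], dtype=float)  # (n, k)
    class_mass = F.sum(axis=0)   # denominator: sum over all type T objects
    scores = F / class_mass      # normalized score per object and class
    return {u: int(np.argmax(row)) for u, row in zip(target_vertices, scores)}


# Hypothetical optimal distributions for three type T objects, k = 2.
f_star = {
    "a": np.array([0.6, 0.4]),
    "b": np.array([0.9, 0.1]),
    "c": np.array([0.7, 0.3]),
}
labels = assign_classes(f_star, ["a", "b", "c"])
```

Here a plain arg max over $f_u^*$ would put all three objects in category 0, while the normalized rule moves "a" and "c" to category 1 because category 0 is heavily favored across the whole set.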

Any problem so far?
Oh yes... finding a closed-form solution to this problem requires inverting a huge matrix whose size is the total number of objects and tags

Any problem so far?
Why not use a smart iterative algorithm instead? I’ve got an idea: let’s differentiate O(f) with respect to the 4 types of vertices and update f by setting the derivative to zero!
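Setting the derivative of O(f) to zero for one vertex, with its neighbors held fixed, gives a weighted average: an object $u$ becomes $(\lambda \hat{f}_u + \sum_v w_{uv} f_v) / (\lambda + \sum_v w_{uv})$ with $\lambda \in \{\alpha, \beta, \gamma\}$ depending on its type, and a tag $v$ becomes $\sum_u w_{uv} f_u / \sum_u w_{uv}$. The sketch below iterates these updates. The function name `propagate`, the `trust` dictionary, the update order, the uniform initialization of tags and the fixed number of iterations are assumptions of this sketch, not details confirmed by the slides.

```python
import numpy as np


def propagate(neighbors, f_hat, trust, k, n_iter=50):
    """Iteratively update class distributions on an object-tag graph.

    neighbors: vertex -> {neighbor: positive weight w_uv}, symmetric.
    f_hat:     object vertex -> initial class distribution of length k.
    trust:     object vertex -> alpha, beta or gamma according to its type;
               vertices absent from trust are treated as tags.
    """
    f = {}
    for u in neighbors:
        if u in f_hat:
            f[u] = np.array(f_hat[u], dtype=float)
        else:
            f[u] = np.full(k, 1.0 / k)  # tags start from a uniform guess
    objects = [u for u in neighbors if u in trust]
    tags = [u for u in neighbors if u not in trust]

    for _ in range(n_iter):
        # Tags: weighted average of the objects they are attached to.
        for v in tags:
            total = sum(neighbors[v].values())
            f[v] = sum(w * f[u] for u, w in neighbors[v].items()) / total
        # Objects: compromise between the initial estimate and their tags.
        for u in objects:
            total = sum(neighbors[u].values())
            pulled = sum(w * f[v] for v, w in neighbors[u].items())
            f[u] = (trust[u] * np.asarray(f_hat[u], dtype=float) + pulled) / (trust[u] + total)
    return f


# Hypothetical toy graph: labeled object "s1" and unlabeled "t1" share a tag.
neighbors = {
    "s1": {"tag:nba": 2.0},
    "t1": {"tag:nba": 1.0},
    "tag:nba": {"s1": 2.0, "t1": 1.0},
}
f_hat = {"s1": [1.0, 0.0], "t1": [0.5, 0.5]}
trust = {"s1": 10.0, "t1": 0.1}  # strong trust in the label, weak prior

f = propagate(neighbors, f_hat, trust, k=2)
```

Every update is a convex combination of distributions, so each $f_u$ keeps summing to 1 without an explicit normalization step, and no matrix inversion is needed. In the toy graph the unlabeled object "t1" is pulled toward the category of "s1" through the shared tag.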
