Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
CSE 6240 Web Search and Text Mining, Spring 2020
Instructor: Prof. Srijan Kumar
Teaching Assistants: Roshan Pati, Arindum Roy
Web is a platform for everyone
Wikis, Podcasts, Slide sharing, Bookmark sharing, Product reviews, Comments
65M msgs/day
People, events, products, services, …
Blogs, microblogs, forums, reviews, …
53M blogs, 1307M posts, 115M users, 10M groups, 45M reviews
– Many posts are redundant, so users have to wade through hundreds of posts to locate useful information
– Mine this data in real time and produce well-organized summaries
– What are people saying about our brand?
– Significant spending on marketing and advertising: companies are trying to position their products
– Brand analytics helps to determine whether such campaigns are effective
– Networks from science, nature, and technology are more similar than one would expect
– Computer Science, Social Science, Physics, Economics, Statistics, Biology
– Web/mobile, bio, health, and medical
– Social networking, Drug design, AI reasoning
– fraud, trolls/bots/spammers, fake news
– news/literature/movie recommender
– news categorization, help desk email routing, sentiment tagging
– discovery of topical trends in scientific research
– discovery of major complaints from customers
– stock prices from social media posts, voting results
– Research Track: In-depth study of a topic → publication/submission
– Development Track: Implementation of a novel application → useful application
Figure: course timeline from the first day of instruction to the last day (Jan, Feb, Mar, Apr), showing lectures, assignments, the project, and spring break.
Figure labels: Relevant Data, Small Relevant Data, Knowledge, Many Applications
– Search engines, filtering, recommender, summarization, clustering, categorization, topic mining, sentiment, …, prediction, …
– Medical/health, education, security, business, social media, …
– Summary
– Critique/Shortcomings
– How are you improving the existing work?