Twitter Data Processing with MongoDB
By Ama & Sameera
Introduction
Create a Twitter developer account
Get access keys
Access the REST API
Execute some POST and GET queries
Download a sample of Twitter streaming data
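The steps above can be sketched in Python. This is a hypothetical illustration, not the authors' code: the tweepy/pymongo usage, the placeholder credentials, and the `download_sample` helper are assumptions; only the collection name `finaltwitterdata` comes from the slides.

```python
def to_document(status):
    """Keep only the fields the later aggregations rely on
    (tweet text and the author's time zone)."""
    return {
        "text": status.get("text", ""),
        "user": {"time_zone": (status.get("user") or {}).get("time_zone")},
        "created_at": status.get("created_at"),
    }

def download_sample(n=20):
    """Fetch n recent tweets via the REST API and insert them into MongoDB.
    Requires network access and real credentials; shown for illustration only."""
    import tweepy
    from pymongo import MongoClient

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    coll = MongoClient()["twitter"]["finaltwitterdata"]
    # home_timeline is one example of a GET query against the REST API
    for status in tweepy.API(auth).home_timeline(count=n):
        coll.insert_one(to_document(status._json))
```

Trimming each status down to the fields the aggregations need keeps the collection small and the queries fast.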
db.finaltwitterdata.aggregate([
  { $match: { $or: [ { 'text': { $regex: ".*Sunday.*" } },
                     { 'text': { $regex: ".*sunday.*" } } ] } },
  { $group: { _id: null, count: { $sum: 1 } } }
])
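The pipeline is just a regex filter followed by a count. A minimal stand-in over plain dicts (no MongoDB instance needed) shows the same logic; note that a single case-insensitive regex (`$options: "i"` in MongoDB) could replace the two hand-written case variants:

```python
import re

def count_matching(docs, pattern):
    """Mirror of $match {text: /pattern/i} followed by
    $group {_id: null, count: {$sum: 1}}."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(1 for d in docs if rx.search(d.get("text", "")))

docs = [{"text": "Lazy Sunday"}, {"text": "sunday brunch"}, {"text": "Monday"}]
count_matching(docs, "sunday")  # → 2
```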
db.finaltwitterdata.aggregate([
  { $match: { $or: [ { 'text': { $regex: ".*Paris.*" } },
                     { 'text': { $regex: ".*paris.*" } } ] } },
  { $group: { _id: "$user.time_zone", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
db.finaltwitterdata.aggregate([
  { $match: { $or: [ { 'text': { $regex: ".*Thanksgiving.*" } },
                     { 'text': { $regex: ".*thanksgiving.*" } } ] } },
  { $group: { _id: "$user.time_zone", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
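The time-zone queries group matching tweets by `user.time_zone` and sort descending. The same match → group → sort logic, mirrored in plain Python over sample documents (a sketch for readers without a running MongoDB; the sample data is invented):

```python
import re
from collections import Counter

def count_by_time_zone(docs, pattern):
    """Mirror of $match → $group {_id: "$user.time_zone", count: {$sum: 1}}
    → $sort {count: -1}."""
    rx = re.compile(pattern, re.IGNORECASE)
    counts = Counter(
        (d.get("user") or {}).get("time_zone")
        for d in docs
        if rx.search(d.get("text", ""))
    )
    return counts.most_common()  # (time_zone, count) pairs, descending

docs = [
    {"text": "Paris!", "user": {"time_zone": "CET"}},
    {"text": "paris soon", "user": {"time_zone": "CET"}},
    {"text": "in paris", "user": {"time_zone": "EST"}},
]
count_by_time_zone(docs, "paris")  # → [("CET", 2), ("EST", 1)]
```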
db.finaltwitterdata.aggregate([
  { $match: { $or: [ { 'text': { $regex: ".*Nicky Minaj.*" } },
                     { 'text': { $regex: ".*@NICKYMINAJ.*" } },
                     { 'text': { $regex: ".*nicky minaj.*" } } ] } },
  { $group: { _id: null, count: { $sum: 1 } } }
])
db.finaltwitterdata.aggregate([
  { $match: { $or: [ { 'text': { $regex: ".*5SOS.*" } },
                     { 'text': { $regex: ".*5 Seconds Of Summer.*" } },
                     { 'text': { $regex: ".*5 Seconds of Summer.*" } },
                     { 'text': { $regex: ".*5 seconds of summer.*" } } ] } },
  { $group: { _id: null, count: { $sum: 1 } } }
])
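The `$or` of four capitalization variants can be collapsed into one case-insensitive alternation; in the shell this would be `{'text': {$regex: "5SOS|5 seconds of summer", $options: "i"}}`. A quick check of the pattern in Python (the sample texts are invented):

```python
import re

# One case-insensitive alternation covers all four spellings in the slide's $or
rx = re.compile(r"5SOS|5 seconds of summer", re.IGNORECASE)

texts = ["Love 5SOS", "5 Seconds Of Summer live", "five seconds"]
[bool(rx.search(t)) for t in texts]  # → [True, True, False]
```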
Research Paper
After significant breaking news events, Twitter aims to provide relevant results within minutes, typically ten minutes.
Related query suggestion is a feature most searchers are likely familiar with: typing "Obama", for example, brings up suggestions for related searches.
Two systems were built to achieve this target, but only one was eventually deployed:
The first was built on Hadoop, ZooKeeper, and Vertica, mostly using Pig, and mixed real-time and batch processes; its end-to-end latency was estimated in hours.
The second, the one deployed, is a custom in-memory processing engine that can dynamically adapt to the rapidly evolving "global conversation" and addressed the challenges of real-time data processing in the era of "big data".
The search assistance engine consists of:
A lightweight frontend serving requests from an in-memory cache.
A backend that consumes the firehose and query hose to compute related query suggestions and spelling corrections.
The query path: as a query from a given user is delivered through the query hose, the following actions are taken:
Query statistics are updated in the query statistics store.
The query is added to the sessions store.
For each previous query in the session, a query co-occurrence is formed with the new query.
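The query path above can be sketched in a few lines. This is a toy model, not the paper's implementation: the store names (`QUERY_STATS`, `SESSIONS`, `COOCCUR`) and in-memory dicts are illustrative stand-ins for the engine's actual stores.

```python
from collections import defaultdict

QUERY_STATS = defaultdict(int)   # query statistics store
SESSIONS = defaultdict(list)     # per-user sessions store
COOCCUR = defaultdict(int)       # query co-occurrence counts

def on_query(user_id, query):
    """Handle one query arriving on the query hose."""
    QUERY_STATS[query] += 1                  # update query statistics
    for prev in SESSIONS[user_id]:           # co-occurrence with each previous
        if prev != query:                    # query in this user's session
            COOCCUR[(prev, query)] += 1
    SESSIONS[user_id].append(query)          # add the query to the session

on_query("u1", "obama")
on_query("u1", "obama speech")
COOCCUR[("obama", "obama speech")]  # → 1
```

Co-occurrence counts like these are the raw signal from which related-query suggestions can be ranked.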
The authors hope future system designers can benefit from their story and build the right solution the first time, handling both "big data" and "fast data".