Twitter Data Processing with MongoDB By Ama & Sameera

Introduction
• Create a Twitter developer account
• Get access keys
• Access the REST API
• Execute some POST and GET queries
• Download a sample of Twitter streaming data
• Analyze a single object: a tweet (JSON format)

Running Hadoop

Twitter Application

Flume configuration

Flume: data streaming

Hadoop File System

Running MongoDB services

Twitter data import

Data Structure
• http://www.jsoneditoronline.org/
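The aggregation queries in the Data Mining slides read only two fields of each tweet document: `text` and `user.time_zone`. The sketch below shows an abbreviated, hypothetical tweet document (all values are invented for illustration; real tweets carry many more fields) and a hypothetical helper, `extractFields`, that pulls out those two fields while tolerating a missing `user` object or time zone.

```javascript
// Abbreviated, hypothetical tweet document: values are invented, and only
// the fields relevant to the queries in this deck are shown.
const sampleTweet = {
  id_str: "1",                       // hypothetical value
  text: "Happy Thanksgiving from Paris!",
  user: {
    screen_name: "example_user",     // hypothetical value
    time_zone: "Paris"               // may be null if the user set none
  }
};

// Hypothetical helper: extract the two fields the aggregation queries use.
// A missing time zone becomes null, which is also how $group buckets a
// document whose grouping field is missing.
function extractFields(tweet) {
  const user = tweet.user || {};
  return {
    text: typeof tweet.text === "string" ? tweet.text : "",
    timeZone: user.time_zone == null ? null : user.time_zone
  };
}

console.log(extractFields(sampleTweet));
```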

Data Mining

Tweets Per Topic
db.finaltwitterdata.aggregate([
  { $match: { text: { $regex: "sunday", $options: "i" } } },
  { $group: { _id: null, count: { $sum: 1 } } }
])
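The topic count is a `$match` on the tweet text followed by a `$group` that adds 1 per matching document. The sketch below uses two hypothetical helpers (neither is a MongoDB API): `topicCountPipeline` builds the same pipeline for any keyword with a single case-insensitive regex, and `countTopic` computes the same count over an in-memory array, which is handy for sanity-checking results on a small sample. The sample tweets are invented.

```javascript
// Escape characters that have special meaning in a regular expression,
// so a keyword such as "C++" is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the returned array can be passed to
// db.finaltwitterdata.aggregate(...) in the mongo shell.
function topicCountPipeline(keyword) {
  return [
    { $match: { text: { $regex: escapeRegex(keyword), $options: "i" } } },
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
}

// In-memory equivalent of the pipeline above.
function countTopic(tweets, keyword) {
  const re = new RegExp(escapeRegex(keyword), "i");
  return tweets.filter(t => typeof t.text === "string" && re.test(t.text)).length;
}

const sample = [
  { text: "Lazy Sunday" },
  { text: "see you sunday!" },
  { text: "SUNDAY FUNDAY" },
  { text: "Monday blues" }
];
console.log(countTopic(sample, "Sunday")); // 3
```

A single case-insensitive regex also matches spellings such as "SUNDAY", which a pair of exact-case alternatives ("Sunday" and "sunday") would miss. Note that when nothing matches, a `$group` on `_id: null` returns no document at all rather than a count of 0.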

Tweets vs. Time-Zone: Paris
db.finaltwitterdata.aggregate([
  { $match: { text: { $regex: "paris", $options: "i" } } },
  { $group: { _id: "$user.time_zone", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
[Chart: number of Paris tweets per user time zone]

Tweets vs. Time-Zone: Thanksgiving
db.finaltwitterdata.aggregate([
  { $match: { text: { $regex: "thanksgiving", $options: "i" } } },
  { $group: { _id: "$user.time_zone", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
[Chart: number of Thanksgiving tweets per user time zone]
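Both time-zone queries chain three stages: match on the text, group by `user.time_zone`, and sort the groups by count in descending order. The sketch below mirrors those stages in plain JavaScript over an in-memory array; `countByTimeZone` is a hypothetical helper and the sample tweets are invented. It makes explicit that tweets with no time zone end up in a single `null` bucket, as they do under `$group`.

```javascript
// In-memory sketch of: $match on text -> $group by user.time_zone
// -> $sort by count descending.
function countByTimeZone(tweets, keyword) {
  const re = new RegExp(keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "i");
  const counts = new Map();
  for (const t of tweets) {
    if (typeof t.text !== "string" || !re.test(t.text)) continue;     // $match
    const tz = t.user && t.user.time_zone != null ? t.user.time_zone : null;
    counts.set(tz, (counts.get(tz) || 0) + 1);                        // $group, $sum: 1
  }
  return [...counts.entries()]
    .map(([tz, count]) => ({ _id: tz, count }))
    .sort((a, b) => b.count - a.count);                               // $sort: { count: -1 }
}

// Invented sample data.
const tweets = [
  { text: "Paris tonight", user: { time_zone: "Paris" } },
  { text: "Thinking of paris", user: { time_zone: "Paris" } },
  { text: "PARIS", user: { time_zone: "London" } },
  { text: "Paris", user: {} },
  { text: "Berlin", user: { time_zone: "Berlin" } }
];
console.log(countByTimeZone(tweets, "Paris"));
```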

American Music Awards (AMA) 2015

AMA: Artist of the Year
db.finaltwitterdata.aggregate([
  { $match: { text: { $regex: "nicki minaj|@nickiminaj", $options: "i" } } },
  { $group: { _id: null, count: { $sum: 1 } } }
])

AMA: Performances
db.finaltwitterdata.aggregate([
  { $match: { text: { $regex: "5SOS|5 Seconds of Summer", $options: "i" } } },
  { $group: { _id: null, count: { $sum: 1 } } }
])
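When an artist is referred to by several names or handles, the variants can be matched with one case-insensitive alternation rather than one `$or` branch per spelling. The sketch below is a minimal version of that idea; `aliasPattern` and `aliasCountPipeline` are hypothetical helper names, not MongoDB functions, and the alias list in the usage lines is illustrative. Each alias is escaped before joining, so characters such as `.` or `+` in a name are matched literally.

```javascript
// Escape regex metacharacters so each alias is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: ["5SOS", "5 Seconds of Summer"] -> "5SOS|5 Seconds of Summer"
function aliasPattern(aliases) {
  return aliases.map(escapeRegex).join("|");
}

// Hypothetical helper: count tweets that mention any of the aliases.
function aliasCountPipeline(aliases) {
  return [
    { $match: { text: { $regex: aliasPattern(aliases), $options: "i" } } },
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
}

// In the mongo shell, after defining the functions above (illustrative aliases):
//   db.finaltwitterdata.aggregate(aliasCountPipeline(["5SOS", "5 Seconds of Summer"]))
const pipeline = aliasCountPipeline(["5SOS", "5 Seconds of Summer"]);
console.log(JSON.stringify(pipeline[0]));
// {"$match":{"text":{"$regex":"5SOS|5 Seconds of Summer","$options":"i"}}}
```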

AMA: Favorite Electronic Dance Music Artist

Research Paper: Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture

Introduction
• After significant breaking news events, Twitter aims to provide relevant results within minutes, with a target of about ten minutes.
• Related query suggestion is a feature most searchers are likely familiar with: e.g., typing “Obama” brings up suggestions for related queries.
• Two systems were built to meet this target, but only one was eventually deployed:
  • The first implementation was based on a typical Hadoop-based analytics stack.
  • The second implementation, which was eventually deployed, is a custom in-memory processing engine.

Problem definition
• "Search assistance" at Twitter
• The Twitter context introduces a real-time "twist".
• At Twitter, search assistance must be provided in real time and must adapt dynamically to the rapidly evolving "global conversation".
• The architecture considers three aspects of data (volume, velocity, and variety) and addresses the challenges of real-time data processing in the era of "big data".

First approach: Hadoop
• The first solution sought to take advantage of Twitter's existing analytics platform: Hadoop.
• Twitter's Hadoop-based analytics platform incorporates components such as Pig, HBase, ZooKeeper, and Vertica.
• Data is written to the Hadoop Distributed File System (HDFS) by a number of real-time and batch processes.
• Instead of writing Hadoop code directly in Java, analytics at Twitter is performed mostly with Pig.

Hadoop Platform

Disadvantages
• Although the system produced reasonable output, its latency was estimated in hours.
• This is far from the 10-minute target.
• The latency is primarily attributed to:
  • The data import pipeline that moves data from tens of thousands of production hosts onto HDFS
  • MapReduce jobs

New approach: In-memory processing engine

New approach: Search Assistance Engine
The search assistance engine consists of:
• A lightweight frontend that serves requests from an in-memory cache
• A backend that consumes the firehose and the query hose to compute related query suggestions and spelling corrections
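The frontend/backend split can be illustrated with a minimal sketch, assuming the backend periodically publishes a complete snapshot of suggestions and the frontend only ever reads the most recent one. The class and method names (`SuggestionCache`, `publish`, `lookup`) and the sample suggestions are hypothetical and are not taken from the paper; the point is that a frontend lookup never waits on the backend computation.

```javascript
// Hypothetical sketch: the frontend answers requests from an in-memory
// snapshot; the backend replaces the whole snapshot when new results are ready.
class SuggestionCache {
  constructor() {
    this.snapshot = new Map(); // lowercased query -> array of suggested queries
  }

  // Backend side: swap in a freshly computed snapshot with one assignment,
  // so a lookup sees either the old results or the new ones, never a mix.
  publish(newSnapshot) {
    this.snapshot = newSnapshot;
  }

  // Frontend side: a cheap lookup with no computation on the request path.
  // Snapshot keys are assumed to be lowercased by the backend.
  lookup(query) {
    return this.snapshot.get(query.toLowerCase()) || [];
  }
}

const cache = new SuggestionCache();
cache.publish(new Map([["obama", ["obama speech", "white house"]]])); // invented suggestions
console.log(cache.lookup("Obama")); // [ 'obama speech', 'white house' ]
```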

Dataflow
The query path: as a query from a given user is delivered through the query hose, the following actions are taken:
• Query statistics are updated in the query statistics store.
• The query is added to the sessions store.
• For each previous query in the session, a query co-occurrence is formed with the new query.
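The three steps of the query path can be sketched with in-memory maps, assuming plain integer counts. All names here (`queryStats`, `sessions`, `cooccurrences`, `onQuery`, `relatedQueries`) are hypothetical simplifications rather than the paper's code: sessions never expire, nothing is decayed or pruned, and `relatedQueries` ranks follow-up queries by raw co-occurrence count, which is an illustrative assumption and not the paper's scoring.

```javascript
// Hypothetical in-memory stores (names are illustrative, not from the paper).
const queryStats = new Map();    // query -> number of times it was issued
const sessions = new Map();      // user id -> queries in the current session, oldest first
const cooccurrences = new Map(); // previous query -> Map(next query -> count)

// Query path: called for each (user, query) pair arriving on the query hose.
function onQuery(userId, query) {
  // 1. Update the query statistics store.
  queryStats.set(query, (queryStats.get(query) || 0) + 1);

  // 2. Add the query to the sessions store, keeping the earlier queries aside.
  const session = sessions.get(userId) || [];
  const previousQueries = session.slice();
  session.push(query);
  sessions.set(userId, session);

  // 3. Form a co-occurrence between each previous query and the new query.
  for (const previous of previousQueries) {
    const followers = cooccurrences.get(previous) || new Map();
    followers.set(query, (followers.get(query) || 0) + 1);
    cooccurrences.set(previous, followers);
  }
}

// Naive ranking by raw co-occurrence count (an assumption for illustration).
function relatedQueries(query, limit = 5) {
  const followers = cooccurrences.get(query) || new Map();
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([next]) => next);
}

// Invented query stream from two users.
onQuery("u1", "obama");
onQuery("u1", "white house");
onQuery("u2", "obama");
onQuery("u2", "white house");
onQuery("u2", "obama speech");
console.log(relatedQueries("obama")); // [ 'white house', 'obama speech' ]
```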

Conclusion
• The authors found the experience instructive and hope that future system designers can benefit from their story and build the right solution the first time.
• It would be desirable to build a generic data processing platform capable of handling both “big data” and “fast data”.

Thank you ☺

Questions?
