Data Acquisition - Axel Ngonga, Lead Data Acquisition, BIG Data PPF




slide-1
SLIDE 1

Data Acquisition

Axel Ngonga
Lead Data Acquisition, BIG Data PPF
http://big-project.eu

slide-2
SLIDE 2

Motivation

  • Increasing amount of data

○ 4K new pictures on Instagram
○ 100K tweets
○ 800K new pieces of content on Facebook
○ …

slide-3
SLIDE 3

Motivation

slide-4
SLIDE 4

Motivation

  • Big data technologies for

○ Improved business intelligence
○ Secure decisions
○ Customized services
○ …

  • Use Cases

○ Mission planning
○ Trade market
○ Customized services
○ Criminality prediction
○ ...

slide-5
SLIDE 5

Definition

  • Data acquisition stands for

○ Selection of data sources
○ Collection of information from these sources
○ Filtering and cleaning of the collected data

slide-6
SLIDE 6

Overview

[Diagram: four data sources (DS) → Processing (cleaning, classification) → Storage]
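
The flow described on the Definition and Overview slides (select sources, collect from them, clean and classify, then store) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `tweets` and `logs` sources, the record layout, and the toy question-mark classifier are hypothetical and do not come from the slides.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # assumed record layout: {"source": str, "text": str}


def collect(sources: Iterable[Callable[[], Iterable[Record]]]) -> Iterator[Record]:
    """Pull records from every selected data source in turn."""
    for source in sources:
        yield from source()


def clean(records: Iterable[Record]) -> Iterator[Record]:
    """Drop records without text and strip surrounding whitespace."""
    for record in records:
        text = (record.get("text") or "").strip()
        if text:
            yield {**record, "text": text}


def classify(record: Record) -> Record:
    """Toy classifier (hypothetical): label by presence of a question mark."""
    label = "question" if "?" in record["text"] else "statement"
    return {**record, "label": label}


def acquire(sources, storage: list) -> None:
    """Run collection, cleaning and classification, then append to storage."""
    for record in clean(collect(sources)):
        storage.append(classify(record))


# Hypothetical data sources standing in for the "DS" boxes.
def tweets():
    return [
        {"source": "tweets", "text": "  Is this big data? "},
        {"source": "tweets", "text": ""},  # empty record, filtered out
    ]


def logs():
    return [{"source": "logs", "text": "server started"}]


storage: list = []  # a plain list stands in for the storage layer
acquire([tweets, logs], storage)
```

Because every stage is a generator, records flow through one at a time, which keeps memory use low regardless of how many records the sources produce.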

slide-7
SLIDE 7

More than 3 Vs

  • The 9(?) Vs of Big Data Acquisition

○ Volume
○ Velocity
○ Variety
○ Vocabulary
○ Variability (security models, ownership)
○ Veracity (trustworthiness of data)
○ Visibility (integrated view of data)
○ Value (worth of data for data consumer)
○ Visualization

slide-8
SLIDE 8

Requirements

  • Extensibility of protocols
  • High scalability of approaches
  • Low memory consumption
  • Parallelism
  • Elasticity
  • Fast ROI
  • High throughput (real-time)
slide-9
SLIDE 9

Technology Overview

  • Gathering

○ Advanced Message Queuing Protocol (AMQP)
    ■ Wire-level protocol
    ■ OASIS Standard since Oct. 2012
    ■ Large number of implementations, incl. RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus
○ JMS 2.0
○ Kestrel (Memcached)
○ Apache Kafka
○ Apache Flume (log data)
○ FB Scribe (log data)
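
The gathering tools listed above share one basic pattern: producers hand messages to a queue and consumers read them independently. The sketch below illustrates that pattern in-process with Python's standard `queue` and `threading` modules. It is a stand-in for illustration only, not an AMQP, JMS, or Kafka client; the message contents and the sentinel convention are assumptions.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker, not part of any real protocol


def producer(q: queue.Queue, messages: list) -> None:
    """Publish each message, then signal that no more will follow."""
    for message in messages:
        q.put(message)  # blocks while the queue is full (back-pressure)
    q.put(SENTINEL)


def consumer(q: queue.Queue, received: list) -> None:
    """Consume messages until the end-of-stream marker arrives."""
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        received.append(message)


broker = queue.Queue(maxsize=2)  # small bound so the producer must wait
messages = [f"log line {i}" for i in range(5)]
received: list = []

consumer_thread = threading.Thread(target=consumer, args=(broker, received))
producer_thread = threading.Thread(target=producer, args=(broker, messages))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
```

The bounded queue decouples the two sides: the producer never needs to know how fast the consumer is, it simply waits when the buffer is full. A real broker adds persistence, routing, and network transport on top of this idea.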

slide-10
SLIDE 10

Technology Overview

  • Processing

○ Facebook Scribe (Aggregation)
○ Twitter Storm (Stream Data Processing, Analysis)
○ MOA (Massive Online Analysis, esp. classification)
○ Hadoop (Distributed Processing)
○ InfoSphere Streams (Analysis)
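
Stream processing, as opposed to batch processing, handles events as they arrive and keeps only a small amount of state. A minimal sketch of one common technique, counting events per fixed-size (tumbling) window, is shown below. It uses plain Python generators rather than any of the listed systems; the event names and the window size are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def windowed_counts(events: Iterable[str], window_size: int) -> Iterator[Counter]:
    """Yield one Counter per tumbling window of `window_size` events."""
    window = []
    for event in events:
        window.append(event)
        if len(window) == window_size:
            yield Counter(window)
            window = []  # start the next window from scratch
    if window:
        yield Counter(window)  # flush the final, possibly shorter window


# Hypothetical event stream, e.g. log levels arriving one by one.
stream = iter(["error", "info", "info", "error", "error", "warn", "info"])
results = list(windowed_counts(stream, window_size=3))
```

Only the current window is held in memory, so the same function works on an unbounded stream as long as the caller consumes the results incrementally instead of collecting them into a list.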

slide-11
SLIDE 11

Technology Overview

  • Storage

○ MongoDB (BSON)
○ Apache CouchDB (JSON)
○ Neo4J (Graph DB)
○ Oracle NoSQL
○ IBM DB2 NoSQL

  • Holistic Frameworks

○ Oracle's Big Data Suite
○ IBM's Big Data Suite
○ Karmasphere
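
The document stores named under Storage keep each record as a self-describing document (JSON in CouchDB, the binary JSON-like BSON in MongoDB) rather than as a row in a fixed schema. The sketch below shows what such a document might look like using only Python's standard `json` module. The field names are hypothetical, and no database client is involved.

```python
import json

# Hypothetical acquired record with nested and list-valued fields,
# which a document store can hold without a predefined schema.
record = {
    "source": "logs",
    "text": "server started",
    "tags": ["log", "startup"],
    "meta": {"lang": "en"},
}

document = json.dumps(record, sort_keys=True)  # serialize to a JSON string
restored = json.loads(document)                # parse it back into a dict
```

Nested fields such as `meta` and lists such as `tags` survive the round trip unchanged, which is what makes this format convenient for heterogeneous sources.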

slide-12
SLIDE 12

Tool Matrix

slide-13
SLIDE 13

Simple Recipe

  • 1. Which of the 9Vs are important for me?
  • 2. What are my sources?

○ Protocols
○ Velocity
○ Type of data (logs, XML, …)
○ ...

  • 3. What’s my current storage architecture?

○ NoSQL?
○ Distributed?

slide-14
SLIDE 14

Thank You! Questions?

Axel Ngonga
University of Leipzig, AKSW Research Group
ngonga@informatik.uni-leipzig.de
http://aksw.org/AxelNgonga
http://big-project.eu

slide-15
SLIDE 15
slide-16
SLIDE 16

Questionnaire