#IoT #BigData
Seema Jethani, @seemaj, @basho
10/31/14

Why should we care?
Source: http://en.wikipedia.org/wiki/Internet_of_Things

Motivation for Specialized Big Data Systems
§ The rate of data capture started soaring
§ Traditional data warehouses and RDBMSs could not keep up
§ Specialized Big Data systems were introduced. They are:
– Distributed and cluster-based
– Commodity-priced
– Linearly scalable
– Able to process data in parallel
– Redundant across nodes

Where is the value?
IoT, Big Data, Analytics

The Big Data Value Chain
Source: EY

A tale of meters
A leading provider of cloud-based meter data management for water, gas and heat meters:
§ Data is revenue-critical
§ Data loss is not acceptable
§ Reliability and availability trump all

Temetra growth

Shape of Temetra data
§ Big data: millions of meters generating billions of data points
– Meters in 2000: four data points a year
– Meters in 2013: up to 35,000 data points a year
§ Enormously high data ingress with relatively few reads
§ Small number of users (1000s in total, 100s logged in)
§ Slow-moving, ever-increasing data
– Audit trails, photos
§ Traditional databases were no longer suitable
– Selected Riak to help manage data growth
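The growth figures above translate into a steady write load. As a back-of-envelope sketch, the Python below converts data points per year into an average reading interval and a fleet-wide write rate. The fleet size of one million meters is a hypothetical round number standing in for "millions of meters", not a Temetra figure.

```python
# Back-of-envelope ingest estimate for smart meters.
# POINTS_PER_YEAR_* come from the slide; NUM_METERS is a hypothetical fleet size.

MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 (ignoring leap years)
SECONDS_PER_YEAR = MINUTES_PER_YEAR * 60  # 31,536,000

POINTS_PER_YEAR_2000 = 4
POINTS_PER_YEAR_2013 = 35_000
NUM_METERS = 1_000_000  # hypothetical: "millions of meters"


def reading_interval_minutes(points_per_year: int) -> float:
    """Average number of minutes between readings for a single meter."""
    return MINUTES_PER_YEAR / points_per_year


def fleet_writes(points_per_year: int, meters: int) -> tuple[int, float]:
    """Total data points per year and average writes per second for a fleet."""
    total = points_per_year * meters
    return total, total / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"2013 interval: {reading_interval_minutes(POINTS_PER_YEAR_2013):.1f} min")
    total, per_sec = fleet_writes(POINTS_PER_YEAR_2013, NUM_METERS)
    print(f"{total:,} points/year, ~{per_sec:,.0f} writes/s on average")
    growth = POINTS_PER_YEAR_2013 / POINTS_PER_YEAR_2000
    print(f"Per-meter growth since 2000: {growth:,.0f}x")
```

Under these assumptions, 35,000 points a year is roughly one reading every 15 minutes, and a million such meters would produce 35 billion points a year, or about 1,100 writes per second on average, before any read traffic.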

Data Types in Riak
§ Developer-friendly distributed data types that help track updates in an eventually consistent environment
§ Pre-built data types: no complex client-side resolution logic is required when using server-side data types
§ First introduced in Riak 1.4 as counters; Riak 2.0 adds:
– Sets
– Flags
– Registers
– Maps

CRDTs
CRDTs (conflict-free replicated data types) expose simple, well-known data types, but with an internal structure that makes it safe to update them without any coordination between writers and without any loss of information in the face of concurrency. The “C” in CRDT can stand for three different things:
– “Convergent” datatypes (CvRDTs) ensure that disparate states converge to a single value
– “Commutative” datatypes (CmRDTs) are updated with commutative operations
– “Conflict-free” describes both or either at once, without referring to the specifics of the internal design
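To make "convergent" concrete, here is a minimal sketch of a state-based counter in Python. It assumes the per-actor design described for Riak counters (each actor keeps its own increment and decrement totals, and a merge takes the per-actor maximum); the class name `PNCounter` and its methods are hypothetical and this is not Riak's internal code. Because the merge is commutative, associative and idempotent, replicas converge no matter how often or in what order they exchange state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Illustrative state-based (convergent) counter; not Riak's implementation."""
    # Per-actor totals: each actor only ever increases its own entries.
    incs: dict[str, int] = field(default_factory=dict)
    decs: dict[str, int] = field(default_factory=dict)

    def increment(self, actor: str, n: int = 1) -> None:
        self.incs[actor] = self.incs.get(actor, 0) + n

    def decrement(self, actor: str, n: int = 1) -> None:
        self.decs[actor] = self.decs.get(actor, 0) + n

    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: PNCounter) -> PNCounter:
        """Take the per-actor maximum of both replicas' totals."""
        def pmax(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
            return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}
        return PNCounter(pmax(self.incs, other.incs), pmax(self.decs, other.decs))
```

For example, if replica `a` records three increments from `node_a` while replica `b` records two increments and one decrement from `node_b`, both `a.merge(b)` and `b.merge(a)` report a value of 4, and merging a replica with itself changes nothing.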

Benefits of CRDTs
§ We don’t need to resend unchanged data; only the change itself is sent.
§ For CmRDTs, it doesn’t matter in which order two requests are applied; the outcome will be the same.
§ A data type can never return siblings, which makes client code much simpler.
– Conflicting values are known as siblings
– Siblings arise in a couple of cases:
  1. A client writes a value using a stale (or missing) vector clock
  2. Two clients write at the same time with the same vector clock value
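To make the two sibling cases concrete, here is a deliberately simplified Python model in which each client acts as its own vector-clock actor. That is an assumption for illustration; Riak's actual clocks are maintained differently. `Key`, `put`, `get` and `descends` are hypothetical names. A write discards only the stored versions that its clock context has already seen; anything else survives as a sibling.

```python
from __future__ import annotations

Clock = dict[str, int]  # actor -> number of events seen from that actor


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


class Key:
    """A single key holding one or more (clock, value) versions."""

    def __init__(self) -> None:
        self.versions: list[tuple[Clock, str]] = []

    def get(self) -> tuple[Clock, list[str]]:
        """Return the merged clock (the context) and all current values."""
        merged: Clock = {}
        for clock, _ in self.versions:
            for actor, n in clock.items():
                merged[actor] = max(merged.get(actor, 0), n)
        return merged, [value for _, value in self.versions]

    def put(self, actor: str, context: Clock, value: str) -> None:
        new_clock = dict(context)
        new_clock[actor] = new_clock.get(actor, 0) + 1
        # Drop versions the writer had seen; concurrent ones become siblings.
        self.versions = [(c, v) for c, v in self.versions if not descends(context, c)]
        self.versions.append((new_clock, value))
```

Two writes with the same (here empty) context both survive as siblings, and so do two writes that reuse the same stale context. A later write carrying the merged context from `get()` resolves them; the server-side data types avoid this client-side step entirely.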

Data Types in Riak (data type: use cases)
§ Counters (v1.4): keep track of increments/decrements
– Track the number of page "likes" or the number of followers
§ Flags: enabled/disabled
– Has a tweet been re-tweeted?
– Is a user eligible for preferred pricing?
§ Sets: collection of binary values
– List items in an online shopping cart
– UUIDs of a user's friends in a social networking app
§ Registers: named binary with values also binary
– Store user profile names
– Store the primary search location for a search engine user
§ Maps: support nesting of multiple data types
– Store user profile data composed of a register user_name, a flag email_notifications and a counter site_visits

Conflict resolution of data types in Riak
§ Counters (v1.4): keep track of increments/decrements
– Each actor keeps an independent count for increments and decrements
– Upon merge, each actor's count is the pairwise maximum of the two replicas' values for that actor (e.g., if one replica holds 172 and the other holds 173, 173 will win upon merge)
§ Flags: enabled/disabled
– Enable wins over disable
§ Sets: collection of binary values
– If an element is concurrently added and removed, the add will win
§ Registers: named binary with values also binary
– The most chronologically recent value wins, based on timestamps
§ Maps: support nesting of multiple data types
– If a field is concurrently added or updated and removed, the add/update will win
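The flag, set and register rules can be sketched with textbook CRDT designs: an observed-remove set (add wins), an enable-wins flag built on top of it, and a last-writer-wins register. This is illustrative Python under those assumptions, not Riak's internal code. All class names are hypothetical, and the global token counter stands in for the actor-scoped tags a real system would use.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

_tokens = itertools.count()  # unique add-tags; a real system would use (actor, counter)


@dataclass
class ORSet:
    """Observed-remove set: a concurrent add beats a remove (add wins)."""
    adds: set[tuple[str, int]] = field(default_factory=set)
    removes: set[tuple[str, int]] = field(default_factory=set)

    def add(self, elem: str) -> None:
        self.adds.add((elem, next(_tokens)))

    def remove(self, elem: str) -> None:
        # Tombstone only the add-tags this replica has observed so far.
        self.removes |= {tag for tag in self.adds if tag[0] == elem}

    def value(self) -> set[str]:
        return {elem for elem, _ in self.adds - self.removes}

    def merge(self, other: ORSet) -> ORSet:
        # Union both halves; type(self) keeps subclasses intact.
        return type(self)(self.adds | other.adds, self.removes | other.removes)


class EnableWinsFlag(ORSet):
    """A flag stored as a set of 'enable' events, so enable wins over disable."""

    def enable(self) -> None:
        self.add("enabled")

    def disable(self) -> None:
        self.remove("enabled")

    def enabled(self) -> bool:
        return "enabled" in self.value()


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register: the most recent timestamp wins."""
    value: str
    timestamp: float

    def merge(self, other: LWWRegister) -> LWWRegister:
        # Tie-break on the value so merging stays commutative on equal timestamps.
        return max(self, other, key=lambda r: (r.timestamp, r.value))
```

A remove only cancels the adds it has observed, so an add made concurrently on another replica survives the merge. That is exactly the "add wins" rule. The flag reuses this mechanism, which is one way to get "enable wins over disable".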

A CRDT example
§ Assume that a bucket type named map has been created and activated with the map data type. This command inserts a map object with two fields (name_register and pets_set):
curl -XPOST "$RIAK/types/map/buckets/people/keys/joe" \
  -H "Content-Type: application/json" \
  -d '{"update": {"name_register": "Joe", "pets_set": {"add_all": ["cat"]}}}'
§ Next, we want to update the pets_set contained within joe's map. Rather than resending Joe's name and his pet cat, we only need to tell the object about the change: namely, that we want to add a fish to his pets_set.
curl -XPOST "$RIAK/types/map/buckets/people/keys/joe" \
  -H "Content-Type: application/json" \
  -d '{"update": {"pets_set": {"add": "fish"}}}'
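The same two map updates can be expressed with Python's standard library. The sketch below only builds the requests; `send` would need a running Riak node. The base URL default of `http://localhost:8098` and the helper names `map_update` and `send` are assumptions for illustration. `add_all` is given a JSON list, which is the form Riak's HTTP API documents for adding several set elements.

```python
import json
import os
import urllib.request

# Base URL of a Riak node's HTTP interface; the localhost default is an assumption.
RIAK = os.environ.get("RIAK", "http://localhost:8098")


def map_update(bucket_type: str, bucket: str, key: str, update: dict) -> urllib.request.Request:
    """Build an HTTP POST that applies an operation-based update to a Riak map."""
    url = f"{RIAK}/types/{bucket_type}/buckets/{bucket}/keys/{key}"
    body = json.dumps({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> int:
    """Send a request and return the HTTP status (needs a running Riak node)."""
    with urllib.request.urlopen(req) as resp:
        return resp.status


# First write: set the register and seed the set.
create = map_update("map", "people", "joe",
                    {"name_register": "Joe", "pets_set": {"add_all": ["cat"]}})

# Later write: only the change (add "fish") is sent; Joe's name is not resent.
add_fish = map_update("map", "people", "joe", {"pets_set": {"add": "fish"}})
```

The second request body contains only the set operation. This is the "don't resend unchanged data" benefit in practice: the server merges the operation into the stored map.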

Querying and analyzing the data
§ e.g., find the post code closest to a particular post code
§ Riak Search combines the operational simplicity and fault tolerance of Riak with the powerful search functionality of Apache Solr
§ It allows distributed, scalable, transparent indexing and querying of Riak data values
§ Combine CRDTs and Search with pre- and post-processing of data to analyze data in real time
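A brute-force version of the "closest post code" example shows what the search layer is being asked to answer. The sketch uses the haversine great-circle distance over a handful of hypothetical post codes and coordinates. At scale, one would expect to rely on the search engine's geo-spatial support rather than scanning every post code client-side.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical post codes and (latitude, longitude) centroids.
POSTCODES = {
    "A1": (53.35, -6.26),
    "B2": (53.34, -6.27),
    "C3": (51.90, -8.47),
    "D4": (52.66, -8.63),
}


def haversine_km(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_postcode(code: str) -> str:
    """Linear scan for the nearest other post code."""
    origin = POSTCODES[code]
    others = (c for c in POSTCODES if c != code)
    return min(others, key=lambda c: haversine_km(origin, POSTCODES[c]))
```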

Riak Search queries
Query parameters:
• Exact match
• Globs
• Inclusive/exclusive range queries
• AND/OR/NOT
• Prefix matching
• Proximity searches
• Term boosting
• Sorting
• Pagination
Features:
• Scoring and ranking for the most relevant results
• Search queries as input for MapReduce jobs
• Active Anti-Entropy for automatic index repair
• Multiple languages, geo-spatial search, tokenizers and filters
• Supports various MIME types (JSON, XML, plain text, data types) for automatic data extraction
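Several of these parameters correspond to standard Solr query syntax. The Python sketch below writes them against the `famous` index and the `name_s`, `age_i`, and `leader_b` fields used in the Solr example later in the talk. The host and port, the `bio_t` text field, and the helper name `search_url` are hypothetical. Exact support depends on the Solr version bundled with Riak.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE = "http://localhost:8098/search/query/famous"  # assumed host/port

# Standard Solr query-parser syntax for several of the listed parameters.
QUERIES = {
    "exact match": 'name_s:"Lion-o"',
    "glob / prefix": "name_s:Lion*",
    "inclusive range": "age_i:[25 TO 30]",
    "exclusive range": "age_i:{25 TO 30}",
    "boolean": "leader_b:true AND NOT age_i:[0 TO 18]",
    "proximity": 'bio_t:"thunder cats"~3',  # bio_t is a hypothetical text field
    "term boost": "name_s:Lion*^2 OR name_s:Chee*",
}


def search_url(q: str, sort: str | None = None, start: int = 0, rows: int = 10) -> str:
    """Build a query URL; sort, start and rows are Solr's sorting and paging params."""
    params = {"wt": "json", "q": q, "start": start, "rows": rows}
    if sort:
        params["sort"] = sort
    return f"{BASE}?{urlencode(params)}"
```

Pagination works by advancing `start` by `rows` on each request. Sorting takes a field and a direction, such as `age_i asc`.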

Write it like Riak, query it like Solr
§ Every node in a Riak cluster has a corresponding operating system (OS) process running a JVM, which hosts Solr on the Jetty application server
§ Riak Search listens for changes in key/value (KV) data and makes the appropriate changes to Solr indexes
§ Riak Search takes a user query on any node and converts it into a Solr distributed search
§ Riak Search takes index creation commands and disseminates that information across the cluster
§ Riak Search communicates with and monitors the Solr OS process
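The "listens for changes in KV data" step can be pictured with a toy, single-process model: every put or delete on the key/value side is mirrored into an index, so queries see the latest writes. `IndexedKV` is a hypothetical name, and the dictionary keyed by (field, value) stands in for a Solr index. Real Riak Search runs Solr in a separate JVM on each node and repairs indexes with Active Anti-Entropy, none of which is modelled here.

```python
from __future__ import annotations

from collections import defaultdict


class IndexedKV:
    """Toy model: KV writes are mirrored into an inverted index."""

    def __init__(self) -> None:
        self.kv: dict[str, dict] = {}
        # (field, value) -> keys; a stand-in for a Solr index
        self.index: defaultdict[tuple, set[str]] = defaultdict(set)

    def _unindex(self, key: str) -> None:
        # Remove index entries for the currently stored version of the key.
        for field, value in self.kv.get(key, {}).items():
            self.index[(field, value)].discard(key)

    def put(self, key: str, doc: dict) -> None:
        self._unindex(key)  # drop stale entries before indexing the new version
        self.kv[key] = doc
        for field, value in doc.items():
            self.index[(field, value)].add(key)

    def delete(self, key: str) -> None:
        self._unindex(key)
        self.kv.pop(key, None)

    def query(self, field: str, value) -> set[str]:
        return set(self.index.get((field, value), set()))
```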

Example search using Solr
§ Indexes may be associated with zero or more buckets. At creation time, however, each index has no associated buckets
§ To associate a bucket with an index, set the bucket property yz_index to the name of the index you wish to associate (in Riak 2.0 this property is named search_index, as in the example that follows)
§ Conversely, to disassociate a bucket, set the property to the sentinel value _dont_index_
§ Many buckets can be associated with the same index
§ A bucket cannot be associated with many indexes: the property must be a single name, not a list
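The association rules can be summarized in a small Python model. In this model, a bucket property holds one index name (a string, never a list), many buckets may share an index, and the `_dont_index_` sentinel turns indexing off. `SearchConfig` and its methods are hypothetical, and the property name follows the slide's `yz_index`. This models the stated rules only; it does not reproduce how Riak stores bucket properties.

```python
from __future__ import annotations

DONT_INDEX = "_dont_index_"


class SearchConfig:
    """Hypothetical model of bucket-to-index association rules."""

    def __init__(self) -> None:
        self.indexes: set[str] = set()
        self.bucket_props: dict[str, dict[str, str]] = {}

    def create_index(self, name: str) -> None:
        self.indexes.add(name)  # a newly created index has no associated buckets

    def set_bucket_index(self, bucket: str, index: str) -> None:
        # One bucket -> one index: the property is a single name, not a list.
        if not isinstance(index, str):
            raise TypeError("yz_index must be a single index name, not a list")
        self.bucket_props.setdefault(bucket, {})["yz_index"] = index

    def buckets_for(self, index: str) -> set[str]:
        """Many buckets may point at the same index."""
        return {b for b, props in self.bucket_props.items() if props.get("yz_index") == index}

    def should_index(self, bucket: str) -> bool:
        return self.bucket_props.get(bucket, {}).get("yz_index", DONT_INDEX) != DONT_INDEX
```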

Solr example
§ Schemas explain to Solr how to index fields
§ Indexes are named Solr indexes against which you will query
§ Bucket-index association signals to Riak when to index values

Search index with the default schema:
curl -XPUT "$RIAK_HOST/search/index/famous" \
  -H 'Content-Type: application/json' \
  -d '{"schema":"_yz_default"}'

Bucket-index association:
riak-admin bucket-type create animals '{"props":{"search_index":"famous"}}'
riak-admin bucket-type activate animals

Write data:
curl -XPUT "$RIAK_HOST/types/animals/buckets/cats/keys/liono" \
  -H 'Content-Type: application/json' \
  -d '{"name_s":"Lion-o", "age_i":30, "leader_b":true}'
curl -XPUT "$RIAK_HOST/types/animals/buckets/cats/keys/cheetara" \
  -H 'Content-Type: application/json' \
  -d '{"name_s":"Cheetara", "age_i":28, "leader_b":false}'

Query:
curl "$RIAK_HOST/search/query/famous?wt=json&q=name_s:Lion*" | jsonpp
{
  "numFound": 1,
  "start": 0,
  "maxScore": 1.0,
  "docs": [
    {
      "leader_b": true,
      "age_i": 30,
      "name_s": "Lion-o",
      "_yz_id": "default_cats_liono_37",
      "_yz_rk": "liono",
      "_yz_rt": "default",
      "_yz_rb": "cats"
    }
  ]
}
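Each search hit carries enough information to fetch the original object from Riak: `_yz_rt` (bucket type), `_yz_rb` (bucket) and `_yz_rk` (key). The Python sketch below parses the sample response shown in the slide and returns those locations. A full Solr JSON response normally wraps these fields in a `response` object, so the helper, whose name `riak_locations` is hypothetical, accepts either shape.

```python
import json

# The sample response from the slide, as text.
SAMPLE_RESPONSE = """
{"numFound": 1, "start": 0, "maxScore": 1.0,
 "docs": [{"leader_b": true, "age_i": 30, "name_s": "Lion-o",
           "_yz_id": "default_cats_liono_37", "_yz_rk": "liono",
           "_yz_rt": "default", "_yz_rb": "cats"}]}
"""


def riak_locations(response_text: str) -> list[tuple[str, str, str]]:
    """Return (bucket type, bucket, key) for every matching document."""
    body = json.loads(response_text)
    body = body.get("response", body)  # unwrap a full Solr response if present
    return [(d["_yz_rt"], d["_yz_rb"], d["_yz_rk"]) for d in body["docs"]]
```

Each tuple can then be turned into an ordinary key/value GET. This is the "query it like Solr, read it like Riak" round trip.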

Summary
§ IoT deployments will generate large quantities of data that need to be processed and analyzed
– “IoT will mean really, really Big Data” (InfoWorld)
§ We need to design for analytics
– “creating a strategy that sees data more as a supply chain than a warehouse” (Mike Redding, Accenture)
§ Not all data is created equal: we need to find what is important and act on it
§ Data-driven decision making will be key to achieving business success

Questions?
Interested in a Tech Talk? smoder@basho.com
