Big Data: overview, issues, challenges and opportunities
C. Onime (onime@ictp.it)
Outline
– Interactive session
– Introduction to Big-Data
– Issues/challenges
– Taxonomy classifications
– Conclusion
– Opportunities and future
Clement Onime - onime@ictp.it 7
– High Performance Computing (LHC, SKA, Genomics)
– Heading towards Nano-circuits, clocking resolutions, etc
– Networks: always connected devices, capacity; Clouds: anytime, anywhere on-demand metered access to resources
– Social networking
– From presentation by Michael Cooper & Peter Mell of NIST
Data Mapping
– Compute infrastructure
– Storage infrastructure
– Analytics
– Visualisation
– Security & privacy
Figure: application domains mapped by latency and data structure
Latency axis: batch, near-real-time, real-time
Structure axis: structured, semi-structured, unstructured
– Large-scale science (HEP, genomics)
– Visual media (video scene detection, image understanding)
– Financial (high-speed trading)
– Retail (sentiment & behaviour analysis)
– Social networking (trend analysis, query processing)
– Sensor data (ID, long-term trends, weather)
– Network security (ID, malware/virus attacks)
Compute infrastructure
– Batch: MapReduce (Hadoop)
– Bulk synchronous parallel: Hama, Giraph, Pregel
– Streaming: S4, Storm, Spark
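As a rough illustration of the batch MapReduce model named above, the following single-process Python sketch mimics the map, shuffle and reduce phases on a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are illustrative assumptions; this is not Hadoop's API, only the shape of the computation.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input.
sample_docs = ["big data big compute", "data storage"]
counts = word_count(sample_docs)
```

In a real framework the map and reduce calls run on different machines and the shuffle moves data over the network; here everything runs in one process to keep the idea visible.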
Figure: Storm cluster
– Master node (Nimbus)
– Zookeeper framework
– Worker nodes, each with a Supervisor and worker processes
– Worker processes run executors, which run tasks
– Network of spouts and bolts
– Parallel & cyclic execution
– Stream groupings: shuffle, all, global, fields
– Twitter analytics: a spout feeding bolts that parse, count, rank and report
Figure: example topology with two spouts and four bolts
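The Twitter analytics pipeline above can be sketched in plain Python to show how a spout, bolts and groupings fit together. This is a single-process simulation under stated assumptions, not Storm's API: the names `tweet_spout`, `parse_bolt`, `CountBolt`, `fields_grouping`, `rank_bolt` and `run_topology` are hypothetical, and the sample tweets are made up.

```python
from collections import Counter


def tweet_spout():
    """Spout: the source of the stream (hypothetical sample tweets)."""
    yield from ["storm is fast", "big data storm", "data data everywhere"]


def parse_bolt(tweet):
    """Parse bolt: split one tweet into lower-case words."""
    return tweet.lower().split()


class CountBolt:
    """Count bolt task: keeps a running count of the words routed to it."""

    def __init__(self):
        self.counts = Counter()

    def process(self, word):
        self.counts[word] += 1


def fields_grouping(word, n_tasks):
    """Fields grouping: equal words always go to the same task."""
    return sum(word.encode()) % n_tasks  # deterministic stand-in for a hash


def rank_bolt(count_tasks, top_n):
    """Rank bolt fed by a global grouping: merge all counters, keep the top."""
    total = Counter()
    for task in count_tasks:
        total.update(task.counts)
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def run_topology(n_count_tasks=2, top_n=2):
    """Wire spout -> parse -> count (fields grouping) -> rank (global grouping)."""
    tasks = [CountBolt() for _ in range(n_count_tasks)]
    for tweet in tweet_spout():
        for word in parse_bolt(tweet):
            tasks[fields_grouping(word, n_count_tasks)].process(word)
    return rank_bolt(tasks, top_n)


ranking = run_topology()  # a report bolt would publish this ranking
```

The fields grouping matters for correctness: because every occurrence of a word lands on the same counting task, each task holds the complete count for its words before the ranking step merges them.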
Storage infrastructure
– Relational (SQL): Oracle, MySQL, PostgreSQL, etc.
– NoSQL
  – Document oriented: MongoDB, CouchDB, Couchbase
  – Key-value stores
    – In memory: Memcached, Redis, Aerospike
    – Dynamo inspired: Cassandra, Riak, Voldemort
  – Big-Table: HBase, Cassandra
  – Graph oriented: Giraph, Neo4j, OrientDB
– NewSQL
  – In memory: H-Store, VoltDB
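To make the difference between the storage families above more concrete, here is a small Python sketch that stores the same kind of sensor reading three ways. It uses only the standard library: `sqlite3` stands in for a relational database, a plain `dict` stands in for a key-value store, and JSON text stands in for a document store. The table, keys and values are hypothetical.

```python
import json
import sqlite3

# Relational (SQL): fixed schema, queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("t1", 20.5), ("t1", 21.5), ("t2", 18.0)],
)
avg_t1 = conn.execute(
    "SELECT AVG(value) FROM readings WHERE sensor = ?", ("t1",)
).fetchone()[0]

# Key-value store: opaque values looked up by key only.
kv_store = {}
kv_store["sensor:t1:latest"] = "21.5"

# Document oriented: self-describing records with no fixed schema.
doc_store = {
    "reading:3": json.dumps({"sensor": "t2", "value": 18.0, "tags": ["outdoor"]})
}
reading = json.loads(doc_store["reading:3"])
```

The relational version can answer new questions with a query, the key-value version only answers "what is stored under this key", and the document version lets each record carry its own fields.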
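Figure: storage and processing technologies mapped by latency and data structure
Latency axis: batch, near-real-time, real-time
Structure axis: structured, semi-structured, unstructured
– SQL (MySQL, PostgreSQL, SQLite)
– NoSQL (MongoDB, CouchDB, Cassandra)
– Neo4j
– Storm, Kinesis
– Shark, Spark
– VoltDB
– Titan
– Redis
– Aerospike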
Machine learning algorithms
– Supervised
  – Regression: polynomials, MARS
  – Classification: decision trees, Naïve Bayes, support vector machines
– Un-supervised
  – Clustering: K-means, Gaussian mixtures
  – Reduction: principal component analysis
– Semi-supervised: active, co-training
– Reinforcement: Markov decision process, Q-learning
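As one concrete instance of the unsupervised clustering branch above, the following is a minimal pure-Python sketch of K-means on 2-D points. It assumes the starting centroids are supplied by the caller and runs a fixed number of iterations; the function name `kmeans` and the sample points are illustrative.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c
            else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
sample_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(sample_points, [(0, 0), (10, 10)])
```

No labels are used anywhere: the grouping comes only from distances between points, which is what separates this branch from the supervised ones.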
Statistics                          Machine learning
Model                               Network, graphs
Data point                          Examples/instances
Response                            Label
Parameters                          Weights
Covariate                           Feature
Fitting/estimation                  Learning
Test set performance                Generalization
Regression/classification           Supervised learning
Density estimation, clustering      Unsupervised learning
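The vocabulary pairs above describe the same operations under two names. The short Python sketch below fits a straight line by least squares and labels each quantity with both terms in the comments. The function names and the data are illustrative assumptions.

```python
def fit_line(features, labels):
    """Fitting/estimation (statistics) or learning (ML) of y = weight * x + bias."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    weight = sxy / sxx                # a parameter (statistics) or weight (ML)
    bias = mean_y - weight * mean_x   # the intercept parameter
    return weight, bias


def mean_squared_error(weight, bias, features, labels):
    """Test set performance (statistics) or generalization (ML)."""
    errors = [(weight * x + bias - y) ** 2 for x, y in zip(features, labels)]
    return sum(errors) / len(errors)


# Covariates/features and responses/labels (hypothetical data on y = 2x + 1).
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
test_x, test_y = [4, 5], [9, 11]

weight, bias = fit_line(train_x, train_y)
test_error = mean_squared_error(weight, bias, test_x, test_y)
```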
Visualisation
– Spatial layout
  – Charts / plots: line/bar charts, scatter plots
  – Trees / graphs: tree maps, arc diagrams
– Abstract or summary
  – Binning: data cubes, histograms
  – Clustering: hierarchical aggregation
– Interactive or real-time
  – Deep zoom: MS Pivot viewer, Tableau
  – Mixed reality: AR systems / tools
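Binning, listed above under abstract or summary visualisation, reduces a large set of values to a handful of counts that can then be drawn as a histogram. A minimal Python sketch, assuming equal-width bins over a known range; the function names and sample values are illustrative.

```python
def bin_counts(values, low, high, n_bins):
    """Count how many values fall in each of n_bins equal-width bins on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` is placed in the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


def text_histogram(counts, low, width):
    """Render the bin counts as one line of '#' marks per bin."""
    return [f"{low + i * width:5.1f} | {'#' * c}" for i, c in enumerate(counts)]


# Hypothetical data.
sample_values = [1, 2, 2, 3, 9, 10]
sample_counts = bin_counts(sample_values, low=0, high=10, n_bins=5)
histogram_lines = text_histogram(sample_counts, low=0, width=2)
```

However many raw values there are, only `n_bins` numbers reach the display, which is why binning scales to data sets too large to plot point by point.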
$FNS = \int (S + W)$ where
– Headsets, wearable devices
– Custom and typically not cost effective
– Commodity devices: smartphones and tablets
– Cost effective
User: 180° horizontal coverage from 3 markers on the walls and 90° vertical coverage from a marker on the floor
Security and privacy
– Infrastructure
  – Secure computations
  – Best practices
– Data privacy
  – Privacy preservation
  – Cryptography
  – Access control
– Data management
  – Secure storage
  – Transaction logs and audits
  – Provenance
– Integrity and reactive security
  – End-point security
  – Real-time monitoring
Figure: handshake between a client and a server, each holding trusted certificates
– Client hello
– Reply + certificate
– Key exchange + certificate
– Client OK
– Server OK
– Encrypted messages
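The handshake steps above are normally carried out by a TLS library, not written by hand. As a hedged sketch of the client side, the Python code below uses the standard `ssl` module: the context holds the client's trusted certificates, and wrapping a socket performs the hello, certificate check and key exchange before any encrypted message is sent. The helper names `make_client_context` and `open_tls` are assumptions for illustration, and `open_tls` is defined but not called here because it needs network access.

```python
import socket
import ssl


def make_client_context():
    """Client settings: trust the system CA store and verify the server."""
    context = ssl.create_default_context()  # loads the platform's trusted certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def open_tls(host, port=443, timeout=5.0):
    """Connect and run the handshake; return the negotiated protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket performs the handshake and checks the server certificate
        # against the trusted certificates and the expected host name.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


client_context = make_client_context()
```

If the server's certificate does not chain to a trusted certificate, or does not match the host name, the handshake fails with an exception and no application data is exchanged.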
– South Africa and Brazil, maybe an HPC school in Mexico