
A State-of-the-art Review on Big Data Technologies - PowerPoint PPT Presentation



  1. A State-of-the-art Review on Big Data Technologies. Semantic technologies for Big Data: Volume, Velocity, Variety and Veracity @ ICIST 2019. Marko Jelić, BSc EE & CS, Junior researcher, The Mihajlo Pupin Institute; Dea Pujić, BSc EE & CS, Junior researcher, The Mihajlo Pupin Institute; Hajira Jabeen, PhD, Senior researcher, Computer Science Institute, University of Bonn

  2. Acknowledgment/Context. LAMBDA¹ (Learning, Applying, Multiplying, Big Data Analytics) is a twinning² H2020 project. The main goal of the project is to provide different knowledge transfer instruments (mentorships, brainstorming sessions, school-type activities) and different types of twinning relationships (institution to institution, institution to network). The knowledge transfer process focuses specifically on the Big data domain and the corresponding technologies and services. ¹ https://project-lambda.org/ ² https://ec.europa.eu/neighbourhood-enlargement/tenders/twinning_en

  3. What is Big data? Big data is used more as a buzzword than as a precisely defined scientific object or phenomenon. It is generally used when referring to data loads that modern-day IT infrastructure cannot cope with at all, or cannot cope with efficiently. More precisely, the term usually refers to data sets whose size is on the order of exabytes (10^18 B) or greater. Some consider the introduction of US social security in 1937 to be the start of the Big data era, but the term has gained most of its popularity only recently, following the development of data-heavy applications. * Illustrations by https://www.freepik.com/macrovector
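
To make the exabyte threshold concrete, here is a small Python sketch that expresses a byte count in decimal SI units (1 EB = 10^18 B, as on the slide) and checks whether it reaches that order of magnitude. The helper names `human_size` and `is_exabyte_scale` are hypothetical, and using decimal rather than binary prefixes (1 EiB = 2^60 B) is an assumption.

```python
# Decimal (SI) prefixes, matching the slide's definition 1 EB = 10**18 B.
# Binary prefixes (KiB, MiB, ..., EiB = 2**60 B) are deliberately not used here.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_size(n_bytes: int) -> str:
    """Express a byte count in the largest SI unit that keeps the value >= 1."""
    if n_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(n_bytes)
    unit_index = 0
    # Step up one prefix (a factor of 1000) while the value is still large enough.
    while value >= 1000 and unit_index < len(SI_UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {SI_UNITS[unit_index]}"


def is_exabyte_scale(n_bytes: int) -> bool:
    """True if the data set reaches the exabyte order of magnitude (10**18 B)."""
    return n_bytes >= 10**18


if __name__ == "__main__":
    print(human_size(10**18))             # 1 EB
    print(human_size(2_500_000_000_000))  # 2.5 TB
```
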

  4. Nature of Big data. Big data is often characterized through the so-called V's of Big data, which capture its complex nature:
     3V's:
       Volume – the amount of data that has to be captured, stored, processed and displayed
       Velocity – the rate at which the data is generated or analyzed
       Variety – differences in data structure (format) or in the data sources themselves
     5V's add:
       Veracity – truthfulness (uncertainty) of the data
       Validity – suitability of the selected dataset for a given application
     7V's add:
       Volatility – temporal validity and fluency of the data
       Value – (useful) information extracted from the data
     10V's add:
       Visualization – properly displaying and showcasing information
       Vulnerability – associated security and privacy concerns
       Variability – the changing meaning of data

  5. Big data challenges. The core technological challenges of working with Big data, which stem from its complex nature, are:
       Heterogeneity – differences in structure
       Uncertainty – data reliability
       Scalability – sizing the workflow and infrastructure
       Timeliness – real-time requirements
       Fault tolerance – sensitivity to errors
       Data security – privacy issues, data leaks
       Visualization – displaying information
     Table: challenges versus Big data stages (Storing, Processing, Analytics, Visualization). Scalability and Timeliness are marked for three stages each; Heterogeneity, Uncertainty of data, Fault tolerance and Data security for two each; Visualization for one.

  6. Big data Storage. NoSQL (not only SQL) databases and knowledge graphs:
       Key-value stores: Hazelcast, Redis, Membase/Couchbase, Riak, Voldemort, Infinispan
       Document oriented: MongoDB, Apache CouchDB, Terrastore, RavenDB
       Graph oriented: Neo4j, InfiniteGraph, InfoGrid, HypergraphDB, AllegroGraph, BigData
       Wide-column: Apache HBase, Hypertable, Apache Cassandra
     * Illustrations by https://aws.amazon.com/neptune/ and https://lod-cloud.net/versions/2014-08-30/lod-cloud.svg
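
Knowledge graphs store facts as (subject, predicate, object) triples. The sketch below is a toy in-memory triple store in Python with wildcard pattern matching, loaded with a few of the database classifications from this slide. The class `TripleStore` and its methods are hypothetical and do not mirror the API of any listed product (e.g. Neo4j, AllegroGraph) or of any RDF library.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical toy store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # A set silently ignores duplicate facts.
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every triple agreeing with the pattern; None acts as a wildcard."""
        # Linear scan in sorted order, so results are deterministic.
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


if __name__ == "__main__":
    demo = TripleStore()
    demo.add("MongoDB", "category", "Document oriented")
    demo.add("Apache CouchDB", "category", "Document oriented")
    demo.add("Redis", "category", "Key-value")
    demo.add("Neo4j", "category", "Graph oriented")
    demo.add("Apache Cassandra", "category", "Wide-column")
    # Which stores are document oriented?
    print([s for s, _, _ in demo.match(predicate="category", obj="Document oriented")])
    # ['Apache CouchDB', 'MongoDB']
```

A real triple store would keep indexes over several triple orderings instead of scanning every fact on each query; the linear scan only keeps the example short.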

  7. Big data analytics. Processing the data and applying inference (e.g. through machine learning) on Big data is key to proper knowledge (value) extraction.
     Table: Systematization of regression and classification learning algorithms in Big data tools.
       Tools: Apache Spark, H2O, R, MOA, Scikit-Learn, BigML, Weka
       Algorithms: generalized linear model, gradient boosting tree, discriminant analysis, survival regression, isotonic regression, logistic regression, linear regression, isolation forest, random forest, decision trees, bagging, CART, drift classifier, model-fitting, naive Bayes, ensembles, XGBoost, SVM, C4.5, kNN, NN
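
As an illustration of one of the listed algorithms, here is a minimal k-nearest-neighbours (kNN) classifier in plain Python, assuming Euclidean distance and majority voting. It is a sketch under those assumptions, not the implementation used by any of the listed tools; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(train: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training examples by Euclidean distance to the query; keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common keeps first-encountered order on ties, so the label whose
    # nearest member is closest to the query wins a tie.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"),
    ]
    print(knn_predict(training, (0.05, 0.05)))  # a
```

Sorting the whole training set costs O(n log n) per query, which is fine for a small example; large data sets call for spatial indexing or approximate neighbour search.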

  8. Big data analytics. If the data is not already labeled, i.e. separated into appropriate classes, clustering algorithms need to be applied first to determine adequate class boundaries.
     Table: Systematization of clustering learning algorithms in Big data tools.
       Tools: Apache Spark, H2O, R, Giraph, BigML
       Algorithms: affinity propagation, Gaussian mixture, fuzzy clustering, density based, model-based aggregator, hierarchical, G-means, K-means, CLARA, PAM, LDA, PIC
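
K-means, one of the clustering algorithms listed above, can be sketched in plain Python as Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points, until the assignments stop changing. The random initialisation from input points and the `kmeans` signature are assumptions for illustration and may differ from what the listed tools do.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, cluster index of each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k input points sampled without replacement.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centres, labels = kmeans(data, k=2)
    print(centres, labels)
```
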

  9. Big data visualization.
       Graphs and networks: NodeXL, Pajek, SocNetV, Sentinel Visualizer, Statnet, Tulip, Visone, Cuttlefish, Cytoscape, Gephi, Graphviz, Graph-tool
       Other tools, grouped on the slide under cross-platform JavaScript libraries (open source: chart tools, multi-purpose, map tools, timelines, images), commercial tools and non-web (desktop) tools: Fusion Charts, Chart.js, Chartist.js, n3-charts, Ember-charts, Canvas, Google Charts, Sigma JS, D3.js, Leaflet, Polymaps, Processing.js, Dygraphs, Timeline JS, Tableau, Infogram
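
Graph tools such as Graphviz read graphs written in the DOT language. A minimal Python sketch that serializes an edge list into DOT text follows; `to_dot` and `_quote` are hypothetical helpers, and only the basic `digraph`/`graph` syntax with quoted node IDs is covered.

```python
from typing import Iterable, Tuple


def _quote(name: str) -> str:
    """Quote a node or graph ID for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: Iterable[Tuple[str, str]], name: str = "G", directed: bool = True) -> str:
    """Serialize an edge list as a Graphviz DOT graph."""
    # Directed graphs use 'digraph' and '->'; undirected ones use 'graph' and '--'.
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} {_quote(name)} {{"]
    for src, dst in edges:
        lines.append(f"  {_quote(src)} {arrow} {_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("NoSQL", "Key-value"), ("NoSQL", "Document oriented")]))
```

Saved to a file such as `storage.dot` (a hypothetical name), the output can be rendered with Graphviz, for example `dot -Tsvg storage.dot -o storage.svg`.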

  10. Questions? Thank you for your attention! Look for the full paper "A State-of-the-art Review on Big Data Technologies" in the ICIST 2019 proceedings after April 15th!
