Responsive Analytics of Highly-Connected Big Data


Dr. Peter Janacik, peter.janacik@tu-berlin.de
Stream Reasoning Workshop 2016, TU Berlin, December 8, 2016
www.cit.tu-berlin.de


SLIDE 1

www.cit.tu-berlin.de

  • Dr. Peter Janacik, peter.janacik@tu-berlin.de

Stream Reasoning Workshop 2016 – TU Berlin, December 8, 2016

Responsive Analytics of Highly-Connected Big Data

SLIDE 2

Linking concepts

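[Diagram: users, content streams, and graphs of concepts, connected by labeled edges]

  • Nodes: Users, Content streams, Graphs
  • Edge labels: "posted", "favored/commented", "occurs in", "concept affinity"
  • Data from social networks (Twitter/Instagram), Wikipedia, Wiktionary, evocation/synonym databases, medical knowledge, etc.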

SLIDE 3

Extracting sense from tokens

[Diagram: symbol/token layer with the example tokens "After patiently waiting a black cat ,"; lemma and sense nodes are linked to the tokens]

  • Approach
  • Symbol/token: notion in general, oriented gradient cluster, recognized object, word
  • Semantic overlay absorbs flow of meaning
  • Follow-up processing
  • Polarization and gist extraction
  • Wikipedia/Wiktionary as knowledge model for semantic overlay
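The layering on this slide (token, then lemma, then sense) can be sketched roughly as follows. This is only an illustration: the lexicon, the sense labels, and the names TOY_LEXICON, Annotated, and annotate are hypothetical and not taken from the talk, which uses Wikipedia/Wiktionary as the knowledge model.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon: lowercase token -> (lemma, sense label or None).
# The talk derives such knowledge from Wikipedia/Wiktionary; these entries
# are made up for illustration only.
TOY_LEXICON = {
    "after": ("after", None),
    "patiently": ("patiently", "patient: bearing delay calmly"),
    "waiting": ("wait", "wait: stay until something happens"),
    "a": ("a", None),
    "black": ("black", "black: the darkest color"),
    "cat": ("cat", "cat: domesticated feline"),
}


@dataclass
class Annotated:
    token: str             # symbol/token layer
    lemma: Optional[str]   # lemma layer (None if unknown)
    sense: Optional[str]   # sense layer (None if no sense is attached)


def annotate(tokens: List[str]) -> List[Annotated]:
    """Attach a lemma and, where the lexicon has one, a sense to each token."""
    result = []
    for tok in tokens:
        lemma, sense = TOY_LEXICON.get(tok.lower(), (None, None))
        result.append(Annotated(tok, lemma, sense))
    return result


layers = annotate(["After", "patiently", "waiting", "a", "black", "cat", ","])
for item in layers:
    print(item)
```

Tokens without a lexicon entry, such as the comma, keep empty lemma and sense fields, so follow-up processing can skip them.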

SLIDE 4

Apache Flink

  • Was initiated at TU Berlin (first under the name Stratosphere)
  • MapReduce does not provide sufficient means to implement state-of-the-art analysis methods
  • Flink allows connecting transformations (vertices) into a graph using data streams (directed edges)
  • Distributed execution and placement within a cluster: Flink program -> subtasks -> slots
  • The number of slots on one physical node is configurable, but it usually equals the number of cores
  • A maximum degree of parallelism can be defined and is used by Flink during execution

[Diagram: dataflow graph; vertex labels as extracted: A, B, D, C, D, E. Degree of parallelism is adjustable, here 2.]
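The chain "Flink program -> subtasks -> slots" can be illustrated with a small model in plain Python. It does not use the Flink API. It assumes that every operator runs with the same parallelism and, as a simplification, that subtask i of every operator shares slot i. The helper names and the figure of 4 cores per node are hypothetical.

```python
import math


def expand_subtasks(operators, parallelism):
    """Return one (operator, index) pair per parallel subtask."""
    return [(op, i) for op in operators for i in range(parallelism)]


def assign_to_slots(subtasks):
    """Simplified slot sharing: subtask i of every operator goes to slot i."""
    slots = {}
    for op, i in subtasks:
        slots.setdefault(i, []).append(op)
    return slots


def nodes_needed(num_slots, cores_per_node):
    """Nodes required if each node offers one slot per core."""
    return math.ceil(num_slots / cores_per_node)


operators = ["A", "B", "C", "D", "E"]
subtasks = expand_subtasks(operators, parallelism=2)
slots = assign_to_slots(subtasks)

print(len(subtasks), "subtasks in", len(slots), "slots")
print("nodes needed with 4 cores each:", nodes_needed(len(slots), 4))
```

Under these assumptions the number of slots needed equals the degree of parallelism, not the number of subtasks.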

SLIDE 5

Future Work

  • Partitioning of data flow graph over several data centers based on available resources, data stream bandwidth, data privacy criteria
  • Optimization criteria
  • Min processing time
  • Min costs
  • Best fit
  • Matching different criteria
  • Dynamic migration in order to accommodate changing characteristics of physical topology (available bandwidth/resources (nodes), price, follow the sun, etc.)

[Diagram: dataflow graph with vertices A, B, C, D, E, F, Z and data streams with different bandwidth. Two cuts are labeled Cut a and Cut b. E, F, B, Z in data center 1; D, C, A in data center 2.]
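One way to picture the partitioning problem is to score a placement by the bandwidth of the streams that cross data center boundaries. The sketch below is only an illustration under stated assumptions. The edges and bandwidths in EDGES are hypothetical, since the slide gives neither. The capacity limit stands in for "available resources" and the pinned operators stand in for "data privacy criteria". Minimizing the cut bandwidth is one possible criterion, not necessarily the one proposed in the talk.

```python
from itertools import product

# Hypothetical streams (source, target, bandwidth in Mbit/s).
EDGES = [
    ("A", "C", 10), ("C", "D", 40), ("D", "B", 5),
    ("B", "E", 30), ("E", "F", 20), ("F", "Z", 25),
]


def cut_bandwidth(placement, edges):
    """Total bandwidth of streams whose endpoints are in different data centers."""
    return sum(bw for src, dst, bw in edges if placement[src] != placement[dst])


def best_placement(nodes, edges, capacity, pinned):
    """Brute-force search over placements into data centers 1 and 2.

    capacity: data center -> maximum number of operators it can host
    pinned:   operator -> data center it must run in (e.g. for privacy)
    Returns (cut bandwidth, placement), or None if no placement is feasible.
    """
    best = None
    for choice in product((1, 2), repeat=len(nodes)):
        placement = dict(zip(nodes, choice))
        if any(placement[n] != dc for n, dc in pinned.items()):
            continue  # violates a pinning constraint
        if any(choice.count(dc) > capacity[dc] for dc in (1, 2)):
            continue  # exceeds available resources
        cost = cut_bandwidth(placement, edges)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Placement shown on the slide.
slide_placement = {"E": 1, "F": 1, "B": 1, "Z": 1, "D": 2, "C": 2, "A": 2}
slide_cost = cut_bandwidth(slide_placement, EDGES)

best = best_placement(
    nodes=["A", "B", "C", "D", "E", "F", "Z"],
    edges=EDGES,
    capacity={1: 4, 2: 4},
    pinned={"A": 2, "Z": 1},
)
print("slide placement cut bandwidth:", slide_cost)
print("best placement found:", best)
```

Brute force is only workable for a handful of operators. Dynamic migration would mean re-running such a search, or a heuristic replacement for it, whenever bandwidth, prices, or available nodes change.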

SLIDE 6

Interdependence of areas

[Diagram: interdependent areas connected by labeled relations]

  • Areas: Concept/relationship/story detection; Data/results; Human data/feedback generation; Visualization; Distribution at web-scale for insightful analysis
  • Relation labels: "Comprehensible presentation", "Results/recommendation to trigger", "Enabled by" (twice), "Alters models/algo behavior"

SLIDE 7

Semantic graph as result of analysis

SLIDE 8

Semantic graph as result of analysis

SLIDE 9

Instagram interaction heat map

SLIDE 10

Instagram interaction heat map

SLIDE 11

But how can we visualize what these graphs are about when they typically have millions to trillions of edges?

Semantic graph as result of analysis

SLIDE 12

Approaches to visualization

  • 2-dimensional
  • Works well with most of the currently available devices, no special hardware needed
  • Supported by broad range of platforms
  • Less complex, easier to implement
  • Fewer problems with readability and overlap

SLIDE 13

Approaches to visualization

  • 3-dimensional
  • Chance to make use of additional dimension to untangle the big graph
  • Different perspectives may cover different aspects/lead to different conclusions
  • Can exploit the full potential of touch interfaces

SLIDE 14

Responsive Analytics of Highly-Connected Big Data

  • Dr. Peter Janacik, peter.janacik@tu-berlin.de

Stream Reasoning Workshop 2016 – TU Berlin, December 8, 2016