Apache Storm: Hands-on Session, Academic Year 2019/20, Fabiana Rossi

  1. Apache Storm: Hands-on Session, Academic Year 2019/20
     Fabiana Rossi
     Macroarea di Ingegneria, Dipartimento di Ingegneria Civile e Ingegneria Informatica
     Master's Degree in Computer Engineering (Laurea Magistrale in Ingegneria Informatica), 2nd year

  2. The reference Big Data stack
     [Figure: stack layers: High-level Interfaces, Support / Integration, Data Processing, Data Storage, Resource Management]
     Fabiana Rossi - SABD 2019/20

  3. Apache Storm
     • Apache Storm
       – Open-source, real-time, scalable streaming system
       – Provides an abstraction layer to execute data stream processing (DSP) applications
       – Initially developed by Twitter
     • Topology
       – DAG of spouts (sources of streams) and bolts (operators and data sinks)
       – Stream: sequence of key-value pairs
     [Figure: a spout feeding a bolt]

  4. Stream grouping in Storm
     • Data parallelism in Storm: how are streams partitioned among multiple tasks (threads of execution)?
     • Shuffle grouping
       – Randomly partitions the tuples
     • Fields grouping
       – Hashes on a subset of the tuple attributes

  5. Stream grouping in Storm
     • All grouping (i.e., broadcast)
       – Replicates the entire stream to all the consumer tasks
     • Global grouping
       – Sends the entire stream to a single task of the consumer bolt
     • Direct grouping
       – The producer of a tuple decides which task of the consumer bolt receives it
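
The grouping strategies listed in slides 4 and 5 can be pictured as functions that pick the target task index for each tuple. The following is a minimal sketch in plain Java, with no Storm dependency; the class and method names are hypothetical, and the hash function is an assumption for illustration, not Storm's internal one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical model of stream groupings; not Storm code.
class GroupingSketch {
    /** Shuffle grouping: pick a consumer task at random. */
    static int shuffle(Random rnd, int numTasks) {
        return rnd.nextInt(numTasks);
    }

    /** Fields grouping: hash the grouping key, so equal keys map to the same task. */
    static int fields(Object key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    /** All grouping: every consumer task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            targets.add(t);
        }
        return targets;
    }

    /** Global grouping: the whole stream goes to one task (modeled here as index 0). */
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        Random rnd = new Random();
        for (String word : new String[] {"storm", "bolt", "storm"}) {
            System.out.println(word
                + " shuffle->" + shuffle(rnd, numTasks)
                + " fields->" + fields(word, numTasks));
        }
        System.out.println("all->" + all(numTasks) + " global->" + global());
    }
}
```

Running it shows that the two occurrences of "storm" may land on different tasks under shuffle grouping but always on the same task under fields grouping, which is what stateful operators such as counters rely on.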

  6. Storm architecture
     • Master-worker architecture

  7. Storm components: Nimbus and ZooKeeper
     • Nimbus
       – The master node
       – Clients submit topologies to it
       – Responsible for distributing and coordinating the topology execution
     • ZooKeeper
       – Nimbus uses a combination of the local disk(s) and ZooKeeper to store state about the topology

  8. Storm components: worker
     • Task: operator instance
       – The actual work for a bolt or a spout is done in the task
     • Executor: smallest schedulable entity
       – Executes one or more tasks of the same operator
     • Worker process: Java process running one or more executors
     • Worker node: computing resource, a container for one or more worker processes

  9. Storm components: supervisor
     • Each worker node runs a supervisor
     • The supervisor:
       – receives assignments from Nimbus (through ZooKeeper) and spawns workers based on the assignment
       – sends a periodic heartbeat to Nimbus (through ZooKeeper)
       – advertises the topologies it is currently running, and any vacancies available to run more topologies

  10. Running a Topology in Storm
      Storm supports two running modes: local and cluster
      • Local mode: the topology is executed on a single node
        – Local mode is usually used for testing purposes
        – We can check whether our application runs as expected
      • Cluster mode: the topology is distributed by Storm over multiple workers
        – Cluster mode should be used to run our application on the real dataset
        – Better exploits parallelism
        – The application code is transparently distributed
        – The topology is managed and monitored at run-time

  11. Running a Topology in Storm
      To run a topology in local mode, we just need to create an in-process cluster
      • It is a simplification of a cluster
      • Lightweight Storm functions wrap our code
      • It can be instantiated using the LocalCluster class. For example:

          ...
          LocalCluster cluster = new LocalCluster();
          cluster.submitTopology("myTopology", conf, topology);
          Utils.sleep(10000); // wait [param] ms
          cluster.killTopology("myTopology");
          cluster.shutdown();
          ...

  12. Running a Topology in Storm
      To run a topology in cluster mode, we need to perform the following steps:
      1. Configure the application for the submission, using the StormSubmitter class. For example:

          ...
          Config conf = new Config();
          conf.setNumWorkers(NUM_WORKERS);
          StormSubmitter.submitTopology("mytopology", conf, topology);
          ...

         • NUM_WORKERS: number of worker processes to be used for running the topology

  13. Running a Topology in Storm
      2. Create a jar containing your code and all the dependencies of your code
         • Do not include the Storm library
         • This can be easily done using Maven: use the Maven Assembly Plugin and configure your pom.xml:

          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
              <archive>
                <manifest>
                  <mainClass>com.path.to.main.Class</mainClass>
                </manifest>
              </archive>
            </configuration>
          </plugin>

  14. Running a Topology in Storm
      3. Submit the topology to the cluster using the storm client, as follows:

          $ $STORM_HOME/bin/storm jar path/to/allmycode.jar full.classname.Topology arg1 arg2 arg3

  15. Running a Topology in Storm
      [Figure: flow of application code and control messages]

  16. A container-based Storm cluster

  17. Running a Topology in Storm
      We are going to create a (local) Storm cluster using Docker
      We need to run several containers, each of which will manage a service of our system:
      • ZooKeeper
      • Nimbus
      • Worker1, Worker2, Worker3
      • Storm Client (storm-cli): we use storm-cli to run topologies or scripts that feed our DSP application
      Auxiliary services that will be useful to interact with our Storm topologies:
      • Redis
      • RabbitMQ: a message queue service

  18. Docker Compose
      To easily coordinate the execution of these multiple services, we use Docker Compose
      • Read more at https://docs.docker.com/compose/
      Docker Compose:
      • Is not bundled within the installation of Docker
        – It can be installed following the official Docker documentation
        – https://docs.docker.com/compose/install/
      • Allows us to easily express the containers to be instantiated at once, and the relations among them
      • By itself, Docker Compose runs the composition on a single machine; however, in combination with Docker Swarm, containers can be deployed on multiple nodes

  19. Docker Compose
      • We specify how to compose containers in an easy-to-read file, by default named docker-compose.yml
      • To start the docker composition (in background with -d):
          $ docker-compose up -d
      • To stop the docker composition:
          $ docker-compose down
      • By default, docker-compose looks for the docker-compose.yml file in the current working directory; we can point it to a different configuration file using the -f flag

  20. Docker Compose
      • There are different versions of the Docker Compose file format
      • We will use version 3, supported from Docker Compose 1.13
      On the Docker Compose file format: https://docs.docker.com/compose/compose-file/

  21. Example: Exclamation
      • Problem: Suppose we have a random source of words. Create a DSP application that adds two exclamation points to each word.

  22. Example: Exclamation
      • Problem: Suppose we have a random source of words. Create a DSP application that adds two exclamation points to each word.
      • Solution (1):

  23. A simple topology: ExclamationTopology

          ...
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("word", new RandomNamesSpout(), 1);
          builder.setBolt("exclaim1", new ExclamationBolt(), 1)
                 .shuffleGrouping("word");
          builder.setBolt("exclaim2", new ExclamationBolt(), 1)
                 .shuffleGrouping("exclaim1");
          Config conf = new Config();
          conf.setNumWorkers(3);
          StormSubmitter.submitTopologyWithProgressBar(
              "ExclamationTopology", conf, builder.createTopology()
          );
          ...
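
The data flow of ExclamationTopology (word, then exclaim1, then exclaim2) can be mimicked in plain Java to see what each word looks like at the end of the chain. This is only a sketch without any Storm dependency: it assumes that each ExclamationBolt appends a single "!" to the incoming word, which the slides do not show, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the per-tuple logic of the two chained bolts.
class ExclamationSketch {
    /** Assumed logic of one ExclamationBolt: append a single exclamation point. */
    static String exclaim(String word) {
        return word + "!";
    }

    /** Two operators in sequence, mirroring word -> exclaim1 -> exclaim2. */
    static String pipeline(String word) {
        Function<String, String> exclaim1 = ExclamationSketch::exclaim;
        Function<String, String> exclaim2 = ExclamationSketch::exclaim;
        return exclaim1.andThen(exclaim2).apply(word);
    }

    public static void main(String[] args) {
        // Fixed words stand in for the random source.
        List<String> words = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(
            words.stream().map(ExclamationSketch::pipeline).collect(Collectors.toList()));
    }
}
```

In a real topology the same per-tuple logic would sit inside each bolt, with Storm moving tuples between the two operators.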

  24. Example: Exclamation
      • Problem: Suppose we have a random source of words. Create a DSP application that adds two exclamation points to each word.
      • Solution (2):

  25. Example: WordCount
      • Problem: Suppose we have a random source of sentences. Create a DSP application that counts the number of occurrences of each word.

  26. Example: WordCount
      • Problem: Suppose we have a random source of sentences. Create a DSP application that counts the number of occurrences of each word.
      • Solution:

  27. WordCount

          ...
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", new RandomSentenceSpout(), 5);
          builder.setBolt("split", new SplitSentenceBolt(), 8)
                 .shuffleGrouping("spout");
          builder.setBolt("count", new WordCountBolt(), 12)
                 .fieldsGrouping("split", new Fields("word"));
          Config conf = new Config();
          ...
          StormSubmitter.submitTopologyWithProgressBar(
              "WordCount", conf, builder.createTopology()
          );
          ...
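
The WordCount topology uses fieldsGrouping on "word" because each count task keeps its own local counters: a word must always reach the same task, otherwise its total would be scattered. The sketch below models this in plain Java, without Storm; the class and method names are hypothetical, whitespace splitting is an assumed tokenization, and hashing with `String.hashCode` is an illustrative choice rather than Storm's actual partitioning function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the split and count steps; not Storm code.
class WordCountSketch {
    /** Split step: break a sentence into words on whitespace. */
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Hash partitioning on the word, so one word always reaches the same count task. */
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    /** Count step: each task keeps its own running counts. */
    static List<Map<String, Integer>> count(List<String> sentences, int numTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            perTask.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                perTask.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
            }
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the cow jumped over the moon",
            "the moon is bright");
        System.out.println(count(sentences, 3));
    }
}
```

With shuffle grouping in place of the hash in `taskFor`, the occurrences of a word would be spread over several maps and no single task would hold its full count.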

  28. Example: Rolling Count
      • Problem: Suppose we have a random source of words. Create a DSP application that determines the top-N rank of words within a sliding window of 9 secs and sliding interval of 3 secs.
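
One common way to approach the Rolling Count problem is a slot-based counter: a 9-second window with a 3-second sliding interval can be split into 9 / 3 = 3 slots, and at every slide the oldest slot is discarded. The sketch below is plain Java with no Storm dependency and is not necessarily the solution presented in the session; the class and method names are hypothetical, and time is driven by explicit calls to `slide()` (one per sliding interval) instead of a real clock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical slot-based sliding-window counter; not Storm code.
class RollingCountSketch {
    private final int numSlots;
    private final Map<String, long[]> slots = new HashMap<>();
    private int head = 0; // slot receiving the counts of the current interval

    RollingCountSketch(int windowSecs, int slideSecs) {
        if (slideSecs <= 0 || windowSecs % slideSecs != 0) {
            throw new IllegalArgumentException("window must be a multiple of the slide");
        }
        this.numSlots = windowSecs / slideSecs;
    }

    /** Record one occurrence of a word in the current interval. */
    void add(String word) {
        slots.computeIfAbsent(word, k -> new long[numSlots])[head]++;
    }

    /** Call once per sliding interval: returns the window totals, then expires the oldest slot. */
    Map<String, Long> slide() {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, long[]> e : slots.entrySet()) {
            long sum = 0;
            for (long c : e.getValue()) {
                sum += c;
            }
            totals.put(e.getKey(), sum);
        }
        head = (head + 1) % numSlots;
        Iterator<Map.Entry<String, long[]>> it = slots.entrySet().iterator();
        while (it.hasNext()) {
            long[] counts = it.next().getValue();
            counts[head] = 0; // this slot held the oldest interval; reuse it
            boolean empty = true;
            for (long c : counts) {
                if (c != 0) {
                    empty = false;
                }
            }
            if (empty) {
                it.remove(); // word no longer appears anywhere in the window
            }
        }
        return totals;
    }

    /** Top-N words by count; ties are broken alphabetically. */
    static List<String> topN(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        RollingCountSketch counter = new RollingCountSketch(9, 3);
        counter.add("storm");
        counter.add("storm");
        counter.add("bolt");
        System.out.println(topN(counter.slide(), 1));
    }
}
```

In a topology, a counting operator of this kind would typically receive words via fields grouping and emit its window totals to a downstream ranking operator at each slide.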
