Example: WordCount in Java 7 (and earlier)
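import java.util.Arrays;
import java.util.Iterator;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// sc is an existing JavaSparkContext
JavaRDD<String> textFile = sc.textFile("hdfs://...");

// Split each line into words (Spark 2.x: call() returns an Iterator)
JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
    public Iterator<String> call(String s) {
        return Arrays.asList(s.split(" ")).iterator();
    }
});

// Map each word to a (word, 1) pair
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
    }
});

// Sum the counts of each word
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) {
        return a + b;
    }
});

counts.saveAsTextFile("hdfs://...");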
- Spark’s Java API lets you create tuples with the scala.Tuple2 class
- Pair RDDs are RDDs containing key/value pairs
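To make the key/value semantics concrete, here is a minimal single-machine sketch of what the flatMap, mapToPair and reduceByKey steps compute. It does not use Spark: it assumes no Spark on the classpath, so java.util.AbstractMap.SimpleEntry stands in for scala.Tuple2, and the class and method names (LocalPairCount, toPairs, reduceByKey) are hypothetical. It is written in the same Java 7 style as the slide (no lambdas).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local sketch of the pair-RDD steps; NOT Spark code.
// AbstractMap.SimpleEntry is used as a stand-in for scala.Tuple2.
class LocalPairCount {

    // Like flatMap + mapToPair: split each line into words and emit (word, 1).
    static List<Map.Entry<String, Integer>> toPairs(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Like reduceByKey with a + b: combine all values that share the same key.
    static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> p : pairs) {
            Integer prev = counts.get(p.getKey());
            counts.put(p.getKey(), prev == null ? p.getValue() : prev + p.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        // prints {to=2, be=2, or=1, not=1} (LinkedHashMap keeps first-seen order)
        System.out.println(reduceByKey(toPairs(lines)));
    }
}
```

In Spark the reduction happens per partition and then across the cluster, which is why the combining function must be associative and commutative; the local loop above only shows the per-key result.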
Valeria Cardellini - SABD 2019/2020
Example: WordCount in Java 8
- Example in Java 7: too verbose
- Java 8 adds support for lambda expressions
– Anonymous methods (methods without names) used to implement the single abstract method of a functional interface
– The new arrow operator -> divides a lambda expression into two parts
- Left side: parameters required by the lambda expression
- Right side: actions of the lambda expression
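A small sketch of the difference: the same addition written as a Java 7 anonymous class and as a Java 8 lambda. To keep it runnable without Spark, java.util.function.BinaryOperator stands in for Spark's Function2 (an assumption for illustration; both are functional interfaces with a single abstract method). The class and field names are hypothetical.

```java
import java.util.function.BinaryOperator;

// Hypothetical example contrasting an anonymous class with a lambda.
// BinaryOperator<Integer> is used in place of Spark's Function2<Integer, Integer, Integer>.
class LambdaVsAnonymous {

    // Java 7 style: an anonymous inner class overriding the interface's only abstract method.
    static final BinaryOperator<Integer> SUM_ANON = new BinaryOperator<Integer>() {
        @Override
        public Integer apply(Integer a, Integer b) {
            return a + b;
        }
    };

    // Java 8 style: parameters on the left of ->, the action on the right.
    static final BinaryOperator<Integer> SUM_LAMBDA = (a, b) -> a + b;

    public static void main(String[] args) {
        System.out.println(SUM_ANON.apply(2, 3));   // prints 5
        System.out.println(SUM_LAMBDA.apply(2, 3)); // prints 5
    }
}
```

The compiler infers the parameter types of (a, b) from the target interface, which is what removes most of the boilerplate visible in the Java 7 WordCount.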
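import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// sc is an existing JavaSparkContext
JavaRDD<String> textFile = sc.textFile("hdfs://...");
JavaPairRDD<String, Integer> counts = textFile
    .flatMap(s -> Arrays.asList(s.split(" ")).iterator()) // line -> words
    .mapToPair(word -> new Tuple2<>(word, 1))             // word -> (word, 1)
    .reduceByKey((a, b) -> a + b);                        // sum the counts of each word
counts.saveAsTextFile("hdfs://...");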
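The chained lambda style of the Java 8 pipeline has a close single-machine analogue in java.util.stream. The sketch below is not Spark: Collectors.toMap with a merge function plays the role of reduceByKey, and a TreeMap is chosen only so the output is sorted. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local analogue of the Java 8 WordCount, using Java streams instead of RDDs.
class StreamWordCount {

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // like flatMap: one line -> many words
                .flatMap(s -> Arrays.stream(s.split(" ")))
                // like mapToPair + reduceByKey: key = word, value = 1, merged with (a, b) -> a + b
                .collect(Collectors.toMap(word -> word, word -> 1, (a, b) -> a + b, TreeMap::new));
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(count(Arrays.asList("to be or", "not to be")));
    }
}
```

Unlike a Spark RDD, a Java stream is evaluated eagerly at collect() on one JVM; the point here is only that the lambdas passed to each step have the same shape as in the Spark version.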