[SPARK] Shrideep Pallickara, Computer Science, Colorado State

CS 555: Distributed Systems [Spark]
Shrideep Pallickara, Computer Science, Colorado State University
CS555: Distributed Systems [Fall 2019], October 10, 2019 (L14)
Dept. of Computer Science, Colorado State University

Frequently asked questions from the previous class survey

Slides created by: Shrideep Pallickara

Topics covered in this lecture
- Transformations and Actions
  - RDDs
  - DataFrames

Common Transformations and Actions

Element-wise transformations: filter()
- Takes in a function and returns an RDD that contains only the elements that pass the filter() function

Element-wise transformations: map()
- Takes in a function and applies it to each element in the RDD
- The result of the function is the new value of each element in the resulting RDD

Example:
  inputRDD: {1, 2, 3, 4}
  map x => x*x         gives Mapped RDD: {1, 4, 9, 16}
  filter x => x != 1   gives Filtered RDD: {2, 3, 4}
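The map and filter example above can be tried without a cluster. The sketch below is a plain-Python stand-in, assuming a list plays the role of the RDD; it is not Spark code, and the variable names are illustrative only.

```python
# A plain list stands in for the RDD (assumption for illustration; no Spark involved).
input_rdd = [1, 2, 3, 4]

# map(): apply the function to every element; the results form the new collection.
mapped = list(map(lambda x: x * x, input_rdd))

# filter(): keep only the elements for which the predicate returns True.
filtered = list(filter(lambda x: x != 1, input_rdd))

print(mapped)    # [1, 4, 9, 16]
print(filtered)  # [2, 3, 4]
```

Note that neither call changes input_rdd; each produces a new collection, which mirrors how transformations yield new RDDs.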

Things that can be done with map()
- Anything from fetching the website associated with each URL in a collection to just squaring numbers
- map()'s return type does not have to be the same as its input type
- Multiple output elements for each input element?
  - Use flatMap()

  lines = sc.parallelize(["hello world", "hi"])
  words = lines.flatMap(lambda line: line.split(" "))
  words.first()  # returns "hello"

Difference between map and flatMap
  RDD1: {"coffee panda", "happy panda", "happiest panda party"}
  mappedRDD = RDD1.map(tokenize): {["coffee", "panda"], ["happy", "panda"], ["happiest", "panda", "party"]}
  flatMappedRDD = RDD1.flatMap(tokenize): {"coffee", "panda", "happy", "panda", "happiest", "panda", "party"}
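To make the map versus flatMap difference concrete, here is a minimal plain-Python sketch. It assumes tokenize simply splits on a single space (the slides do not define it) and uses a list in place of the RDD.

```python
from itertools import chain

def tokenize(line):
    # Assumed behavior: split a line into words on single spaces.
    return line.split(" ")

rdd1 = ["coffee panda", "happy panda", "happiest panda party"]

# map-like: one output element (a list of words) per input element.
mapped = [tokenize(line) for line in rdd1]

# flatMap-like: the per-element lists are flattened into one sequence of words.
flat_mapped = list(chain.from_iterable(tokenize(line) for line in rdd1))

print(mapped)       # [['coffee', 'panda'], ['happy', 'panda'], ['happiest', 'panda', 'party']]
print(flat_mapped)  # ['coffee', 'panda', 'happy', 'panda', 'happiest', 'panda', 'party']
```

The mapped result has three elements, one per input line, while the flattened result has seven, one per word.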

Pseudo set operations
- RDDs support many of the operations of mathematical sets, such as union and intersection
  - Even when the RDDs themselves are not properly sets

Some simple set operations
  RDD1: {coffee, coffee, panda, monkey, tea}
  RDD2: {coffee, monkey, kitty}
  RDD1.distinct(): {coffee, monkey, panda, tea}
  RDD1.union(RDD2): {coffee, coffee, coffee, panda, monkey, monkey, tea, kitty}
  RDD1.intersection(RDD2): {coffee, monkey}
  RDD1.subtract(RDD2): {panda, tea}
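The four results above can be reproduced with a small plain-Python sketch. Lists stand in for RDDs here (an assumption for illustration), and results are sorted only to make them easy to compare, since element order in a distributed collection should not be relied on.

```python
# Lists stand in for RDDs; duplicates are allowed, so these are not true sets.
rdd1 = ["coffee", "coffee", "panda", "monkey", "tea"]
rdd2 = ["coffee", "monkey", "kitty"]

# union-like: keeps every element from both inputs, duplicates included.
union = sorted(rdd1 + rdd2)

# distinct-like: removes duplicates within one collection.
distinct = sorted(set(rdd1))

# intersection-like: elements present in both; duplicates are removed.
intersection = sorted(set(rdd1) & set(rdd2))

# subtract-like: elements of rdd1 that do not appear in rdd2.
exclude = set(rdd2)
subtract = sorted(x for x in rdd1 if x not in exclude)

print(union)         # ['coffee', 'coffee', 'coffee', 'kitty', 'monkey', 'monkey', 'panda', 'tea']
print(distinct)      # ['coffee', 'monkey', 'panda', 'tea']
print(intersection)  # ['coffee', 'monkey']
print(subtract)      # ['panda', 'tea']
```

The union keeps all three copies of coffee, which is why these are described as pseudo set operations.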

Cartesian product between two RDDs
  RDD1: {User1, User2, User3}
  RDD2: {Venue("Betabrand"), Venue("Asha Tree House"), Venue("Ritual")}
  RDD1.cartesian(RDD2):
    { (User1, Venue("Betabrand")), (User1, Venue("Asha Tree House")), (User1, Venue("Ritual")),
      (User2, Venue("Betabrand")), (User2, Venue("Asha Tree House")), (User2, Venue("Ritual")),
      (User3, Venue("Betabrand")), (User3, Venue("Asha Tree House")), (User3, Venue("Ritual")) }

Common Actions
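A rough local equivalent of the Cartesian product above is itertools.product. In this sketch users and venues are plain strings (a simplification; the slide wraps venues in a Venue type) and lists stand in for the two RDDs.

```python
from itertools import product

# Plain strings stand in for the slide's User and Venue objects.
users = ["User1", "User2", "User3"]
venues = ["Betabrand", "Asha Tree House", "Ritual"]

# Every (user, venue) pairing: len(users) * len(venues) pairs in total.
pairs = list(product(users, venues))

print(len(pairs))  # 9
print(pairs[0])    # ('User1', 'Betabrand')
```

The output size is the product of the input sizes, so this operation becomes expensive quickly as either input grows.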

Actions on Basic RDDs
- reduce()
  - Takes a function that operates on two elements in the RDD and returns an element of the same type
    - An example of such an operation? + sums the RDD
      sum = rdd.reduce(lambda x, y: x + y)
- fold() takes a function with the same signature as reduce(), but also takes a "zero value" for the initial call
  - The "zero value" is the identity element for the operation
  - E.g., 0 for +, 1 for *, empty list for concatenation

Both fold() and reduce() require the return type to be the same as the type of the RDD elements
- aggregate() removes that constraint
  - E.g., when computing a running average, maintain both the sum so far and the number of elements
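The role of the zero value in fold() can be sketched with functools.reduce, which accepts an optional initial value. The fold helper below is hypothetical and written for illustration only; it is not Spark's implementation, and a list stands in for the RDD.

```python
from functools import reduce

rdd = [1, 2, 3, 3]  # a list stands in for the RDD

# reduce-like: combine elements pairwise; the result has the element type.
total = reduce(lambda x, y: x + y, rdd)

def fold(elements, zero_value, op):
    # Hypothetical helper: like reduce, but seeded with the identity element.
    return reduce(op, elements, zero_value)

fold_sum = fold(rdd, 0, lambda x, y: x + y)                # 0 is the identity for +
fold_product = fold(rdd, 1, lambda x, y: x * y)            # 1 is the identity for *
fold_concat = fold([[1], [2, 3]], [], lambda x, y: x + y)  # [] is the identity for concatenation

# With a zero value, an empty input still has a well-defined result.
fold_empty = fold([], 0, lambda x, y: x + y)

print(total, fold_sum, fold_product, fold_concat, fold_empty)
# 9 9 18 [1, 2, 3] 0
```

Choosing the identity element matters: seeding a sum with anything other than 0 would shift the result.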

Examples: Basic Actions on RDDs

Examples: Basic actions on RDDs [1/7]
- Our RDD contains {1, 2, 3, 3}
- collect()
  - Return all elements from the RDD
  - Invocation: rdd.collect()
  - Result: {1, 2, 3, 3}

Examples: Basic actions on RDDs [2/7]
- Our RDD contains {1, 2, 3, 3}
- count()
  - Number of elements in the RDD
  - Invocation: rdd.count()
  - Result: 4

Examples: Basic actions on RDDs [3/7]
- Our RDD contains {1, 2, 3, 3}
- countByValue()
  - Number of times each element occurs in the RDD
  - Invocation: rdd.countByValue()
  - Result: { (1,1), (2,1), (3,2) }
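As a local analogy for count() and countByValue(), the standard library's len and collections.Counter give the same numbers on a list. This is a stand-in sketch, not Spark code.

```python
from collections import Counter

rdd = [1, 2, 3, 3]  # a list stands in for the RDD

# count-like: total number of elements, duplicates included.
count = len(rdd)

# countByValue-like: how many times each distinct value occurs.
counts_by_value = dict(Counter(rdd))

print(count)            # 4
print(counts_by_value)  # {1: 1, 2: 1, 3: 2}
```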

Examples: Basic actions on RDDs [4/7]
- Our RDD contains {1, 2, 3, 3}
- take(num)
  - Return num elements from the RDD
  - Invocation: rdd.take(2)
  - Result: {1, 2}

Examples: Basic actions on RDDs [5/7]
- Our RDD contains {1, 2, 3, 3}
- reduce(func)
  - Combine the elements of the RDD together in parallel
  - Invocation: rdd.reduce( (x,y) => x + y )
  - Result: 9
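One way to picture "in parallel" for reduce() is to reduce each partition separately and then combine the partial results. The sketch below assumes two hypothetical ways of splitting {1, 2, 3, 3} into partitions; the actual partitioning in Spark depends on the job and is not specified in the slides.

```python
from functools import reduce

def add(x, y):
    return x + y

def partitioned_reduce(partitions, op):
    # Reduce each partition on its own, then combine the partial results.
    partials = [reduce(op, part) for part in partitions]
    return reduce(op, partials)

# Two hypothetical partitionings of the same data.
split_a = [[1, 2], [3, 3]]
split_b = [[1], [2, 3, 3]]

total_a = partitioned_reduce(split_a, add)
total_b = partitioned_reduce(split_b, add)

# take-like: the first two elements when partitions are read in order.
first_two = [x for part in split_a for x in part][:2]

print(total_a, total_b)  # 9 9
print(first_two)         # [1, 2]
```

Addition gives the same answer for both splits because it is associative and commutative; an operation without those properties could produce results that depend on how the data happens to be partitioned.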

Examples: Basic actions on RDDs [6/7]
- Our RDD contains {1, 2, 3, 3}
- aggregate(zeroValue)(seqOp, combOp)
  - Similar to reduce() but used to return a different type
  - Invocation:
      rdd.aggregate((0, 0))(
        (x, y) => (x._1 + y, x._2 + 1),
        (x, y) => (x._1 + y._1, x._2 + y._2))
  - Result: (9, 4)

Examples: Basic actions on RDDs [7/7]
- Our RDD contains {1, 2, 3, 3}
- foreach(func)
  - Apply the provided function to each element of the RDD
  - Invocation: rdd.foreach(func)
  - Result: Nothing
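The two functions passed to aggregate() play different roles: seqOp folds one element into an accumulator within a partition, and combOp merges accumulators across partitions. The helper below is a hypothetical plain-Python emulation under an assumed two-partition split; it is not Spark's implementation.

```python
from functools import reduce

def aggregate(partitions, zero_value, seq_op, comb_op):
    # Hypothetical emulation: fold each partition with seq_op,
    # then merge the per-partition accumulators with comb_op.
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    return reduce(comb_op, partials, zero_value)

partitions = [[1, 2], [3, 3]]  # assumed split of {1, 2, 3, 3}

# The accumulator is a (sum, count) pair, a different type from the int elements.
sum_count = aggregate(
    partitions,
    (0, 0),
    lambda acc, value: (acc[0] + value, acc[1] + 1),  # seqOp
    lambda a, b: (a[0] + b[0], a[1] + b[1]),          # combOp
)

average = sum_count[0] / sum_count[1]

print(sum_count)  # (9, 4)
print(average)    # 2.25
```

The (9, 4) pair matches the slide's result, and dividing the sum by the count yields the average that reduce() alone could not return for an RDD of integers.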
