Analyzing Pwned Passwords with Spark - Kelley Robinson

SLIDE 1
SLIDE 2

Analyzing Pwned Passwords with Spark

Kelley Robinson

@kelleyrobinson

Developer Evangelist

SLIDE 3
SLIDE 4

+

SLIDE 5 BIG DATA & SECURITY @KELLEYROBINSON

  • Spark: then and now
  • The state of passwords
  • Spark in action
  • Big Data ∩ Security

SLIDE 6
SLIDE 7

Apache Spark Ecosystem

SLIDE 8

Spark Abstractions

Then: RDD (Resilient Distributed Dataset)
Now: DataFrames / Datasets

SLIDE 9 https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

RDDs

  • Immutable & distributed collection
  • Unstructured data
  • Low-level transformation and control

SLIDE 10 https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
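
The page linked on this slide recommends reduceByKey over groupByKey because reduceByKey combines values inside each partition before the shuffle. The sketch below is a plain-Python simulation of that idea, not Spark code: the partition contents and both function names are hypothetical, and "shuffled" is just a count of records that would have to move between partitions.

```python
from collections import defaultdict

# Hypothetical input: (password, 1) pairs already split across two partitions.
partitions = [
    [("password", 1), ("123456", 1), ("password", 1)],
    [("123456", 1), ("password", 1), ("qwerty", 1)],
]

def group_by_key_style(parts):
    """Move every pair to the reducer, then sum per key (groupByKey pattern)."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, len(shuffled)

def reduce_by_key_style(parts):
    """Sum inside each partition first, then move only the partial sums."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

grouped_counts, grouped_shuffled = group_by_key_style(partitions)
reduced_counts, reduced_shuffled = reduce_by_key_style(partitions)
```

Both functions return the same counts; the difference is how many records cross the simulated shuffle, which is the cost the linked best-practice page is concerned with.
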
SLIDE 11 https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

Datasets

  • Structured data
  • Strongly typed
  • Fast
SLIDE 12

Datasets

  • Structured data
  • Strongly typed
  • Fast
  • SQL DSLs
SLIDE 13

Apache Spark Ecosystem

SLIDE 14

Scala has the most robust language API

SLIDE 15 https://www.slideshare.net/databricks/composable-parallel-processing-in-apache-spark-and-weld
SLIDE 16 https://twitter.com/CamJo89/status/996497423621996544
SLIDE 17

  • Spark: then and now
  • The state of passwords
  • Spark in action
  • Big Data ∩ Security

SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21

  • Spark: then and now
  • The state of passwords
  • Spark in action
  • Big Data ∩ Security

SLIDE 22 https://twitter.com/dog_rates/status/986762231290490881
SLIDE 23

Benefits

  • Fast
  • Flexible
  • Good for exploration
  • Proven for large systems

SLIDE 24

Challenges

  • Opaque error messages
  • Operationalizing
  • Documentation

http://heather.miller.am/blog/launching-a-spark-cluster-part-1.html
SLIDE 25
SLIDE 26 https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/

👎💰

The missing Spark documentation

SLIDE 27

  • Spark: then and now
  • The state of passwords
  • Spark in action
  • Big Data ∩ Security

SLIDE 28
SLIDE 29
SLIDE 30
SLIDE 31

THANK YOU!

@kelleyrobinson

SLIDE 32

Spark Resources

  • Apache Spark
  • Jacek's Spark Documentation
  • Zeppelin
  • RDDs vs. Datasets
  • Running Spark on a Cluster

Security Resources

  • Pwned Passwords
  • Reverse SHA1 hashes
  • LastPass and 1Password
  • 2FA Guides
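
For readers following the Pwned Passwords and SHA1 resources above, here is a minimal sketch of checking a password against that kind of data. It assumes the `HASH:COUNT` line layout used by the downloadable Pwned Passwords files (uppercase SHA-1 hex digest, a colon, then the number of times the password was seen). The sample counts and the helper names `sha1_upper`, `parse_line`, and `times_seen` are made up for illustration and are not from the talk.

```python
import hashlib

def sha1_upper(password):
    """Return the uppercase SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def parse_line(line):
    """Split one 'HASH:COUNT' line into (hash, count)."""
    digest, count = line.strip().split(":")
    return digest, int(count)

# Hypothetical sample data: real hashes computed here, made-up counts.
sample_lines = [
    f"{sha1_upper(pw)}:{seen}" for pw, seen in [("password", 3), ("qwerty", 2)]
]
table = dict(parse_line(line) for line in sample_lines)

def times_seen(password, lookup):
    """Look up how often a password appears; 0 means not in the sample."""
    return lookup.get(sha1_upper(password), 0)

known = times_seen("password", table)
unknown = times_seen("correct horse battery staple", table)
```

The same parse step maps naturally onto a two-column structured table (hash, count), which is the shape the Datasets slides earlier in the deck describe.
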
SLIDE 33