Eivind Arvesen, JavaZone 2019

  1. Eivind Arvesen, JavaZone 2019

  2. EIVIND ARVESEN
      • Developer, architect
      • M.Sc. in Computer Science
      • Consultant at Bouvet since 2017
      • Into application security, privacy, machine learning, web development
      @EivindArvesen

  5. PERSONAL DATA IN APPEND-ONLY STORAGE: POSSIBLY NOT A GREAT IDEA

  6. *CONTEXT*

  7. Elasticsearch: an open source, near-real-time distributed search engine with a REST API.

  8. Elasticsearch should not be used as a primary data store!

  9. Elasticsearch is great at search

  10. … but it is not a database

  11. PHILADELPHIA

  12. ELASTICSEARCH IN DEPTH
      Source: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

  13. ELASTICSEARCH IN DEPTH
      Source: https://github.com/exo-archives/exo-es-search

  14. ELASTICSEARCH IN DEPTH
      Source: https://www.elastic.co/blog/every-shard-deserves-a-home

  15. ELASTICSEARCH IN DEPTH
      Source: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/

  16. ELASTICSEARCH IN DEPTH
      Merging can be performed manually, but this should only be done on old indices that are no longer in active use. A manual merge combines everything into one segment, which then receives no further automatic optimization.

  17. ELASTICSEARCH IN DEPTH

  18. CONTEXT
      «… BUT IT ISN’T ACTUALLY DELETED UNTIL A SEGMENT MERGE OCCURS»
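
The quote above can be illustrated with a toy model. This is only a sketch of the idea, assuming a drastically simplified segment: the `Segment` class and `merge` function are hypothetical stand-ins, not Lucene's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an immutable Lucene segment (not the real data structure)."""
    docs: dict                                   # doc id -> stored content, never edited in place
    deleted: set = field(default_factory=set)    # ids that carry a delete marker

    def delete(self, doc_id):
        # A delete only records a marker; the stored content is untouched.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def search(self):
        # Searches skip documents that carry a delete marker.
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


def merge(segments):
    """Write a new segment holding only live documents; the old ones are then discarded."""
    live = {}
    for seg in segments:
        live.update(seg.search())
    return Segment(docs=live)


seg = Segment(docs={1: "alice@example.com", 2: "bob@example.com"})
seg.delete(1)
visible_before_merge = seg.search()       # document 1 no longer shows up in results
still_on_disk = 1 in seg.docs             # ...but its content is still stored
merged = merge([seg])
gone_after_merge = 1 not in merged.docs   # only the merge physically drops it
```

The point of the sketch is the gap between `visible_before_merge` and `still_on_disk`: a deleted document disappears from search results immediately, while its content remains in storage until a merge rewrites the segment.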

  20. GDPR

  21. GDPR ART. 17 - RIGHT TO ERASURE («RIGHT TO BE FORGOTTEN»)
      The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…

  22. GDPR «UNDUE DELAY»
      Considered «about a month» (EU), or «thirty days» (Information Commissioner’s Office, UK)

  23. WHAT IS «ERASURE»?
      What does it mean to «erase»?

  24. AFFECTED?
      Depending upon
      • Cluster architecture
      • Difference in data between shards on different nodes
      • Configuration (e.g. refresh interval)
      • Merge settings*
      • Whether a new search (via side effects) leads to a «flush», which in turn leads to a merge
      one can at any given point in time be in possession of data that should be deleted.
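
One way to get a rough picture of how much "deleted" data is still being held is to look at the `docs.deleted` column of the `_cat/segments` API. The sketch below only does the summing; it assumes the rows were already fetched with `format=json`, and the sample rows and index names are hypothetical.

```python
from collections import defaultdict


def deleted_docs_per_index(cat_segments_rows):
    """Sum the `docs.deleted` column per index from `_cat/segments?format=json` rows."""
    totals = defaultdict(int)
    for row in cat_segments_rows:
        # int() copes with the values arriving either as strings or as numbers.
        totals[row["index"]] += int(row["docs.deleted"])
    return dict(totals)


# Hypothetical sample rows, trimmed to a few columns.
sample = [
    {"index": "customers", "shard": "0", "segment": "_0", "docs.count": "900", "docs.deleted": "100"},
    {"index": "customers", "shard": "0", "segment": "_1", "docs.count": "50", "docs.deleted": "0"},
    {"index": "orders", "shard": "0", "segment": "_0", "docs.count": "10", "docs.deleted": "3"},
]
pending = deleted_docs_per_index(sample)
```

A non-zero count says nothing about whose data it is, only that some documents marked as deleted have not yet been merged away.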

  25. AFFECTED?
      …and when a segment reaches the maximum size (5GB by default), it can only* be merged when it accumulates 50% deletions!
      * Lucene < 7.5
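
The rule on this slide can be written down as a small predicate. This is a deliberately simplified reading of the slide, not the real merge policy code in Lucene; the function name and parameters are made up for illustration.

```python
GIB = 1024 ** 3


def natural_merge_candidate(size_bytes, deleted_fraction,
                            max_segment_bytes=5 * GIB, threshold=0.5):
    """Simplified pre-Lucene-7.5 rule from the slide, not the real merge policy logic.

    Segments below the maximum size are always candidates; a segment at the
    maximum size becomes one only once enough of its documents are deleted.
    """
    if size_bytes < max_segment_bytes:
        return True
    return deleted_fraction >= threshold


small = natural_merge_candidate(1 * GIB, 0.10)
full_few_deletes = natural_merge_candidate(5 * GIB, 0.30)
full_half_deleted = natural_merge_candidate(5 * GIB, 0.50)
```

The middle case is the worrying one: a full-size segment with 30% of its documents deleted is left alone, so those documents can stay in storage indefinitely.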

  26. PROBLEM
      • No obvious solution
      • Uncertain whether it is a problem in practice until an EU court takes a position

  27. WE DON’T KNOW WHAT DATA WE HAVE

  28. NOW WHAT?

  29. Elasticsearch should not be used as a primary data store!

  30. … but many do it anyway!

  31. COMMUNICATIONS
      • Blog
      • Elastic
      • Lucene

  32. COMMUNICATIONS
      Lucene 7.5 would be released in about a week (Thanks, Jan Høydal!)

  33. COMMUNICATIONS
      Current ES version: 7.3 (July 2019)
      ES versions < 6.5 do not include Lucene 7.5, and cannot be configured to the extent we need
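
A quick check of whether a given cluster is on the right side of that boundary could look like the sketch below. It assumes the version string has already been read, for example from the `version.number` field returned by `GET /` (the same response also reports `version.lucene_version`); the helper name is hypothetical.

```python
def has_lucene_7_5(es_version):
    """Return True if the Elasticsearch version is 6.5 or later (the boundary given on the slide)."""
    major, minor = (int(part) for part in es_version.split(".")[:2])
    return (major, minor) >= (6, 5)


checks = {v: has_lucene_7_5(v) for v in ["5.6.16", "6.4.3", "6.5.0", "7.3.0"]}
```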

  34. SOLUTION!

  35. SOLUTION
      UPGRADE + BLUE/GREEN DEPLOYMENT
      ES >= 6.5 (LUCENE >= 7.5)

  36. SOLUTION
      UPGRADE + BLUE/GREEN DEPLOYMENT
      ANOTHER SOURCE OF GROUND TRUTH
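
The slides do not say how the blue/green switch was carried out. One common way to do it at the index level, assumed here purely for illustration, is to rebuild a fresh index from the other source of ground truth and then move an alias from the old index to the new one with a single `POST /_aliases` call. The sketch only builds the request body; all index and alias names are hypothetical.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for `POST /_aliases` that moves an alias in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: "customers" is the alias clients query,
# "customers-blue" the old index, "customers-green" the freshly rebuilt one.
body = alias_swap_body("customers", "customers-blue", "customers-green")
payload = json.dumps(body)
```

Once the alias points at the rebuilt index, the old index can be deleted as a whole, which removes its segments together with any lingering deleted documents.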

  37. SOLUTION
      Source: https://www.elastic.co/blog/signal-media-optimizing-for-more-elasticsearch-power-with-less-elasticsearch-cluster

  38. ALTERNATIVE SOLUTION
      PATCHING + WEEKLY JOB
      Cron expungeDelete
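
A weekly job of this kind could be as small as the sketch below, which assumes "expungeDelete" maps to the force merge API with `only_expunge_deletes=true`. It builds the request without sending it; the cluster address, index name, script name and schedule are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def expunge_deletes_request(base_url, index):
    """Build (but do not send) a force-merge request that only expunges deletes."""
    query = urlencode({"only_expunge_deletes": "true"})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
    return Request(url, method="POST")


# Hypothetical cluster address and index name.
req = expunge_deletes_request("http://localhost:9200", "customers")
# A scheduled job would send it with urllib.request.urlopen(req), for example
# from a weekly crontab entry such as:  0 3 * * 0  python3 expunge.py
```

Whether such a call removes every deleted document depends on the version and merge settings in use, so it is worth verifying the result rather than assuming it.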

  39. ONLY ELASTICSEARCH?
      • Probably also affects Solr (and other comparable solutions)
      • Kafka?

  40. CONCLUSIONS
      • DON’T use Elasticsearch as primary data store!
      • If this strikes you as a particularly relevant risk:
        • Get legal advice
        • Upgrade your Elasticsearch version
        • Read up on configs
        • Read up on how to reindex in place (periodically)
        • Establish a cleaning job
        • … or encrypt (hard) and «throw away» the key
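
The last bullet, encrypting and throwing away the key, can be sketched as follows. This is a toy under loud assumptions: the `KeyStore` class is hypothetical, and the one-time-pad XOR stands in for a real cipher only to keep the example within the standard library. A real system would use an authenticated cipher from a vetted library and keep the keys outside the search cluster.

```python
import secrets


class KeyStore:
    """Toy per-subject key store; a real one would live outside the search cluster."""

    def __init__(self):
        self._keys = {}

    def encrypt(self, subject_id, plaintext):
        # One-time pad: a fresh random key as long as the message, used once.
        key = secrets.token_bytes(len(plaintext))
        self._keys[subject_id] = key
        return bytes(p ^ k for p, k in zip(plaintext, key))

    def decrypt(self, subject_id, ciphertext):
        key = self._keys[subject_id]          # raises KeyError once the key is gone
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    def forget(self, subject_id):
        # "Throwing away" the key makes every stored copy of the ciphertext unreadable.
        self._keys.pop(subject_id, None)


store = KeyStore()
ciphertext = store.encrypt("subject-42", b"alice@example.com")
roundtrip = store.decrypt("subject-42", ciphertext)
store.forget("subject-42")
try:
    store.decrypt("subject-42", ciphertext)
    recoverable = True
except KeyError:
    recoverable = False
```

The "(hard)" on the slide is worth taking seriously: encrypted fields cannot be searched as plain text, which works against the reason for putting them in a search engine in the first place.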

  41. TLDR
      • LUCENE < 7.5 WON'T MERGE SEGMENTS LARGER THAN 5GB (DEFAULT) UNLESS THEY ACCUMULATE 50% DELETIONS.
      • YOU SHOULD REINDEX FROM PRIMARY DATA STORE PERIODICALLY

  42. IN SUMMARY

  43. ELASTICSEARCH
      YOU KNOW, FOR SEARCH

  44. CATCH ME OUTSIDE
      • @EivindArvesen
      • https://github.com/eivindarvesen
      • https://eivindarvesen.com
      Illustrations: Unsplash

  45. THANK YOU!
