
#GHC18 DROPBOX FULL-TEXT FULL SEARCH | GRACE HOPPER - PowerPoint PPT Presentation



  1. NAUTILUS: LESSONS FROM BUILDING DROPBOX’S LARGE SCALE, DISTRIBUTED SEARCH ENGINE Samantha Steele | sammyst@dropbox.com #GHC18

  2. DROPBOX FULL-TEXT FULL SEARCH PAGE 2 | GRACE HOPPER CELEBRATION 2018 #GHC18 PRESENTED BY ANITAB.ORG AND THE ASSOCIATION FOR COMPUTING MACHINERY

  3. REPLACING DROPBOX’S LEGACY SEARCH SYSTEM
     • Built in 2013, served Dropbox well for many years
     • Only scaled to paid users
     • Not flexible for fast experimentation
     • Didn’t support modern search features (I18N Support, Snippets)

  4. ELASTICSEARCH EVALUATION
     15 Clusters, each composed of:
     • 5 Master hosts
     • 50 Data hosts, each running 2 ElasticSearch data nodes
     • 150 shards with 2x replication, for a total of 300 shards
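The sizing on this slide can be checked with a few lines of arithmetic. The sketch below assumes every bullet is a per-cluster figure and that "2x replication" means each of the 150 shards is stored twice; the constant and function names are illustrative, and the fleet-wide totals are derived here rather than stated on the slide.

```python
# Figures taken from the slide, read as per-cluster values (an assumption).
CLUSTERS = 15
MASTER_HOSTS_PER_CLUSTER = 5
DATA_HOSTS_PER_CLUSTER = 50
DATA_NODES_PER_HOST = 2
SHARDS_PER_CLUSTER = 150
REPLICATION_FACTOR = 2  # "2x replication": every shard is stored twice


def cluster_totals() -> dict:
    """Derive per-cluster and fleet-wide counts from the slide's figures."""
    data_nodes_per_cluster = DATA_HOSTS_PER_CLUSTER * DATA_NODES_PER_HOST
    shard_copies_per_cluster = SHARDS_PER_CLUSTER * REPLICATION_FACTOR
    hosts_per_cluster = MASTER_HOSTS_PER_CLUSTER + DATA_HOSTS_PER_CLUSTER
    return {
        "data_nodes_per_cluster": data_nodes_per_cluster,
        "shard_copies_per_cluster": shard_copies_per_cluster,
        "shard_copies_per_data_node": shard_copies_per_cluster / data_nodes_per_cluster,
        "total_hosts": CLUSTERS * hosts_per_cluster,
        "total_data_nodes": CLUSTERS * data_nodes_per_cluster,
        "total_shard_copies": CLUSTERS * shard_copies_per_cluster,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the evaluation setup spans several hundred hosts, which gives a sense of the operational footprint being weighed.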

  5. ELASTICSEARCH EVALUATION
     Single-cluster p95 latency corresponds to p75 end-to-end latency

  6. NAUTILUS GOALS
     • Scale to all 500 million users
     • Low-latency, 350ms p95
     • High availability
     • Support fast experimentation
     • Incremental field updates for metadata indexing
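For readers unfamiliar with the "350ms p95" notation in the goals above: it means 95% of queries should complete within 350 ms. A minimal sketch of checking such a target with the nearest-rank percentile method follows. The sample latencies and the function names are made up for illustration, and the slides do not say how Dropbox actually computes its percentiles.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_goal(samples_ms, p=95, target_ms=350):
    """True when the p-th percentile latency is within the target."""
    return percentile(samples_ms, p) <= target_ms


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
    latencies_ms = [120] * 94 + [340] + [900] * 5
    print("p95 =", percentile(latencies_ms, 95), "ms")
    print("meets 350ms p95 goal:", meets_latency_goal(latencies_ms))
```

Note that a p95 target says nothing about the slowest 5% of queries: in the sample data the p99 is far above 350 ms while the goal is still met.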

  7. NAUTILUS ARCHITECTURE

  8. NAUTILUS ENGINE
     Nautilus Partitioning
     Document Sharing within a Partition

  9. INFORMATION RETRIEVAL 101
     Inverted Search Index
     Single Posting List Entry: DocID 1, Metadata: Term Frequency = 152, Positions = 5, 12, 22…
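The inverted index on this slide maps each term to a posting list, where every entry carries a document ID plus metadata such as term frequency and positions. A minimal sketch of that structure is below. The whitespace tokenizer, the `Posting` class, and the sample documents are illustrative assumptions, not Nautilus code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One posting list entry: a document ID plus per-document term metadata."""
    doc_id: int
    term_frequency: int = 0
    positions: list = field(default_factory=list)


def build_inverted_index(docs):
    """Map each term to a posting list sorted by document ID."""
    by_term = defaultdict(dict)  # term -> {doc_id: Posting}
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            posting = by_term[term].setdefault(doc_id, Posting(doc_id))
            posting.term_frequency += 1
            posting.positions.append(position)
    return {
        term: [postings[doc_id] for doc_id in sorted(postings)]
        for term, postings in by_term.items()
    }


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    doc_sets = [{p.doc_id for p in index.get(term, [])} for term in terms]
    return sorted(set.intersection(*doc_sets))


if __name__ == "__main__":
    docs = {1: "quarterly report draft", 2: "report report final", 3: "vacation photos"}
    index = build_inverted_index(docs)
    print(index["report"])
    print(search(index, "report final"))
```

Storing positions alongside the term frequency is what makes features such as phrase matching possible, at the cost of larger posting list entries.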

  10. INVERTED INDEX BUILT ON ROCKSDB

  11. OPTIMIZING ROCKSDB PERFORMANCE
     Old Posting List Format
     New Exploded Posting List
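The slide text does not define either format, so the following is only one plausible reading, offered as a sketch. It assumes the old format stored a whole posting list as a single value under its term, and that the "exploded" format stores one key per (term, document) pair so that a single document can be updated without rewriting the full list. RocksDB is an ordered key-value store; the `OrderedKV` class below is a small in-memory stand-in for one, not the RocksDB API, and all names are hypothetical.

```python
import bisect


class OrderedKV:
    """Tiny in-memory stand-in for an ordered key-value store (not the RocksDB API)."""

    def __init__(self):
        self._keys = []    # kept sorted so prefix scans are contiguous
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose tuple key starts with the given tuple prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key, self._values[key]


def add_posting_single_value(store, term, doc_id, metadata):
    """Assumed old layout: read the whole list, modify it, write it all back."""
    postings = dict(store.get((term,), {}))
    postings[doc_id] = metadata
    store.put((term,), postings)


def add_posting_exploded(store, term, doc_id, metadata):
    """Assumed exploded layout: one small write per (term, doc_id) key."""
    store.put((term, doc_id), metadata)


def read_postings_exploded(store, term):
    """Rebuild a posting list, ordered by doc_id, with a prefix scan."""
    return [(key[1], value) for key, value in store.scan_prefix((term,))]


if __name__ == "__main__":
    store = OrderedKV()
    add_posting_exploded(store, "report", 2, {"tf": 2})
    add_posting_exploded(store, "report", 1, {"tf": 1})
    add_posting_exploded(store, "review", 7, {"tf": 1})
    print(read_postings_exploded(store, "report"))
```

Under this reading the trade-off is write amplification versus read cost: the single-value layout rewrites a growing value on every update, while the exploded layout makes updates cheap and relies on the store's key ordering to keep reads sequential.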

  12. OPTIMIZING CGO PERFORMANCE
     • C Memory != Go Memory
     • Cgoroutines != Goroutines
     • C to Go calls are very expensive

  13. DEBUGGING AVAILABILITY ISSUES
     Number of partitions served drops as processes repeatedly OOM

  14. OOM CASE STUDY 1
     Classic Memory Leak profile
     Inverted Search Index

  15. MEMORY USAGE ON ANOTHER MACHINE
     Inflection Point

  16. ROOT CAUSE

  17. OOM CASE STUDY 2
     Sudden Memory Spikes

  18. CONCLUSIONS
     • Nautilus launched to 100% in June
     • Steady state 99.9% availability
     • 300ms latency p95
     • Coming soon: Better I18N Support, Snippets and more

  19. Feel free to reach out: sammyst@dropbox.com #GHC18
