HashCache: Cache Storage for the Next Billion

Anirudh Badam, KyoungSoo Park, Vivek S. Pai, Larry L. Peterson (Princeton University)

Next Billion Internet Users: schools, urban middle class in developing


  3. HashCache: Basic Policy
     [Diagram labels: URL; hash_value (H Bits); % N; t; head; t-th block; data; N contiguous blocks (Disk Table); Circular Filesystem Log]
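The lookup path suggested by the diagram labels (hash the URL, take it modulo N, read the t-th block) can be sketched as follows. This is a minimal in-memory simulation, not the authors' implementation. Several things here are assumptions: a Python list stands in for the N contiguous disk blocks, SHA-1 truncated to 64 bits stands in for the unspecified hash function, and the names `url_hash`, `BasicCache`, `put`, and `get` are hypothetical. The sketch also ignores the circular log and keeps each whole object in its table block.

```python
import hashlib


def url_hash(url: str) -> int:
    """64-bit hash of a URL (SHA-1 truncated; the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


class BasicCache:
    """Toy model of the Basic policy: the table position is the index."""

    def __init__(self, n_blocks: int):
        self.n = n_blocks
        # Stands in for N contiguous disk blocks; nothing is indexed in memory.
        self.table = [None] * n_blocks

    def put(self, url: str, data: bytes) -> None:
        t = url_hash(url) % self.n
        # No collision control: whatever lived in block t is overwritten.
        self.table[t] = (url, data)

    def get(self, url: str):
        t = url_hash(url) % self.n
        entry = self.table[t]  # on a real disk this read costs a seek, hit or miss
        if entry is not None and entry[0] == url:
            return entry[1]
        return None
```

Because the block must be read before the cache can tell whether the URL is present, every miss still pays for a disk access in this model.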

  5. HashCache: Basic Policy
     • Advantages
       • No index memory needed
       • Tuned for one seek for most objects
     • Disadvantages
       • One seek per miss
       • No collision control
       • No cache replacement policy

  9. Collision Control: Chaining
     • Does not transition well to disk-based storage
     • Multiple seeks per operation
     • Walking hash bin list
     • Global replacement policy crosses bins

  13. Collision Control: Set Associativity (T-Ways)
      • Fixed locations where each object can be found
      • Allocated contiguously, read together
      • Seek time dominates short read
      • Eliminate global cache replacement policies
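A T-way set-associative table along these lines might look like the sketch below. It is an illustration under stated assumptions rather than the presented system: a flat Python list models the disk, a slice models the single contiguous read of one set, SHA-1 truncated to 64 bits is an assumed hash, and the names `SetAssocTable`, `get`, and `put` are hypothetical. Eviction here is a placeholder (way 0), since the slide does not specify one.

```python
import hashlib


def url_hash(url: str) -> int:
    """64-bit hash of a URL (SHA-1 truncated; the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


class SetAssocTable:
    """Toy T-way set-associative table; set i occupies slots [i*T, (i+1)*T)."""

    def __init__(self, n_sets: int, ways: int):
        self.s = n_sets
        self.t = ways
        self.slots = [None] * (n_sets * ways)  # stands in for the on-disk table

    def _read_set(self, url: str):
        base = (url_hash(url) % self.s) * self.t
        # All T ways are adjacent, so one seek plus a short read fetches them.
        return base, self.slots[base:base + self.t]

    def get(self, url: str):
        _, ways = self._read_set(url)
        for entry in ways:
            if entry is not None and entry[0] == url:
                return entry[1]
        return None

    def put(self, url: str, data: bytes) -> None:
        base, ways = self._read_set(url)
        target = 0  # placeholder eviction choice when the set is full
        for i, entry in enumerate(ways):
            if entry is not None and entry[0] == url:
                target = i  # overwrite the existing copy
                break
        else:
            for i, entry in enumerate(ways):
                if entry is None:
                    target = i  # first free way
                    break
        self.slots[base + target] = (url, data)
```

An object can only live in one of the T slots of its set, so a lookup never walks a chain and any replacement decision stays inside the set.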

  15. Reducing Seeks
      • In-memory hash table
      • Too much memory for pointers

      In-memory Hash Table, bits per entry:
      Bin Pointers        32
      Chaining Pointers   64
      Hash                32
      Total (bits)       128
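To see why 128 bits per entry is "too much memory", the arithmetic can be worked through in a few lines. The per-field sizes come from the slide; the object count of 2^28 is the example cache size used elsewhere in these slides, and the helper name `index_bytes` is hypothetical.

```python
# Per-entry field sizes in bits, as listed on the slide.
BIN_POINTER_BITS = 32
CHAINING_POINTER_BITS = 64
HASH_BITS = 32

bits_per_object = BIN_POINTER_BITS + CHAINING_POINTER_BITS + HASH_BITS  # 128


def index_bytes(n_objects: int, bits_each: int) -> int:
    """Total index size in bytes for n_objects entries of bits_each bits."""
    return n_objects * bits_each // 8


n_objects = 2 ** 28  # example cache size taken from a later slide
total = index_bytes(n_objects, bits_per_object)
print(total / 2 ** 30)  # 4.0 (GiB of RAM for the index alone)
```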

  17. Reducing Seeks
      • In-memory hash table
      • Too much memory for pointers
      • Disk is already a hash table
      • Pointers not needed
      • Large bitmap with the same layout as the disk
      • Just store hash per URL
      [Diagram labels: In-memory Bitmap (H Bits); Disk Table (Disk Block)]

  21. Reducing Seeks
      • Original hash of the URL: 64 bits
      • Eliminate bits for (same) bin #: 64 - log(S) = 39 bits (2^28 objs, 8-way, #bins = 2^25 (S))
      • Shrink hash size: just to eliminate most false positives (8 bits)

      Hash size (bits):
      Original hash    64
      64 - log(S)      39
      Low-FP hash       8
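The bit budget on this slide can be reproduced with a short sketch. The constants (2^28 objects, 8 ways, 8 fingerprint bits) come from the slide. The choice of which 8 of the remaining 39 bits to keep is an assumption, as are the SHA-1-based hash and the names `url_hash` and `split_hash`.

```python
import hashlib

WAYS = 8
N_OBJECTS = 2 ** 28
N_SETS = N_OBJECTS // WAYS           # S = 2**25 bins
SET_BITS = N_SETS.bit_length() - 1   # log2(S) = 25
FP_BITS = 8                          # bits kept to filter most false positives


def url_hash(url: str) -> int:
    """64-bit hash of a URL (SHA-1 truncated; the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


def split_hash(h: int):
    """Split a 64-bit hash into (set index, 8-bit fingerprint)."""
    set_index = h % N_SETS           # implied by table position, never stored
    remaining = h >> SET_BITS        # the other 64 - 25 = 39 bits
    fingerprint = remaining & ((1 << FP_BITS) - 1)  # keep only 8 of them
    return set_index, fingerprint


print(64 - SET_BITS)  # 39
```

All entries in one set share the same set index, so storing it again per URL would add no information; only the short fingerprint needs to live in memory.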

  24. Cache Replacement
      • Large disks: 10-100+ million objects
      • Global caching relevant when disk size ≈ working set
      • When disk >> working set, local policies ≈ global policies
      • Local replacement benefits
        • 3 bits per URL
        • Performed on contiguous objects
        • False positives limited by set size

      Bits per URL:
      Original hash    64
      64 - log(S)      39
      Low-FP hash       8
      Hash + rank      11
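One way to realize a 3-bit local replacement state per URL is to keep an LRU rank from 0 to 7 for each way of an 8-way set. The sketch below assumes that encoding (rank 0 is most recently used) and packs it with the 8-bit hash into 11 bits. The function names `pack`, `unpack`, `touch`, and `victim` are hypothetical, and the slides do not confirm this exact layout.

```python
WAYS = 8
RANK_BITS = 3   # log2(8): enough to order the 8 ways of one set
FP_BITS = 8


def pack(fingerprint: int, rank: int) -> int:
    """Pack an 8-bit fingerprint and a 3-bit rank into one 11-bit value."""
    return (fingerprint << RANK_BITS) | rank


def unpack(entry: int):
    """Inverse of pack: return (fingerprint, rank)."""
    return entry >> RANK_BITS, entry & ((1 << RANK_BITS) - 1)


def touch(ranks: list, way: int) -> None:
    """Mark `way` as most recently used; ranks stays a permutation of 0..WAYS-1."""
    old = ranks[way]
    for i, r in enumerate(ranks):
        if r < old:
            ranks[i] = r + 1  # everything more recent than `way` ages by one
    ranks[way] = 0


def victim(ranks: list) -> int:
    """Return the way holding the least recently used entry."""
    return ranks.index(len(ranks) - 1)
```

Updating ranks only touches the entries of one set, which is what keeps the policy local.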

  33. HashCache: SetMem Policy
      [Diagram labels: URL; hash_value; % S; t; head; 11 Bits; LRU; t-th set (Memory); t-th set (Filesystem); data]

  35. HashCache: SetMem Policy
      • Advantages
        • No seeks for most misses
        • 1 seek per read, 1 seek per write
        • Good hash + replacement in 11 bits
      • Disadvantages
        • Writes still need seeks
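The "no seeks for most misses" property can be illustrated with a toy model in which short hashes and LRU ranks live in memory while objects live in on-disk sets. This is a sketch under assumptions, not the authors' code: nested Python lists stand in for disk, `disk_reads` counts simulated set reads, SHA-1 truncated to 64 bits is an assumed hash, and `SetMemCache` and its methods are hypothetical names. Re-inserting a URL that is already cached is not deduplicated here.

```python
import hashlib


def url_hash(url: str) -> int:
    """64-bit hash of a URL (SHA-1 truncated; the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


class SetMemCache:
    """Toy SetMem-style cache: fingerprints and ranks in RAM, objects on 'disk'."""

    def __init__(self, n_sets: int, ways: int = 8):
        self.s, self.t = n_sets, ways
        self.fps = [[None] * ways for _ in range(n_sets)]        # in memory
        self.ranks = [list(range(ways)) for _ in range(n_sets)]  # in memory
        self.disk = [[None] * ways for _ in range(n_sets)]       # simulated disk
        self.disk_reads = 0

    def _locate(self, url: str):
        h = url_hash(url)
        return h % self.s, (h // self.s) & 0xFF  # (set index, 8-bit fingerprint)

    def _touch(self, s: int, way: int) -> None:
        ranks = self.ranks[s]
        old = ranks[way]
        for i, r in enumerate(ranks):
            if r < old:
                ranks[i] = r + 1
        ranks[way] = 0  # rank 0 = most recently used

    def get(self, url: str):
        s, fp = self._locate(url)
        if fp not in self.fps[s]:
            return None  # miss answered from memory, no disk access
        self.disk_reads += 1  # one read fetches the whole contiguous set
        for way, entry in enumerate(self.disk[s]):
            if entry is not None and entry[0] == url:
                self._touch(s, way)
                return entry[1]
        return None  # fingerprint false positive

    def put(self, url: str, data: bytes) -> None:
        s, fp = self._locate(url)
        way = self.ranks[s].index(self.t - 1)  # least recently used way
        self.fps[s][way] = fp
        self.disk[s][way] = (url, data)  # the write itself still costs a seek
        self._touch(s, way)
```

In this model a lookup for an uncached URL touches the disk only when its 8-bit fingerprint happens to match an entry in the same set.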

  38. Further Reducing Seeks
      • Storing objects by hash can produce random reads & writes
      • Restructure on-disk table
      [Diagram label: Disk Table]
