Demystifying Cache Policies for Photo Stores at Scale: A Tencent Case Study




  1. Demystifying Cache Policies for Photo Stores at Scale: A Tencent Case Study. Ke Zhou, Si Sun, Hua Wang, Ping Huang, Xubin He, Rui Lan, Wenyan Li, Wenjie Liu, Tianming Yang. Huazhong University of Science and Technology, Key Laboratory of Information Storage System; Intelligent Cloud Storage Joint Research Center of HUST and Tencent; Tencent Inc.; Temple University; Huanghuai University 1

  2. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 2

  3. Background ◼ More than 250 million photos are uploaded to QQphoto every day ◼ Total photo views per day approach 50 billion ◼ QQphoto faces critical challenges in dealing with such huge amounts of photos • user experience (needs lower latency) • backend storage burden (needs lower traffic) 3

  4. The photo cache architecture 4

  5. Upload and download 5

  6. Upload channel ◼ Users/Apps write directly to the backend storage system through the upload channel ◼ Terminology: • original photo: the photo a user uploads • physical photo: the original photo and the photos resized from it, differing in format or specification • logical photo: a photo set containing several physical photos that share the same content (produced by the resize mechanism) 6

  7. Two-tier cache ◼ Photos are cached at two tiers between Users/Apps and the backend; the outside cache is the tier we delve into 7

  8. Outside cache ◼ Trace: • 9 days of logs • >5.8 billion requests • >801 million logical photos • >1.5 billion physical photos • total data size >46 TB • total network traffic >186 TB ◼ Sampling based on logical photos • Extract all logical photos in the logs • Randomly sample the logical photos at a ratio of 1:100 • Extract the log entries containing the sampled logical photos 8
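The logical-photo sampling described on this slide can be sketched as follows. The function names and the hash-based 1:100 selection are illustrative assumptions, not Tencent's actual tooling; the point is that sampling by logical photo id keeps or drops all requests of a photo together, which preserves per-photo reuse patterns.

```python
import hashlib

SAMPLE_RATE = 100  # keep roughly 1 of every 100 logical photos

def is_sampled(logical_photo_id: str) -> bool:
    """Deterministically select ~1/100 of logical photos by hashing the id,
    so every request of a sampled photo is kept."""
    digest = hashlib.md5(logical_photo_id.encode()).hexdigest()
    return int(digest, 16) % SAMPLE_RATE == 0

def sample_trace(requests):
    """Filter a request log down to the sampled logical photos.
    Each request is assumed to be a dict with a 'logical_id' field."""
    return [r for r in requests if is_sampled(r["logical_id"])]
```

Because the decision is a pure function of the id, the same photo is always kept or always dropped across all 9 days of logs.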

  9. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 9

  10. Advanced algorithms fail ◼ ARC, MQ, and S3LRU perform almost identically and show negligible improvement over LRU. X is the cache capacity in production; Belady is the theoretically optimal offline algorithm. 10

  11. Advanced algorithms fail ◼ Phenomenon: • Higher-frequency photos contribute more to the hit ratio (HR) • Equivalently, lower-frequency photos are harder to hit. [Figure: the CDFs of photo reuse distance, grouped by photo frequency.] 11

  12. Advanced algorithms fail [Chart: photos grouped by access frequency f (f = 1; f = 2; 2 < f ≤ 5; 5 < f ≤ 10; 10 < f ≤ 100; 100 < f ≤ 1000; 1000 < f ≤ 10000; f > 10000), showing each group's PoP, PoR, and CoHR.] PoP: percentage of photos; PoR: percentage of requests; CoHR: contribution to hit ratio, with compulsory misses removed: CoHR = (access times in group − number of photos in group) / (access times in trace) 12
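The CoHR metric on this slide subtracts one access per photo (its compulsory miss) from a group's accesses and divides by all accesses in the trace, giving the group's best possible hit-ratio contribution. A minimal sketch, assuming the trace is simply a list of photo ids and groups are given as frequency predicates (both illustrative):

```python
from collections import Counter

def cohr_by_group(trace, groups):
    """CoHR = (accesses in group - photos in group) / accesses in trace.
    Subtracting one access per photo removes its compulsory miss.
    `groups` maps a label to a predicate on a photo's access frequency."""
    freq = Counter(trace)          # photo id -> access count
    total = len(trace)             # all accesses in the trace
    result = {}
    for label, pred in groups.items():
        accesses = sum(f for f in freq.values() if pred(f))
        photos = sum(1 for f in freq.values() if pred(f))
        result[label] = (accesses - photos) / total
    return result
```

Note that a group of frequency-1 photos always has CoHR = 0: every access is a compulsory miss, which is exactly why low-frequency photos are so hard for any replacement policy to exploit.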

  13. Hit ratio contribution breakdown [Chart: hit ratio of LRU and Belady, with the CDF of CoHR.] At the production cache capacity X, the hit ratio of LRU is 67.90%. At infinite cache capacity, the hit ratio of Belady is 76.82%. 13

  14. Hit ratio contribution breakdown [Chart: the same comparison of LRU (67.90% at capacity X) and Belady (76.82% at infinite capacity).] ◼ To improve the hit ratio, low-frequency photos must be hit ◼ Advanced algorithms perform no optimization for low-frequency data, thus they fail to improve the hit ratio 14

  15. Cache size is too large average uploaded data per day = total data size / number of days = 46 TB / 9 ≈ 5.1 TB ◼ The cache capacity in production is 5 TB! The cache is large enough to hold nearly all the data uploaded in a day! 15

  16. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 16

  17. Motivation [Chart: PoP, PoR, and CoHR per frequency group, as on slide 12.] ◼ "Cold" photos (freq ≤ 5) account for the vast majority. Can we leverage those cold photos? Yes: make their compulsory misses hit! 17

  18. Immediacy ◼ Hint: the "immediacy" of social networks • Recently uploaded photos are more likely to be requested by users. [Figure: the CDF of the interval between a photo's upload time and its first request time.] 18

  19. Immediacy ◼ More than 90% of photos are requested at least once within 1 day of their upload • If uploaded photos are placed into the cache in time, their compulsory misses are eliminated → prefetching • Prefetching is therefore very efficient 19

  20. Prefetching [Chart: fraction of compulsory misses eliminated vs. prefetch interval; >99% at 1 second, ~73% at 10 minutes.] ◼ Prefetching • If uploaded photos are prefetched every 1 second, nearly all of their compulsory misses are eliminated • If prefetched every 10 minutes, about 73% of their compulsory misses are eliminated • … 20
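The relationship between prefetch interval and eliminated compulsory misses can be sketched as a small simulation. This is an illustrative model, not the paper's evaluation code: it assumes prefetch triggers fire at multiples of `interval`, and counts a photo's compulsory miss as eliminated when its first request arrives at or after the trigger that follows its upload.

```python
import math

def eliminated_fraction(upload_times, first_request_delays, interval):
    """Fraction of compulsory misses a periodic prefetcher eliminates.

    A photo's compulsory miss is eliminated iff its first request arrives
    at or after the prefetch trigger following its upload."""
    eliminated = 0
    for upload, delay in zip(upload_times, first_request_delays):
        # next trigger at or after the upload time
        next_trigger = math.ceil(upload / interval) * interval
        if upload + delay >= next_trigger:
            eliminated += 1
    return eliminated / len(upload_times)
```

Shrinking `interval` moves every trigger closer to the upload, so more first requests fall after it, matching the >99% (1 s) vs. ~73% (10 min) trend on the slide.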

  21. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 21

  22. Prefetch architecture ◼ The Prefetcher is an isolated module beside the outside cache (OC) • Triggered periodically • Prefetch requests pass through the OC to the datacenter cache (DC) 22

  23. Which resolution to prefetch ◼ What to prefetch • Original photos are resized into varying physical photos with various resolutions (Res1, Res2, Res3, …) • Prefetch the needed resolutions. We know what content (logical photos) the users need, but we do not know which resolutions (physical photos) they need! A logical photo contains several physical photos. 23

  24. Which resolution to prefetch ◼ Problem: • Which resolution will be requested is unknown ◼ Intuition: • Prefetch the more popular resolutions • The more resolutions prefetched, the higher the chance of eliminating a compulsory miss 24

  25. Which resolution to prefetch ◼ If the frequency of Res1 > Res2, Res1 has higher priority to be prefetched ◼ NPR (number of prefetched resolutions) • Controls how many resolutions are prefetched • E.g., NPR = 2 indicates prefetching both Res1 and Res2 25
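The NPR selection above amounts to ranking resolutions by popularity and taking the top NPR. A minimal sketch, where the function name and the flat list of past resolution requests are illustrative assumptions:

```python
from collections import Counter

def resolutions_to_prefetch(request_log, npr):
    """Return the NPR most popular resolutions.

    `request_log` is an iterable of resolution labels (e.g. "Res1")
    observed in past requests; NPR controls how many to prefetch."""
    popularity = Counter(request_log)
    return [res for res, _ in popularity.most_common(npr)]
```

With NPR = 2 and Res1 more popular than Res2, both are prefetched for every newly uploaded logical photo, exactly as the slide describes.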

  26. When to prefetch ◼ When to prefetch • How long after uploading should a prefetch be performed? [Timeline: upload time, followed by several candidate prefetch points.] 26

  27. Prefetching Scheduling ◼ QQphoto is a 24x7 online service ◼ Prefetching should also be online • Triggered periodically: the prefetching interval • At each trigger, all photos uploaded during the last interval are prefetched [Timeline: prefetch triggers; each trigger prefetches the photos uploaded during the preceding interval.] 27
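The periodic scheduling above can be sketched as a batching prefetcher: uploads accumulate between triggers, and each trigger drains the batch. The class and callback names are illustrative assumptions; the timer loop that calls `trigger()` every `interval` seconds is omitted.

```python
import threading

class Prefetcher:
    """Sketch of an online periodic prefetcher.

    `fetch` is a hypothetical callback that pulls one photo into the
    cache; `interval` is the prefetching interval in seconds."""
    def __init__(self, interval, fetch):
        self.interval = interval
        self.fetch = fetch
        self.pending = []              # photos uploaded since last trigger
        self.lock = threading.Lock()   # uploads and triggers may race

    def on_upload(self, photo_id):
        with self.lock:
            self.pending.append(photo_id)

    def trigger(self):
        """Called once per interval: prefetch everything uploaded
        during the last period, then start a fresh batch."""
        with self.lock:
            batch, self.pending = self.pending, []
        for photo_id in batch:
            self.fetch(photo_id)
```

Keeping the prefetcher as its own module with a swap-and-drain batch means a slow backend fetch never blocks the upload path, which matches the isolated-module design on slide 22.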

  28. Inserting into the cache queue ◼ Prefetched photos are inserted into the cache queue the same way as regularly admitted photos: new photos enter at the MRU end of the LRU queue 28
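A minimal sketch of this insertion policy, assuming a plain LRU queue (the class shape is illustrative): prefetched photos go through the same admission path as photos admitted on a miss, landing at the MRU end.

```python
from collections import OrderedDict

class LRUCache:
    """LRU queue where prefetched photos are admitted exactly like
    photos brought in on a demand miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()  # LRU end first, MRU end last

    def _insert(self, photo_id):
        self.queue[photo_id] = True
        self.queue.move_to_end(photo_id)        # place at MRU end
        while len(self.queue) > self.capacity:
            self.queue.popitem(last=False)      # evict from LRU end

    def get(self, photo_id):
        if photo_id in self.queue:
            self.queue.move_to_end(photo_id)    # hit: promote to MRU
            return True
        self._insert(photo_id)                  # miss: admit on demand
        return False

    def prefetch(self, photo_id):
        self._insert(photo_id)                  # same path as a demand miss
```

Because prefetched photos share the demand-admission path, an unrequested prefetch simply ages out of the LRU end without needing any special eviction logic.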

  29. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 29

  30. Evaluation ◼ Setup: • a simulator replaying the trace • warm-up: first 5 days • statistics collected: last 4 days • algorithms evaluated: FIFO, LRU, S3LRU, Belady (offline optimal) • prefetching: NPR = 1 to 8; prefetching interval = 1 sec, 10 min, 1 hour 30

  31. Hit ratio: NPR impact Algorithm = LRU, NPR = 1, …, 8, interval = 10 min. A higher NPR rewards a higher hit ratio. 31

  32. Hit ratio: prefetch interval impact Algorithm = LRU, NPR = 3, interval = 1 s, 10 min, 1 h. A lower prefetch interval rewards a higher hit ratio; at the smallest interval the hit ratio even exceeds Belady, since Belady is a pure replacement policy and cannot avoid compulsory misses. 32

  33. Latency NPR = 1, …, 8, interval = 10 min. Latency tracks the hit ratio. Pro: a higher NPR → a higher hit ratio → lower latency 33

  34. Network Traffic NPR = 1, …, 8, interval = 10 min. Con: increasing NPR results in a huge growth of network traffic 34

  35. Latency and Network Traffic Trade-off ◼ Best NPR? • Pro: lower latency • Con: more network traffic [Chart: network traffic and latency changes (%) vs. NPR = 1 to 8, at cache capacity X.] The best choice reduces latency by 6.9% while consuming 4.14% extra network resources. 35

  36. Resolution Popularity Evolution The distribution of resolution popularity stays stationary over time. 36

  37. Optimal Prefetch Interval ◼ A low interval is a trade-off • Pro: it promotes the hit ratio • Con: it means frequent prefetching, which affects the online caching service ◼ There is no consistently optimal interval on a time-varying workload • constraint: the maximum hit ratio loss should not exceed 1% • there is a bias between the actual interval and real time • interval = 10 min turns out to be an appropriate choice, with a hit ratio loss of 0.95% 37

  38. Outline ◼ Background ◼ The failure of cache policies ◼ Motivation ◼ Prefetching ◼ Performance ◼ Conclusion 38

  39. Conclusion ◼ A large cache capacity causes advanced cache policies to fail to improve the hit ratio ◼ Social networks exhibit "immediacy" ◼ Our prefetching method leverages this "immediacy" to improve the hit ratio • Latency is cut by an average of 6.9% while sacrificing only 4.14% additional network cost 39

  40. Thank you & Questions 40
