Can We Store the Whole World’s Data in DNA Storage? (HotStorage’20)

SLIDE 1

Center for Research in Intelligent Storage

Can We Store the Whole World’s Data in DNA Storage?

Bingzhe Li, Nae Young Song, Li Ou, and David H.C. Du
University of Minnesota, Twin Cities

HotStorage’20

SLIDE 2

Outline

  • Motivation
  • DNA background
  • Contributions
    – Trade-offs in DNA storage
    – DNA storage modeling
    – How many tubes to store the whole world’s data?
  • Indexing scheme
  • Conclusion

SLIDE 3

Big Data Era

  • Data volume doubles almost every 2 years
  • 44 zettabytes in 2020
  • 175 zettabytes in 2025

Image from: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf

SLIDE 4

How to Store these Data?

44 ZB corresponds to:

  • 50 trillion DVD movies
  • More than 1 billion drives with a size of 16 TB each [1]
  • 5-year or 10-year warranty

[1] https://www.seagate.com/enterprise-storage/

SLIDE 5

How to Store these Data?

Looking for an emerging storage device:

  • Keeps data longer
  • Has higher areal density
SLIDE 6

DNA Storage

  • High spatial density
    – A theoretical density of 455 EB/g [1]
  • Long persistence
    – Several centuries [2][3]

[1] Raja Appuswamy, Kevin Le Brigand, Pascal Barbry, Marc Antonini, Olivier Madderson, Paul Freemont, James McDonald, and Thomas Heinis. OligoArchive: Using DNA in the DBMS storage hierarchy. In CIDR, 2019.
[2] Morten E Allentoft, Matthew Collins, David Harker, James Haile, Charlotte L Oskam, Marie L Hale, Paula F Campos, Jose A Samaniego, M Thomas P Gilbert, Eske Willerslev, et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proceedings of the Royal Society B: Biological Sciences, 279(1748):4724–4733, 2012.
[3] Robert N Grass, Reinhard Heckel, Michela Puddu, Daniela Paunescu, and Wendelin J Stark. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angewandte Chemie International Edition, 54(8):2552–2555, 2015.
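To put the theoretical density in perspective, the sketch below divides the 44 ZB estimate quoted earlier in the deck by 455 EB/g. It assumes decimal (SI) prefixes and ignores every practical overhead (primers, indexes, ECC, redundant copies), so it is a lower bound on mass rather than a design figure. The function name is illustrative and does not come from the slides.

```python
# Back-of-the-envelope estimate using only figures quoted on the slides.
EB_PER_ZB = 1000  # assumption: decimal (SI) prefixes, 1 ZB = 1000 EB


def dna_mass_grams(data_zb: float, density_eb_per_g: float) -> float:
    """Mass of DNA needed at the given density, ignoring all overheads."""
    return data_zb * EB_PER_ZB / density_eb_per_g


world_data_zb = 44         # 2020 estimate quoted in the deck
theoretical_density = 455  # EB/g, theoretical density cited as [1]

print(f"{dna_mass_grams(world_data_zb, theoretical_density):.1f} g")
```

Under these assumptions the result is on the order of a hundred grams, which is why the theoretical density looks so attractive compared with billions of drives.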

SLIDE 7

Background of DNA storage

  • Nucleotides: the molecules that form the building blocks of DNA.
  • Adenine (A) pairs with Thymine (T)
  • Cytosine (C) pairs with Guanine (G)

[Figure 1, from https://www.genome.gov/Pages/Education/Modules/BasicsPresentation.pdf]
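The pairing rule can be written as a small lookup table. The sketch below is only an illustration of A/T and C/G pairing; the function names are hypothetical and not taken from the slides. The reverse complement is included because the two strands of a double helix run in opposite directions.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same direction as the input."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complementary strand read in its own direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("ACGT"))          # TGCA
print(reverse_complement("AACG"))  # CGTT
```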

SLIDE 8

Existing Work

[Figure: size per DNA storage tube (axis ticks 100 B, 10 KB, 1 MB, 100 MB, 10 GB) versus year. Data points: Clelland et al. [1] 1999; Church et al. [2] 2012; Goldman et al. [3] 2013; Blawat et al. [4] 2016; Erlich et al. [5] 2017; Organick et al. [6] 2018; Appuswamy et al. [7] 2019; Organick et al. [8] 2020. Annotation: ~150 GB.]

[1] Clelland, C. T., Risca, V. & Bancroft, C. Hiding messages in DNA microdots. Nature 399, 533–534 (1999).
[2] Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628–1628 (2012).
[3] Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013).
[4] Blawat, M. et al. Forward error correction for DNA data storage. Procedia Comput. Sci. 80, 1011–1022 (2016).
[5] Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017).
[6] Organick, L. et al. Random access in large-scale DNA data storage. Nat. Biotechnol. 36, 242–248 (2018).
[7] Appuswamy, R. et al. OligoArchive: Using DNA in the DBMS storage hierarchy. CIDR (2019).
[8] Organick, L. et al. Probing the physical limits of reliable DNA data retrieval. Nature Communications 11(1), 1–7 (2020).

SLIDE 9

Existing Work

Feasibility, but no scalability!

SLIDE 10

Our Contributions

  • Investigate the effect of different factors on the capacity of DNA storage (in-house simulator)
  • Analyze trade-offs between different factors and scalability of DNA storage
  • How to index the whole world’s data in DNA storage

SLIDE 11

Factors and Modeling of DNA Storage

DNA strand layout (N strands share one pair of primers; each strand has total length $L$):

Primer #1 | Index #1 | Payload 1 | ECC 1 | Primer #2
...
Primer #1 | Index #N | Payload N | ECC N | Primer #2

Field lengths: $l_{primer}$, $l_{index}$, $l_{payload}$, $l_{ECC}$ ($l_{primer}$ is about 18 – 25 bp)

$L = 2 \cdot l_{primer} + l_{index} + l_{payload} + l_{ECC}$

  • Primer: used to read data out (sequencing process based on Polymerase Chain Reaction (PCR))
  • Index: distinguishes DNA strands within the same primer pair
  • Payload: useful information
  • ECC: corrects errors from the synthesis and sequencing processes
  • PF (primer factor): the number N of DNA strands attached to the same primer pair
  • Coding density: useful information (bits) carried per unit of payload length, so that

$Info = \text{coding density} \cdot l_{payload}$

  • Solubility
  • Droplet volume
  • ...
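The strand-length relation and the coding-density formula can be turned into a few lines of arithmetic. The sketch below is not the authors' in-house simulator; it only rearranges the strand-length equation to solve for the payload length. The class and field names are hypothetical, and the index and ECC lengths in the example (16 bp and 36 bp) are made-up illustration values, not numbers from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandLayout:
    total_len: int   # L, in bp
    primer_len: int  # l_primer, in bp (each strand carries two primers)
    index_len: int   # l_index, in bp
    ecc_len: int     # l_ECC, in bp

    @property
    def payload_len(self) -> int:
        """l_payload = L - 2 * l_primer - l_index - l_ECC."""
        payload = (self.total_len - 2 * self.primer_len
                   - self.index_len - self.ecc_len)
        if payload < 0:
            raise ValueError("overheads exceed the strand length")
        return payload

    def info_bits(self, coding_density: float) -> float:
        """Info = coding density * l_payload."""
        return coding_density * self.payload_len


# Hypothetical example: index_len and ecc_len are illustration values only.
layout = StrandLayout(total_len=300, primer_len=20, index_len=16, ecc_len=36)
print(layout.payload_len, layout.info_bits(1.0))
```

Keeping the overheads as explicit lengths avoids guessing how an ECC percentage is defined; a ratio-based ECC model would need that definition first.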
SLIDE 12

DNA Storage Trade-offs: varying DNA length (L)

[Figure: $l_{ECC}$, $l_{payload}$, and $l_{index}$ as the strand length varies over $L$ = 100 – 3000 bp]

SLIDE 13

DNA Storage Trade-offs: varying coding density

[Figure: $l_{ECC}$, $l_{payload}$, and $l_{index}$ as the coding density varies over 0.29 – 2]

SLIDE 14

Store the Whole World’s Data based on Today’s Technology

Factor                                  Value
Whole world’s data (ZB)                 44
DNA strand length (bp)                  300
Primer length (bp)                      20
Coding density                          1
ECC                                     15%
Tube size (mL)                          1.7
Max DNA solubility in liquid (mg/mL)    500
Droplet size (mL)                       0.001
PF                                      1.55*10E6

Result: 660 GB per tube

Whole world’s data: more than 10^11 tubes
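As a sanity check on the order of magnitude, the sketch below simply divides 44 ZB by 660 GB per tube, assuming decimal prefixes. It does not reproduce the authors' model: the plain quotient comes out at roughly 6.7 × 10^10, somewhat below the figure on the slide, which may account for factors this division ignores. The function name is illustrative.

```python
# Naive quotient of the two capacities quoted on the slide.
WORLD_DATA_BYTES = 44e21     # 44 ZB, assuming decimal (SI) prefixes
TUBE_CAPACITY_BYTES = 660e9  # 660 GB per tube, as stated on the slide


def tubes_needed(total_bytes: float, per_tube_bytes: float) -> float:
    """Number of tubes if every tube is filled to its stated capacity."""
    return total_bytes / per_tube_bytes


tubes = tubes_needed(WORLD_DATA_BYTES, TUBE_CAPACITY_BYTES)
print(f"{tubes:.2e} tubes")
```

Either way, the count is in the tens to hundreds of billions of tubes, which is the scalability problem the deck is pointing at.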

SLIDE 15

DNA Storage Indexing

Block-based storage device:

  • Request: OFFSET mod capacity_tube
  • Tubes: #1, #2, ..., #i
  • External index:

    Offset      Primer pair
    Offset_1    Primer pair #1
    ...         ...
    Offset_i    Primer pair #i
    ...         ...
    Offset_M    Primer pair #M

  • Index size: len * # of entries * # of tubes, ~77 TB

Object-based storage device:

  • Request: Global ID/Key
  • Tubes: #1, #2, ..., #i, ..., #N
  • External index:

    ID      Tube       Primer pair       Index start      Index end
    ID_1    Tube #1    Primer pair #1    Index_start_1    Index_end_1
    ...     ...        ...               ...              ...
    ID_i    Tube #i    Primer pair #j    Index_start_i    Index_end_i
    ...     ...        ...               ...              ...
    ID_M    Tube #N    Primer pair #M    Index_start_M    Index_end_M

  • Index size: len * # of IDs, ~7.17*10^5 TB

Indexing for the object-based storage device is more challenging
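A minimal sketch of the two lookups follows. It is an interpretation, not the authors' implementation. For the block-based case it assumes that integer division of the offset by the tube capacity selects the tube, that the remainder (the slide's OFFSET mod capacity_tube) is the position inside that tube, and that every tube shares one sorted table of starting offsets per primer pair. For the object-based case it assumes the external index is a plain mapping from ID to location. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Tuple


class ObjectEntry(NamedTuple):
    tube: int
    primer_pair: int
    index_start: int
    index_end: int


def locate_block(offset: int, tube_capacity: int,
                 primer_offsets: List[int]) -> Tuple[int, int]:
    """Block-based lookup; returns 1-based (tube, primer pair).

    primer_offsets holds the sorted in-tube starting offset of each
    primer pair (assumed identical for every tube).
    """
    tube, in_tube = divmod(offset, tube_capacity)
    # Last primer pair whose starting offset is <= the in-tube offset.
    k = bisect_right(primer_offsets, in_tube) - 1
    return tube + 1, k + 1


def locate_object(object_id: str, index: Dict[str, ObjectEntry]) -> ObjectEntry:
    """Object-based lookup: one external index entry per global ID."""
    return index[object_id]


# Toy numbers for illustration only.
print(locate_block(1300, 1000, [0, 250, 500, 750]))
print(locate_object("ID1", {"ID1": ObjectEntry(1, 1, 0, 99)}))
```

The block-based index only needs one small table per tube, while the object-based index grows with the number of IDs, which matches the size gap shown on the slide.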

SLIDE 16

Conclusion

  • Modeled DNA storage based on different factors
  • Investigated the trade-offs between different factors
  • Examined the scalability of DNA storage
  • Introduced simple schemes to index the whole world’s data in DNA storage

SLIDE 17

Thanks!

lixx1743@umn.edu