DNA, nicknamed “nature’s storage medium,” has accurately stored the instruction sets for all life on Earth for billions of years. It may also hold the keys to managing explosive data growth and storing archival data for generations to come.
The idea of storing digital data in DNA dates back more than half a century, but progress toward making it a reality has accelerated in recent years with advances in biotechnology and the declining cost of genome sequencing.
Dave Landsman is the senior director of industry standards and a distinguished engineer at Western Digital. For the past two years, he’s been one of the principals in the company’s exploration of DNA data storage.
“As we search for ways to store the explosion of data being created by the desire to digitize, save, and mine ever more information, we need to explore novel storage technology to complement today’s storage hierarchy,” said Landsman. “While the technology is still nascent as a commercial reality, DNA data storage could unlock the ability to store vast amounts of data for decades, or centuries, at a low cost.”
In late 2020, technology leaders Twist Bioscience, Illumina, Microsoft, and Western Digital joined forces to create the DNA Data Storage Alliance. The organization grew to more than 60 member companies and recently joined the Storage Networking Industry Association (SNIA) as a SNIA Technology Affiliate group. The affiliation combines the Alliance’s membership with SNIA’s standards-setting experience.
“To get a DNA data storage rack on a data center floor ultimately requires a standards-based ecosystem,” said Landsman. “The DNA Data Storage Alliance’s affiliation with SNIA will jump start our efforts to create this.”
The “AGCT’s” of DNA storage
Landsman explained how DNA data storage technology relies on the building blocks of chemistry and biology.
DNA molecules are composed of long chains of nucleotides. Each DNA nucleotide contains a base (adenine, guanine, cytosine, or thymine), and the sequence of bases encodes the information our cellular machinery uses to express the human genome.
Due to dramatic advances over the past few decades, it is now possible to construct a synthetic DNA molecule, base-by-base, with the bases strung together in any order.
By mapping the 1’s and 0’s of a digital object (such as a file or an image) onto the four DNA bases, synthesized DNA becomes an encoded version of the original digital data. When it’s time to read the data, the DNA is sequenced: the bases are read out and decoded back into 1’s and 0’s.
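The mapping described above can be illustrated with a minimal Python sketch. It assumes the simplest possible scheme, two bits per base (00→A, 01→C, 10→G, 11→T); the article does not say which mapping is used, and the table and function names here are hypothetical. Practical systems typically add error correction and avoid long runs of the same base, none of which is modeled here.

```python
# Hypothetical two-bits-per-base mapping; the article does not specify one.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this sketch, "writing" corresponds to synthesizing the strand that `encode` returns, and "reading" corresponds to sequencing it and passing the bases to `decode`.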
“The science of manipulating DNA at a molecular level, which has been used in medical and other scientific applications for over 30 years, has progressed to where storing digital data in DNA is no longer science fiction, but an emerging reality,” said Landsman.
The technology has been shown to work. Two years ago, Twist Bioscience stored the first episode of a Netflix series, Biohackers, in DNA.
According to Landsman, the challenge now is to scale the science. “This is what the tech industry excels at, and this is where the magic will be over the next 5-10 years,” he said.
Why DNA data storage?
DNA as a storage medium has many characteristics which make it an attractive option for zettabyte-scale archival storage.
DNA’s density is orders of magnitude higher than that of any of today’s media. The most common medium for archiving data is the LTO tape cartridge. The latest generation, LTO-9, stores 18 TB per cartridge. The same cartridge filled with DNA would hold nearly 2 exabytes of data, more than 100,000 times the capacity of a single LTO-9 tape.
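The ratio quoted above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and rounds "nearly 2 exabytes" up to exactly 2 EB, so the result is an upper bound on the figure in the text.

```python
TB = 10**12  # decimal terabyte in bytes (assumption: decimal units)
EB = 10**18  # decimal exabyte in bytes

lto9_capacity = 18 * TB  # LTO-9 capacity per cartridge, from the text
dna_capacity = 2 * EB    # "nearly 2 exabytes", rounded up to 2 EB

ratio = dna_capacity / lto9_capacity
print(f"{ratio:,.0f}")   # 111,111
```

With exactly 2 EB the ratio is about 111,000, which is consistent with "more than 100,000 times."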
Landsman explained that in addition to being very small, DNA bits last a long time, are inexpensive to store, and do not need refreshing. As long as they’re kept free from water and oxidation, DNA molecules can last for thousands of years.
DNA’s durability also means that DNA-based archives do not need to be rewritten to new media over time. Furthermore, DNA has an immutable format, so reading it in the future will not depend on the original reader devices still being available.
Due to all these factors, DNA as a storage medium comes with substantial TCO and environmental benefits. Instead of building more and more sprawling 100-megawatt data centers, data could be stored using almost no energy in a tiny, compact capsule. This will create new ways to maintain and capture value from the ocean of information we are digitizing today.
“If backup is affordable, we, in the global sense, would back up everything,” said Steffen Hellmold, the SVP of business development for data storage at Twist Bioscience, a company that specializes in building synthetic DNA tools.
Hellmold believes DNA offers the lowest long-term TCO, and his team is currently working on a GB-scale chip for data-to-DNA synthesis.
Storage in the DNA
Recent analyst research estimates that “at least 30% of digital businesses will mandate DNA storage trials by 2024.”
“Initial markets for DNA data storage include digital preservation, media and entertainment, big science, big data, and healthcare, all of which have long-term archiving requirements,” according to Hellmold.
But unlike other storage media concepts that sound like science fiction, such as glass or holograms, DNA storage comes with a challenge of its own: educating the market about concepts of biological computing.
“Someone asked me, only partly joking, ‘Does DNA data storage mean I will be uploading my music collection into my dog?’” said Landsman. “I explained that the creation of the DNA data storage medium in no way requires, uses, nor creates any cells, organisms, or life. We are simply using chemical mechanisms to store digital bits in DNA molecules, instead of using electromagnetic or optical mechanisms to store the bits in silicon, magnetic, or other materials.”