The Brave New World of Using DNA to Store Data
Marc Andreessen, the Netscape co-founder turned venture capitalist, once posited that software was eating the world. He was right, and software's takeover has had many consequences. One of them is data. Lots and lots and lots of data. In the previous two years, humanity created more data than it did in all of its prior history combined, and the amount will only increase. Think about it: The hundreds of 50KB emails you write a day, the dozens of 10MB photos, and the minute-long, 350MB 4K video you shoot on your iPhone X add up to vast quantities of information. All that information needs to be stored. And that's becoming an issue as data volume outpaces storage space.
"There won't be enough silicon to store all the data we need. It's unlikely that we can make flash memory smaller. We have reached the physical limits," Victor Zhirnov, chief scientist at the Semiconductor Research Corporation, says. "We are facing a crisis that's comparable to the oil crisis in the 1970s. By 2050, we're going to need to store 10 to the 30 bits, compared to 10 to the 23 bits in 2016." That amount of storage space is equivalent to each of the world's seven billion people owning almost 70 million iPhone Xs with 256GB of storage space.
The race is on to find another medium capable of storing massive amounts of information in as small a space as possible. Zhirnov and other scientists are looking to the human body, and specifically to DNA. "Nature has nailed it," Luis Ceze, a professor in the Department of Computer Science and Engineering at the University of Washington, says. "DNA is a molecular storage medium that is remarkable. It's incredibly dense, many, many thousands of times denser than the densest technology that we have today. And DNA is remarkably general. Any information you can map in bits you can store in DNA." It's so dense -- able to store a theoretical maximum of 215 petabytes (215 million gigabytes) in a single gram -- that all the data ever produced could be stored in the back of a tractor-trailer.
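To put that density in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (215 petabytes per gram and Zhirnov's 10 to the 23 bits for 2016) and assumes decimal petabytes and perfect packing at the theoretical maximum, which no real system achieves. Treating the 2016 figure as a stand-in for "all the data ever produced" is also an assumption.

```python
# Back-of-the-envelope estimate built only from figures quoted in the article.
BITS_PER_BYTE = 8
PETABYTE = 10**15  # bytes, decimal definition (an assumption)

# Theoretical maximum density cited in the article: 215 petabytes per gram.
DENSITY_BYTES_PER_GRAM = 215 * PETABYTE


def grams_of_dna(bits: int) -> float:
    """Mass of DNA (in grams) needed to hold `bits` at the theoretical maximum."""
    return bits / BITS_PER_BYTE / DENSITY_BYTES_PER_GRAM


# Zhirnov's figure for 2016: 10 to the 23 bits.
stored_2016_kg = grams_of_dna(10**23) / 1000
print(f"2016 data volume: roughly {stored_2016_kg:.0f} kg of DNA")
```

Under those assumptions the 2016 total comes to a few dozen kilograms of DNA, which is consistent with the claim that it would fit in the back of a truck.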
Writing DNA can be an energy-efficient process, too. Consider how the human body is constantly writing and rewriting DNA, and does so on a couple thousand calories a day. And all it needs for storage is a cool, dark place, a significant energy savings when compared to server farms that require huge amounts of energy to run and even more energy to cool.
Researchers first succeeded in encoding data onto DNA in 2012, when Harvard University geneticists George Church and Sri Kosuri wrote a 52,000-word book in sequences of the bases A, C, G, and T. Their method achieved a density of only 1.28 petabytes per gram of DNA, however, a figure exceeded the next year when a group encoded all 154 Shakespeare sonnets and a 26-second clip of Martin Luther King Jr.'s "I Have a Dream" speech. In 2017, Columbia University researchers Yaniv Erlich and Dina Zielinski made the process 60 percent more efficient.
The limiting factor today is cost. Erlich said it cost his team $7,000 to encode and decode two megabytes of data. For DNA storage to become widely useful, the price per megabyte needs to plummet. Even advocates concede this point. "Of course it is expensive," Zhirnov says. "But look how much magnetic storage cost in the 1980s. What you store today in your iPhone for virtually nothing would cost many millions of dollars in 1982." There's reason to think the price will continue to fall. Genome readers are getting cheaper, faster, and smaller every year. Picture it: tiny specks of inert DNA made from silicon or another material, stored in cool, dark, dry areas, preserved for all time.
Plus, DNA has another advantage over more traditional forms of storage: It's very easy to reproduce. "If you want a second copy of a hard disk drive, you need components for a disk drive, hook both drives up to a computer, and copy. That's a pain," Nick Goldman, a researcher at the European Bioinformatics Institute, says. "DNA, once you have that first sample, it's a process that is absolutely routine in thousands of laboratories around the world to multiply that using polymerase chain reaction [which uses temperature changes or other processes]. It just takes a few minutes to double a sample. A few more minutes, you double it again. Very quickly, you have thousands or millions of new copies."
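Goldman's point about doubling is simple exponential growth, and a few lines of Python make it concrete. This is an idealized sketch: it assumes every cycle exactly doubles the sample, which real polymerase chain reaction only approximates, and the function name is made up for illustration.

```python
def doublings_needed(target_copies: int, starting_copies: int = 1) -> int:
    """Count how many perfect doubling cycles reach at least target_copies."""
    cycles = 0
    copies = starting_copies
    while copies < target_copies:
        copies *= 2  # idealized: each cycle exactly doubles the sample
        cycles += 1
    return cycles


for target in (1_000, 1_000_000):
    print(f"{target:>9,} copies: {doublings_needed(target)} doublings")
```

Starting from a single copy, about 10 doublings pass the thousand mark and about 20 pass a million, which is why "a few minutes" per cycle adds up so quickly.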
This ability to duplicate quickly and easily is a positive trait. But, of course, there's also the potential for danger. Does encoding on DNA, the very basis for life, present ethical issues? Could it get out of control and fundamentally alter life as we know it?
The chance is there, but it's remote. The first reason is that storage could be done with only two base pairs, which would serve as replacements for the 0 and 1 digits that make up all digital data. While doing so would decrease the possible density of the storage, it would virtually eliminate the risk that the sequences would be compatible with life.
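The density trade-off described above can be seen in a small Python sketch. The letter assignments here are hypothetical, chosen only for illustration, and the sketch reads "two base pairs" loosely as a two-letter alphabet. It is not the scheme any of the researchers quoted here actually use.

```python
# Hypothetical mappings for illustration; the article does not specify them.
TWO_LETTER = {"0": "A", "1": "C"}                            # one bit per base
FOUR_LETTER = {"00": "A", "01": "C", "10": "G", "11": "T"}   # two bits per base


def to_bits(data: bytes) -> str:
    """Render bytes as a string of 0s and 1s, eight per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def encode_two_letter(data: bytes) -> str:
    """Replace each bit with one of two bases."""
    return "".join(TWO_LETTER[bit] for bit in to_bits(data))


def encode_four_letter(data: bytes) -> str:
    """Replace each pair of bits with one of four bases."""
    bits = to_bits(data)
    return "".join(FOUR_LETTER[bits[i:i + 2]] for i in range(0, len(bits), 2))


message = b"Hi"
print(encode_two_letter(message), len(encode_two_letter(message)))
print(encode_four_letter(message), len(encode_four_letter(message)))
```

The two-letter version needs twice as many bases for the same message, which is the loss of density the safer scheme would accept.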
But even if scientists and researchers choose to use four base pairs, other safeguards are in place that will prevent trouble. According to Ceze, the computer science professor, the snippets of DNA that they write are very short, around 150 nucleotides. This includes the title, the information that's being encoded, and tags to help organize where the snippet should fall in the larger sequence. Furthermore, they generally avoid repeated letters, which dramatically reduces the chance that a protein could be synthesized from the snippet.
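A toy Python version of that layout might look like the following. Everything beyond the roughly 150-nucleotide snippet length is an assumption made for illustration: the two-bits-per-base mapping, the eight-base position tag, and the function names are hypothetical, and this naive mapping only measures repeated letters rather than avoiding them as real schemes do.

```python
import random

BASES = "ACGT"        # hypothetical mapping: A=00, C=01, G=10, T=11
FRAGMENT_LEN = 150    # approximate snippet length mentioned in the article
INDEX_LEN = 8         # assumed: 8 bases reserved for the position tag
PAYLOAD_LEN = FRAGMENT_LEN - INDEX_LEN


def int_to_bases(value: int, width: int) -> str:
    """Write a non-negative integer as a fixed-width base-4 string of A/C/G/T."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def bases_to_int(seq: str) -> int:
    """Inverse of int_to_bases."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def encode(data: bytes) -> list[str]:
    """Turn bytes into short fragments, each prefixed with its position tag."""
    payload = "".join(int_to_bases(byte, 4) for byte in data)  # 4 bases per byte
    chunks = [payload[i:i + PAYLOAD_LEN] for i in range(0, len(payload), PAYLOAD_LEN)]
    return [int_to_bases(i, INDEX_LEN) + chunk for i, chunk in enumerate(chunks)]


def decode(fragments: list[str]) -> bytes:
    """Reorder fragments by their tags, then map bases back to bytes."""
    ordered = sorted(fragments, key=lambda frag: bases_to_int(frag[:INDEX_LEN]))
    payload = "".join(frag[INDEX_LEN:] for frag in ordered)
    return bytes(bases_to_int(payload[i:i + 4]) for i in range(0, len(payload), 4))


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated letter."""
    best = run = 0
    previous = ""
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


message = b"Any information you can map in bits you can store in DNA. " * 5
fragments = encode(message)
random.shuffle(fragments)  # fragments in a test tube have no inherent order
print(len(fragments), "fragments; round trip ok:", decode(fragments) == message)
print("longest repeated-letter run:", max(longest_run(f) for f in fragments))
```

Because each snippet carries its own position tag, the original file can be reassembled no matter what order the fragments are read in.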
Inevitably, some DNA will get spilt. "But it's so unlikely that anything that gets created for storage would have a biological interpretation that could interfere with the mechanisms going on in a living organism that it doesn't worry me in the slightest," Goldman says. "We're not of concern for the people who are worried about the ethical issues of synthetic DNA. They are much more concerned about people deliberately engineering anthrax. In the future, we'll know enough about someone from a sample of their DNA that we could make a specific poison. That's the danger, not those of us who want to encode DNA for storage."
In the end, the realities and risks of encoding on DNA are the same as those of any scientific advancement: It's another system that is vulnerable to people with bad intentions but not one that is inherently unethical.
"Every human action has some ethical implications," Zhirnov says. "I can use a hammer to build a house or I can use it to harm another person. I don't see why DNA is in any way more or less ethical."
If that house can store all the knowledge in human history, it's worth learning how to build it.
Editor's Note: In response to readers' comments that silicon is one of the earth's most abundant materials, we reached back out to our source, Dr. Victor Zhirnov. He stands by his statement about a coming shortage of silicon, citing this research. The silicon oxide found in beach sand is unsuitable for semiconductors, he says, because the cost of purifying it would be prohibitive. For use in circuit-making, silicon must be refined to a purity of 99.9999999 percent. So the process begins by mining for pure quartz, which can only be found in relatively few places around the world.
Have You Heard of the Best Sport for Brain Health?
The Friday Five covers five stories in research that you may have missed this week. There are plenty of controversies and troubling ethical issues in science – and we get into many of them in our online magazine – but this news roundup focuses on scientific creativity and progress to give you a therapeutic dose of inspiration headed into the weekend.
Here are the promising studies covered in this week's Friday Five:
- Reprogram cells to a younger state
- Pick up this sport for brain health
- Do all mental illnesses have the same underlying cause?
- New test could diagnose autism in newborns
- Scientists 3D print an ear and attach it to woman
Can blockchain help solve the Henrietta Lacks problem?
Science has come a long way since Henrietta Lacks, a Black woman from Baltimore, succumbed to cervical cancer at age 31 in 1951 -- only eight months after her diagnosis. Since then, research involving her cancer cells has advanced the understanding of the human papillomavirus and the development of polio vaccines, medications for HIV/AIDS and in vitro fertilization.
Today, the World Health Organization reports that those cells are essential in mounting a COVID-19 response. But they were commercialized without the awareness or permission of Lacks or her family, who have filed a lawsuit against a biotech company for profiting from these “HeLa” cells.
While obtaining an individual's informed consent has become standard procedure before the use of tissues in medical research, many patients still don’t know what happens to their samples. Now, a new phone-based app is aiming to change that.
Tissue donors can track what scientists do with their samples while safeguarding privacy, through a pilot program initiated in October by researchers at the Johns Hopkins Berman Institute of Bioethics and the University of Pittsburgh’s Institute for Precision Medicine. The program uses blockchain technology to offer patients this opportunity through the University of Pittsburgh's Breast Disease Research Repository, while assuring that their identities remain anonymous to investigators.
A blockchain is a digital, tamper-proof ledger of transactions duplicated and distributed across a computer system network. Whenever a transaction occurs with a patient’s sample, multiple stakeholders can track it while the owner’s identity remains encrypted. Special certificates called “nonfungible tokens,” or NFTs, represent patients’ unique samples on a trusted and widely used blockchain that reinforces transparency.
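The core idea of a tamper-evident ledger can be sketched in a few lines of Python. This is a toy, single-computer illustration and not how the de-bi app or any production blockchain is built: there is no network, no consensus and no NFT standard, a salted hash merely stands in for a protected donor identity, and the class, salt and event names are all hypothetical.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """Hex digest of a string using SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Ledger:
    """Toy hash chain: each entry commits to the entry before it."""

    def __init__(self):
        self.blocks = []

    def add(self, sample_token: str, event: str) -> dict:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        record = {"sample": sample_token, "event": event, "prev": prev_hash}
        # The hash covers the event and the previous hash, linking the chain.
        record["hash"] = sha256_hex(json.dumps(record, sort_keys=True))
        self.blocks.append(record)
        return record

    def is_valid(self) -> bool:
        prev_hash = "0" * 64
        for block in self.blocks:
            body = {key: block[key] for key in ("sample", "event", "prev")}
            expected = sha256_hex(json.dumps(body, sort_keys=True))
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


# A pseudonymous token stands in for the donor; salt and ID are made up.
token = sha256_hex("hypothetical-salt:" + "donor-123")

ledger = Ledger()
ledger.add(token, "sample deposited in repository")
ledger.add(token, "sample released to a study")
print(ledger.is_valid())   # True

ledger.blocks[0]["event"] = "edited after the fact"
print(ledger.is_valid())   # False: the stored hash no longer matches
```

Anyone holding a copy of the chain can rerun the check, so a quiet edit to an old entry is detectable, while the token reveals nothing about the donor to someone who lacks the salt and ID.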
“Healthcare is very data rich, but control of that data often does not lie with the patient,” said Julius Bogdan, vice president of analytics for North America at the Healthcare Information and Management Systems Society (HIMSS), a Chicago-based global technology nonprofit. “NFTs allow for the encapsulation of a patient’s data in a digital asset controlled by the patient.” He added that this technology enables a more secure and informed method of participating in clinical and research trials.
Without this technology, de-identification of patients’ samples during biomedical research has had the unintended consequence of preventing patients from learning what researchers find -- even if that data could benefit their health. A solution was urgently needed, said Marielle Gross, assistant professor of obstetrics, gynecology and reproductive science and bioethics at the University of Pittsburgh School of Medicine.
“A researcher can learn something from your bio samples or medical records that could be life-saving information for you, and they have no way to let you or your doctor know,” said Gross, who is also an affiliate assistant professor at the Berman Institute. “There’s no good reason for that to stay the way that it is.”
For instance, blockchain could be used to notify people if cancer researchers discover that they have certain risk factors. Gross estimated that less than half of breast cancer patients are tested for mutations in BRCA1 and BRCA2 — tumor suppressor genes that are important in combating cancer. With normal function, these genes help prevent breast, ovarian and other cells from proliferating in an uncontrolled manner. If researchers find mutations, it’s relevant for a patient’s and family’s follow-up care — and that’s a prime example of how this newly designed app could play a life-saving role, she said.
Liz Burton was one of the first patients at the University of Pittsburgh to opt for the app -- called de-bi, which is short for decentralized biobank -- before undergoing a mastectomy for early-stage breast cancer in November, after it was diagnosed on a routine mammogram. She often takes part in medical research and looks forward to tracking her tissues.
“Anytime there’s a scientific experiment or study, I’m quick to participate -- to advance my own wellness as well as knowledge in general,” said Burton, 49, a life insurance service representative who lives in Carnegie, Pa. “It’s my way of contributing.”
The pilot program raises the issue of what investigators may owe study participants, especially since certain populations, such as Black and indigenous peoples, have historically been treated unethically in scientific research. “It’s a truly laudable effort,” Tamar Schiff, a postdoctoral fellow in medical ethics at New York University’s Grossman School of Medicine, said of the endeavor. “Research participants are beautifully altruistic.”
Lauren Sankary, a bioethicist and associate director of the neuroethics program at Cleveland Clinic, agrees that the pilot program provides increased transparency for study participants regarding how scientists use their tissues while acknowledging individuals’ contributions to research.
However, she added, “it may require researchers to develop a process for ongoing communication to be responsive to additional input from research participants.”
Peter H. Schwartz, professor of medicine and director of Indiana University’s Center for Bioethics in Indianapolis, said the program is promising, but he wonders what will happen if a patient has concerns about a particular research project involving their tissues.
“I can imagine a situation where a patient objects to their sample being used for some disease they’ve never heard about, or which carries some kind of stigma like a mental illness,” Schwartz said, noting that researchers would have to evaluate how to react. “There’s no simple answer to those questions, but the technology has to be assessed with an eye to the problems it could raise.”
As a result, researchers may need to decide how much information to share with patients and how to explain it, Schiff said. There are also concerns that patients tracking their samples could tell others what they learned before researchers are ready to release that information publicly. However, Bogdan, the HIMSS vice president, believes only a minimal study identifier would be stored in an NFT, not patient data, research results or any type of proprietary trial information.
Some patients may be confused by blockchain and reluctant to embrace it. “The complexity of NFTs may prevent the average citizen from capitalizing on their potential or vendors willing to participate in the blockchain network,” Bogdan said. “Blockchain technology is also quite costly in terms of computational power and energy consumption, contributing to greenhouse gas emissions and climate change.”
In addition, this nascent technology is vulnerable to data security flaws, disputes over intellectual property rights and privacy issues, though it does offer baseline protections to maintain confidentiality. To truly make a difference, blockchain must enable broad consent from patients, not just de-identification, said Robyn Shapiro, a bioethicist and founding attorney at Health Sciences Law Group near Milwaukee.
The Henrietta Lacks story is a prime example, Shapiro noted. During her treatment for cervical cancer at Johns Hopkins, Lacks’s tissue was de-identified (albeit not entirely, because her cell line, HeLa, bore her initials). After her death, those cells were replicated and distributed for important and lucrative research and product development purposes without her knowledge or consent.
Nonetheless, Shapiro thinks that the initiative by the University of Pittsburgh and Johns Hopkins has the potential to solve some of the ethical challenges involved in the research use of biospecimens. “Compared to the system that allowed Lacks’s cells to be used without her permission,” Shapiro said, “blockchain technology using nonfungible tokens that allow patients to follow their samples may enhance transparency, accountability and respect for persons who contribute their tissue and clinical data for research.”