Scientists Are Building an “AccuWeather” for Germs to Predict Your Risk of Getting the Flu
Applied mathematician Sara del Valle works at the U.S.'s foremost nuclear weapons lab: Los Alamos. Once colloquially called Atomic City, it's a hidden place 45 minutes into the mountains northwest of Santa Fe. Here, engineers developed the first atomic bomb.
Today, Los Alamos is still a small science town, though no longer a secret, nor in the business of building new bombs. Instead, it's tasked with, among other things, keeping the stockpile of nuclear weapons safe and stable: not exploding when they're not supposed to (yes, please) and exploding if someone presses that red button (please, no).
Del Valle, though, doesn't work on any of that. Los Alamos is also interested in other kinds of booms—like the explosion of a contagious disease that could take down a city. Predicting (and, ideally, preventing) such epidemics is del Valle's passion. She hopes to develop an app that's like AccuWeather for germs: It would tell you your chance of getting the flu, or dengue or Zika, in your city on a given day. And like AccuWeather, it could help people alter their behavior to live better lives, whether that means staying home on a snowy morning or washing their hands on a sickness-heavy commute.
Sara del Valle of Los Alamos is working to predict and prevent epidemics using data and machine learning.
Since the beginning of her career, del Valle has been driven by one thing: using data and predictions to help people behave practically around pathogens. As a kid, she'd always been good at math, but when she found out she could use it to capture the tentacular spread of disease, and not just manipulate abstractions, she was hooked.
When she made her way to Los Alamos, she started looking at what people were doing during outbreaks. Using data from Twitter, Google searches, and Wikipedia, she and her team started to sift for trends. Were people talking about hygiene, like hand-washing? Or about being sick? Were they Googling information about mosquitoes? Searching Wikipedia for symptoms? And how did those things correlate with the spread of disease?
It was a new, faster way to think about how pathogens propagate in the real world. Usually, there's a 10- to 14-day lag in the U.S. between when doctors tap numbers into spreadsheets and when that information becomes public. By then, the world has moved on, and so has the disease—to other villages, other victims.
"We found there was a correlation between actual flu incidents in a community and the number of searches online and the number of tweets online," says del Valle. That was when she first let herself dream about a real-time forecast, not a 10-days-later backcast. Del Valle's group—computer scientists, mathematicians, statisticians, economists, public health professionals, epidemiologists, satellite analysis experts—has continued to work on the problem ever since their first Twitter parsing, in 2011.
They've had their share of outbreaks to track. Looking back at the 2009 swine flu pandemic, they saw people buying face masks and paying attention to the cleanliness of their hands. "People were talking about whether or not they needed to cancel their vacation," she says, and also whether pork products—which have nothing to do with swine flu—were safe to buy.
They watched internet conversations during the measles outbreak in California. "There's a lot of online discussion about anti-vax sentiment, and people trying to convince people to vaccinate children and vice versa," she says.
Today, they work on predicting the spread of Zika, Chikungunya, and dengue fever, as well as the plain old flu. And according to the CDC, that latter effort is going well.
Since 2015, the CDC has run the Epidemic Prediction Initiative, a competition in which teams like del Valle's submit weekly predictions of how intense the flu will be in particular locations, along with occasional forecasts for other diseases. Michael Johansson is co-founder and leader of the program, which began with the Dengue Forecasting Project. Its goal, he says, was to predict when dengue cases would suddenly surge in areas that previously had only a low-level baseline of sick people. "You'll get this massive epidemic where all of a sudden, instead of 3,000 to 4,000 cases, you have 20,000 cases," he says. "They kind of come out of nowhere."
But the "kind of" is key: The outbreaks surely come out of somewhere and, if scientists applied research and data the right way, they could forecast the upswing and perhaps dodge a bomb before it hit big-time. Questions about how big, when, and where are also key to the flu.
A big part of these projects is the CDC giving the right researchers access to the right information, and the structure both to forecast useful public-health outcomes and to compare how well the models are doing. The extra information has been great for the Los Alamos effort. "We don't have to call departments and beg for data," says del Valle.
When data isn't available, "proxies"—things like symptom searches, tweets about empty offices, satellite images showing a green, wet, mosquito-friendly landscape—are helpful: You don't have to rely on anyone's health department.
At the latest meeting with all the prediction groups, del Valle's flu models took first and second place. But del Valle wants more than weekly numbers on a government website; she wants that weather-app-inspired fortune-teller, incorporating the many diseases you could get today, standing right where you are. "That's our dream," she says.
This plot shows the correlations between the online data stream, from Wikipedia, and various infectious diseases in different countries. The results of del Valle's predictive models are shown in brown, while the actual number of cases or illness rates are shown in blue.
(Courtesy del Valle)
The goal isn't to turn you into a germophobic agoraphobe. It's to make you more aware when you do go out. "If you know it's going to rain today, you're more likely to bring an umbrella," del Valle says. "When you go on vacation, you always look at the weather and make sure you bring the appropriate clothing. If you do the same thing for diseases, you think, 'There's Zika spreading in Sao Paulo, so maybe I should bring even more mosquito repellent and bring more long sleeves and pants.'"
They're not there yet (don't hold your breath, but do stop touching your mouth). She estimates it's at least a decade away, but advances in machine learning could accelerate that hypothetical timeline. "We're doing baby steps," says del Valle, starting with the flu in the U.S., dengue in Brazil, and other efforts in Colombia, Ecuador, and Canada. "Going from there to forecasting all diseases around the globe is a long way," she says.
But even AccuWeather started small: One man began predicting weather for a utility company, then helping ski resorts optimize their snowmaking. His influence snowballed, and now private forecasting apps, including AccuWeather's, populate phones across the planet. The company's progression hasn't been without controversy—privacy incursions, inaccuracy of long-term forecasts, fights with the government—but it has continued, for better and for worse.
Disease apps, perhaps spun out of a small, unlikely team at a nuclear-weapons lab, could grow and breed in a similar way. Both the controversies and the public-health benefits that may someday emerge from them lie in the future, impossible to predict with certainty.
Genomics has begun its golden age. Just 20 years ago, sequencing a single genome cost nearly $3 billion and took over a decade. Today, the same feat can be achieved for a few hundred dollars and the better part of a day. Suddenly, the prospect of sequencing not just individuals, but whole populations, has become feasible.
The genetic differences between humans may seem meager, only around 0.1 percent of the genome on average, but this variation can have profound effects on an individual's risk of disease, responsiveness to medication, and even the dosage level that would work best.
Already, initiatives like the U.K.'s 100,000 Genomes Project - now expanding to 1 million genomes - and other similarly massive sequencing projects in Iceland and the U.S., have begun collecting population-scale data in order to capture and study this variation.
The resulting data sets are immensely valuable to researchers and drug developers working to design new 'precision' medicines and diagnostics, and to gain insights that may benefit patients. Yet, because the majority of this data comes from developed countries with well-established scientific and medical infrastructure, the data collected so far is heavily biased towards Western populations with largely European ancestry.
This presents a startling and fast-emerging problem: groups that are under-represented in these datasets are likely to benefit less from the new wave of therapeutics, diagnostics, and insights, simply because those products were tailored to the genetic profiles of people with European ancestry.
We may indeed be approaching a golden age of genomics-enabled precision medicine. But if the data bias persists, there is a risk, as with most golden ages throughout history, that the benefits will not be equally accessible to all, and existing inequalities will only be exacerbated.
To remedy the situation, a number of initiatives have sprung up to sequence genomes of under-represented groups, adding them to the datasets and ensuring that they too will benefit from the rapidly unfolding genomic revolution.
Global Gene Corp
The idea behind Global Gene Corp was born eight years ago at Harvard when Sumit Jamuar, co-founder and CEO, met up with his two other co-founders, both experienced geneticists, for a coffee.
"They were discussing the limitless applications of understanding your genetic code," said Jamuar, a business executive from New Delhi.
"And so, being a technology enthusiast type, I was excited and I turned to them and said hey, this is incredible! Could you sequence me and give me some insights? And they actually just turned around and said no, because it's not going to be useful for you - there's not enough reference for what a good Sumit looks like."
What started as a curiosity-driven conversation on the power of genomics ended with a commitment to tackle one of the field's biggest roadblocks - its lack of global representation.
Jamuar set out to begin with India, which has about 20 percent of the world's population, including over 4000 different ethnicities, but contributes less than 2 percent of genomic data, he told Leaps.org.
Eight years later, Global Gene Corp's sequencing initiative is well underway, and is the largest in the history of the Indian subcontinent. The program is being carried out in collaboration with biotech giant Regeneron, with support from the Indian government, local communities, and the Indian healthcare ecosystem. In August 2020, Global Gene Corp's work was recognized through the $1 million 2020 Roddenberry award for organizations that advance the vision of 'Star Trek' creator Gene Roddenberry to better humanity.
Global Gene Corp also focuses on developing and implementing AI and machine learning tools to make sense of the deluge of genomic data. These tools are increasingly used by both industry and academia to guide future research by identifying particularly promising or clinically interesting genetic variants. But if the underlying data is skewed European, then the effectiveness of the computational analysis - along with the future advances and avenues of research that emerge from it - will be skewed towards Europeans too.
This problem has already begun to manifest itself in, for example, much higher levels of genetic misdiagnosis among non-Europeans tested for their risk of certain diseases, such as hypertrophic cardiomyopathy - an inherited disease of the heart muscle. Most of the genetic variants used in these tests were identified as being causal for the disease from studies of European genomes. However, many of these variants differ both in their distribution and clinical significance across populations, leading to many patients of non-European ancestry receiving false-positive test results - as their benign genetic variants were misclassified as pathogenic. Had even a small number of genomes from other ethnicities been included in the initial studies, these misdiagnoses could have been avoided.
"Unless we have a data set which is unbiased and representative, we're never going to achieve the success that we want," Jamuar says.
"When Siri was first launched, she could hardly recognize an accent which was not of a certain type, so if I was trying to speak to Siri, I would have to repeat myself multiple times and try to mimic an accent which wasn't my accent so that she could understand it.
"But over time the voice recognition technology improved tremendously because the training data was expanded to include people of very diverse backgrounds and their accents, so the algorithms were trained to be able to pick that up and it dramatically improved the technology. That's the way we have to think about it - without that good-quality diverse data, we will never be able to achieve the full potential of the computational tools."
Mapping India's rich genetic diversity has been the organization's primary focus so far, but in time it plans to expand its work to other under-represented groups in Asia, the Middle East, Africa, and Latin America.
"As other like-minded people and partners join the mission, it just accelerates the achievement of what we have set out to do, which is to map out and organize the world's genomic diversity so that we can enable high-quality life and longevity benefits for everyone, everywhere," Jamuar says.
Empowering African Genomics
Africa is the birthplace of our species, and it still retains a disproportionate share of total human genetic diversity. The groups that left Africa and went on to populate the rest of the world, some 50,000 to 100,000 years ago, were likely small in number and took only a fraction of the total genetic diversity with them. This ancient bottleneck means that no other group in the world can match the level of genetic diversity seen in modern African populations.
Despite Africa's central importance in understanding the history and extent of human genetic diversity, the genomics of African populations remains wildly understudied. Addressing this disparity has become a central focus of the H3Africa Consortium, an initiative formally launched in 2012 with support from the African Academy of Sciences, the U.S. National Institutes of Health, and the U.K.'s Wellcome Trust. Today, H3Africa supports over 50 projects across the continent, on an array of different research areas in genetics relevant to the health and heredity of Africans.
"Africa is the cradle of Humankind. So what that really means is that the populations that are currently living in Africa are among some of the oldest populations on the globe, and we know that the longer populations have had to go through evolutionary phases, the more variation there is in the genomes of people who live presently," says Zane Lombard, a principal investigator at H3Africa and Associate Professor of Human Genetics at the University of the Witwatersrand in Johannesburg, South Africa.
"So for that reason, African populations carry a huge amount of genetic variation and diversity, which is pretty much uncaptured. There's still a lot to learn as far as novel variation is concerned by looking at and studying African genomes."
A recent landmark H3Africa study, led by Lombard and published in Nature in October, sequenced the genomes of over 400 African individuals from 50 ethno-linguistic groups - many of which had never been sampled before.
Despite the relatively modest number of individuals sequenced in the study, over three million previously undescribed genetic variants were found, and complex patterns of ancestral migration were uncovered.
"In some of these ethno-linguistic groups they don't have a word for DNA, so we've had to really think about how to make sure that we communicate the purposes of different studies to participants so that you have true informed consent," says Lombard.
"The objective," she explained, "was to try and fill some of the gaps for many of these populations for which we didn't have any whole genome sequences or any genetic variation data...because if we're thinking about the future of precision medicine, if the patient is a member of a specific group where we don't know a lot about the genomic variation that exists in that group, it makes it really difficult to start thinking about clinical interpretation of their data."
From H3Africa's conception, the consortium's goal has not only been to better represent Africa's staggering genetic diversity in genomic data sets, but also to build Africa's domestic genomics capabilities and empower a new generation of African researchers. By doing so, the hope is that Africans will be able to set their own genomics agenda, and leapfrog to new and better ways of doing the work.
"The training that has happened on the continent and the number of new scientists, new students, and fellows that have come through the process and are now enabled to start their own research groups, to grow their own research in their countries, to be a spokesperson for genomics research in their countries, and to build that political will to do these larger types of sequencing initiatives - that is really a significant outcome from H3Africa as well. Over and above all the science that's coming out," Lombard says.
"What has been created through H3Africa is just this locus of researchers and scientists and bioethicists who have the same goal at heart - to work towards adjusting the data bias and making sure that all global populations are represented in genomics."
Jurassic Park Without the Scary Parts: How Stem Cells May Rescue the Near-Extinct Rhinoceros
I am a stem cell scientist. In my day job I work on developing ways to use stem cells to treat neurological disease – human disease. This is the story of how I became part of a group dedicated to rescuing the northern white rhinoceros from extinction.
The earth is now in an era that is called the "sixth mass extinction." The first extinction, some 440 million years ago, put an end to 86 percent of the existing species, including most of the trilobites. When the earth grew hotter, dustier, or darker, it lost fish, amphibians, reptiles, plants, dinosaurs, mammals and birds. Each extinction event wiped out 80 to 90 percent of the life on the planet at the time. The first 5 mass extinctions were caused by natural disasters: volcanoes, fires, a meteor. But humans can take credit for the 6th.
Because of human activities that destroy habitats, creatures are now becoming extinct at a rate that is higher than any previously experienced. Some animals, like the giant panda and the California condor, have been pulled back from the brink of extinction by conserving their habitats, breeding in captivity, and educating the public about their plight.
But not the northern white rhino. This gentle giant is a vegetarian that can weigh up to 5,000 pounds. The rhino's weakness is its horn, which has become a valuable commodity because of the mistaken idea that it grants power and has medicinal value. Horns are not medicine; they are made of keratin, the same protein that is in fingernails. But as recently as 2017, more than 1,000 rhinos a year were being slaughtered for their horns.
All 6 rhino species are endangered. But the northern white has been devastated. Only two members of this species are alive now: Najin, age 32, and her daughter Fatu, 21, live in a protected park in Kenya. They are social animals and would prefer the company of other rhinos of their kind; but they can't know that they are the last two survivors of their entire species. No males exist anymore. The last male, Sudan, died in 2018 at age 45.
I became involved in the rhino rescue project on a sunny day in February 2008 at the San Diego Wild Animal Park in Escondido, about 30 miles north of my lab in La Jolla. My lab had relocated a couple of months earlier to the Scripps Research Institute to start the Center for Regenerative Medicine for human stem cell research. To thank my staff for their hard work, I wanted to arrange a special treat. I contacted my friend Oliver Ryder, who is director of the Institute for Conservation Research at the zoo, to see if I could take them on a safari, a tour in a truck through the savanna habitat at the park.
This was the first of the "stem cell safaris" that the lab would enjoy over the next few years. On the safari we saw elands and Cape buffalo, and fed giraffes and rhinos. And we talked about stem cells; in particular, we discussed a surprising technological breakthrough recently reported by the Japanese scientist Shinya Yamanaka that enabled conversion of ordinary skin cells into pluripotent stem cells.
Pluripotent stem cells can develop into virtually any cell type in the body. They exist when we are very young embryos: five days after fertilization, each of us was a blastocyst, a tiny ball of a few hundred cells, invisible to the naked eye, packed with the power to develop into an entire human being. Long before we are born, these cells of vast potential transform into highly specialized cells that generate our brains, our hearts, and everything else.
Human pluripotent stem cells derived from blastocysts can be cultured in the lab; these are called embryonic stem cells. But thanks to Dr. Yamanaka, anyone can have their skin cells reprogrammed into pluripotent stem cells, just like the ones we had when we were embryos. These reprogrammed cells are called "induced pluripotent stem cells" (iPSCs), and several years later Dr. Yamanaka won the Nobel Prize for creating them.
On our safari we wondered: if we could make these reprogrammed stem cells from human skin cells, why couldn't we make them from animals' cells? How about endangered animals? Could such stem cells be made from animals whose skin cells had been preserved since the 1970s in the San Diego Zoo's Frozen Zoo®? Our safari leader, Oliver Ryder, was the curator of the Frozen Zoo and knew what animal cells were stored in its giant liquid nitrogen tanks at −196°C (−320°F). The Frozen Zoo was established by Dr. Kurt Benirschke in 1975 in the hope that someday the collection would aid in rescue of animals that were on the brink of extinction. The frozen collection reached 10,000 cell lines this year.
We returned to the lab after the safari, and I asked my scientists if any of them would like to take on the challenge of making reprogrammed stem cells from endangered species. My new postdoctoral fellow, Inbar Friedrich Ben-Nun, raised her hand. Inbar had arrived only a few weeks earlier from Israel, and she was excited about doing something that had never been done before. Oliver picked the animals we would use. He chose his favorite animal, the critically endangered northern white rhinoceros, and the drill, which is an endangered primate related to the mandrill monkey.
When Inbar started work on reprogramming cells from the Frozen Zoo, there were 8 living northern white rhinoceroses around the world: Nola, Angalifu, Nesari, Nabire, Suni, Sudan, Najin, and Fatu. We chose to reprogram Fatu, the youngest of the remaining animals.
Through sheer determination and trial and error, Inbar got the reprogramming technique to work, and in 2011 we published the first report of iPSCs from endangered species in the scientific journal Nature Methods. The cover of the journal featured a drawing of an ark packed with animals that might someday be rescued through iPSC technology. By 2011, one of the 8 rhinos, Nesari, had died.
This kernel of hope for using iPSCs to rescue rhinos grew over the next 10 years. The zoo built the Rhino Rescue Center, and brought in 6 females of the closely related species, the southern white rhinoceros, from Africa. Southern white rhino populations are on the rise, and it appears that this species will survive, at least in captivity. The females are destined to serve as surrogate mothers for embryos made from northern white rhino cells: we hope eventually to generate sperm and eggs from the reprogrammed stem cells and fertilize the eggs in vitro, much as in human IVF.
The author, Jeanne Loring, at the Rhino Rescue Center with one of the southern white rhino surrogates.
David Barker
As this project has progressed, we've been saddened by the loss of all but the last two remaining members of the species. Nola, the last northern white rhino in the U.S., who was at the San Diego Zoo, died in 2015.
But we are celebrating a huge milestone in the efforts to use stem cells to rescue the rhino. Just over a month ago, we reported that by reprogramming cells preserved in the Frozen Zoo, we produced iPSCs from stored cells of 9 northern white rhinos: Fatu, Najin, Nola, Suni, Nadi, Dinka, Nasima, Saut, and Angalifu. We also reprogrammed cells from two of the southern white females, Amani and Wallis.
We don't know when it will be possible to make a northern white rhino embryo; we have to figure out how to adapt methods already developed for laboratory mice to generate sperm and eggs from these cells. The male rhino Angalifu died in 2014, but ever since I saw beating heart cells derived from his very own cells in a culture dish, I've felt hope that he will one day have children who will seed a thriving new herd of northern white rhinos.