Scientists Are Building an “AccuWeather” for Germs to Predict Your Risk of Getting the Flu
Applied mathematician Sara del Valle works at the U.S.'s foremost nuclear weapons lab: Los Alamos. Once colloquially called Atomic City, it's a hidden place 45 minutes into the mountains northwest of Santa Fe. Here, engineers developed the first atomic bomb.
Today, Los Alamos is still a small science town, though no longer a secret, nor in the business of building new bombs. Instead, it's tasked with, among other things, keeping the stockpile of nuclear weapons safe and stable: not exploding when they're not supposed to (yes, please) and exploding if someone presses that red button (please, no).
Del Valle, though, doesn't work on any of that. Los Alamos is also interested in other kinds of booms—like the explosion of a contagious disease that could take down a city. Predicting (and, ideally, preventing) such epidemics is del Valle's passion. She hopes to develop an app that's like AccuWeather for germs: It would tell you your chance of getting the flu, or dengue or Zika, in your city on a given day. And like AccuWeather, it could help people alter their behavior to live better lives, whether that means staying home on a snowy morning or washing their hands on a sickness-heavy commute.
Sara del Valle of Los Alamos is working to predict and prevent epidemics using data and machine learning.
Since the beginning of del Valle's career, she's been driven by one thing: using data and predictions to help people behave practically around pathogens. As a kid, she'd always been good at math, but when she found out she could use it to capture the tentacular spread of disease, and not just manipulate abstractions, she was hooked.
When she made her way to Los Alamos, she started looking at what people were doing during outbreaks. Using social media like Twitter, Google search data, and Wikipedia, the team started to sift for trends. Were people talking about hygiene, like hand-washing? Or about being sick? Were they Googling information about mosquitoes? Searching Wikipedia for symptoms? And how did those things correlate with the spread of disease?
It was a new, faster way to think about how pathogens propagate in the real world. Usually, there's a 10- to 14-day lag in the U.S. between when doctors tap numbers into spreadsheets and when that information becomes public. By then, the world has moved on, and so has the disease—to other villages, other victims.
"We found there was a correlation between actual flu incidents in a community and the number of searches online and the number of tweets online," says del Valle. That was when she first let herself dream about a real-time forecast, not a 10-days-later backcast. Del Valle's group—computer scientists, mathematicians, statisticians, economists, public health professionals, epidemiologists, satellite analysis experts—has continued to work on the problem ever since their first Twitter parsing, in 2011.
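As a rough illustration of the kind of correlation del Valle describes, the sketch below computes a Pearson correlation between a weekly online signal and weekly case counts, and checks whether the signal lines up better with cases reported a week or two later. The numbers and the function names (pearson, lagged_correlation, best_lag) are invented for illustration. This is not the Los Alamos team's model, which the article does not describe.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def lagged_correlation(signal, cases, lag):
    """Correlate signal[t] with cases[t + lag], for lag measured in weeks."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, cases)
    return pearson(signal[:-lag], cases[lag:])

def best_lag(signal, cases, max_lag):
    """Return the lag (0..max_lag) at which the correlation is highest."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(signal, cases, lag))

# Hypothetical weekly counts, made up purely for this example.
weekly_searches = [120, 150, 210, 320, 400, 380, 290, 200]
weekly_flu_cases = [30, 34, 41, 60, 88, 97, 90, 66]

print(best_lag(weekly_searches, weekly_flu_cases, max_lag=3))
```

A high correlation at a positive lag would suggest that the online signal moves ahead of the official counts, which is what would make it useful for a real-time forecast. Correlation alone does not establish that, so a real forecasting model would need far more than this.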
They've had their share of outbreaks to track. Looking back at the 2009 swine flu pandemic, they saw people buying face masks and paying attention to the cleanliness of their hands. "People were talking about whether or not they needed to cancel their vacation," she says, and also whether pork products—which have nothing to do with swine flu—were safe to buy.
They watched internet conversations during the measles outbreak in California. "There's a lot of online discussion about anti-vax sentiment, and people trying to convince people to vaccinate children and vice versa," she says.
Today, they work on predicting the spread of Zika, Chikungunya, and dengue fever, as well as the plain old flu. And according to the CDC, the latter effort is going well.
Since 2015, the CDC has run the Epidemic Prediction Initiative, a competition in which teams like del Valle's submit weekly predictions of how raging the flu will be in particular locations, and occasionally predictions for other ailments. Michael Johannson is co-founder and leader of the program, which began with the Dengue Forecasting Project. Its goal, he says, was to predict when dengue cases would blow up in an area that previously had just a low-level baseline of sick people. "You'll get this massive epidemic where all of a sudden, instead of 3,000 to 4,000 cases, you have 20,000 cases," he says. "They kind of come out of nowhere."
But the "kind of" is key: The outbreaks surely come out of somewhere and, if scientists applied research and data the right way, they could forecast the upswing and perhaps dodge a bomb before it hit big-time. Questions about how big, when, and where are also key to the flu.
A big part of these projects is the CDC giving the right researchers access to the right information, and the structure to both forecast useful public-health outcomes and to compare how well the models are doing. The extra information has been great for the Los Alamos effort. "We don't have to call departments and beg for data," says del Valle.
When data isn't available, "proxies"—things like symptom searches, tweets about empty offices, satellite images showing a green, wet, mosquito-friendly landscape—are helpful: You don't have to rely on anyone's health department.
At the latest meeting with all the prediction groups, del Valle's flu models took first and second place. But del Valle wants more than weekly numbers on a government website; she wants that weather-app-inspired fortune-teller, incorporating the many diseases you could get today, standing right where you are. "That's our dream," she says.
This plot shows the correlations between the online data stream, from Wikipedia, and various infectious diseases in different countries. The results of del Valle's predictive models are shown in brown, while the actual number of cases or illness rates are shown in blue.
(Courtesy del Valle)
The goal isn't to turn you into a germophobic agoraphobe. It's to make you more aware when you do go out. "If you know it's going to rain today, you're more likely to bring an umbrella," del Valle says. "When you go on vacation, you always look at the weather and make sure you bring the appropriate clothing. If you do the same thing for diseases, you think, 'There's Zika spreading in Sao Paulo, so maybe I should bring even more mosquito repellent and bring more long sleeves and pants.'"
They're not there yet (don't hold your breath, but do stop touching your mouth). She estimates it's at least a decade away, but advances in machine learning could accelerate that hypothetical timeline. "We're doing baby steps," says del Valle, starting with the flu in the U.S., dengue in Brazil, and other efforts in Colombia, Ecuador, and Canada. "Going from there to forecasting all diseases around the globe is a long way," she says.
But even AccuWeather started small: One man began predicting weather for a utility company, then helping ski resorts optimize their snowmaking. His influence snowballed, and now private forecasting apps, including AccuWeather's, populate phones across the planet. The company's progression hasn't been without controversy—privacy incursions, inaccuracy of long-term forecasts, fights with the government—but it has continued, for better and for worse.
Disease apps, perhaps spun out of a small, unlikely team at a nuclear-weapons lab, could grow and breed in a similar way. And both the controversies and public-health benefits that may someday spin out of them lie in the future, impossible to predict with certainty.
Can blockchain help solve the Henrietta Lacks problem?
Science has come a long way since Henrietta Lacks, a Black woman from Baltimore, succumbed to cervical cancer at age 31 in 1951 -- only eight months after her diagnosis. Since then, research involving her cancer cells has advanced scientific understanding of the human papilloma virus, polio vaccines, medications for HIV/AIDS and in vitro fertilization.
Today, the World Health Organization reports that those cells are essential in mounting a COVID-19 response. But they were commercialized without the awareness or permission of Lacks or her family, who have filed a lawsuit against a biotech company for profiting from these “HeLa” cells.
While obtaining an individual's informed consent has become standard procedure before the use of tissues in medical research, many patients still don’t know what happens to their samples. Now, a new phone-based app is aiming to change that.
Tissue donors can track what scientists do with their samples while safeguarding privacy, through a pilot program initiated in October by researchers at the Johns Hopkins Berman Institute of Bioethics and the University of Pittsburgh’s Institute for Precision Medicine. The program uses blockchain technology to offer patients this opportunity through the University of Pittsburgh's Breast Disease Research Repository, while assuring that their identities remain anonymous to investigators.
A blockchain is a digital, tamper-proof ledger of transactions duplicated and distributed across a computer system network. Whenever a transaction occurs with a patient’s sample, multiple stakeholders can track it while the owner’s identity remains encrypted. Special certificates called “nonfungible tokens,” or NFTs, represent patients’ unique samples on a trusted and widely used blockchain that reinforces transparency.
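To make the ledger idea concrete, here is a minimal sketch of a hash chain, the data structure that makes tampering with a blockchain detectable: each entry stores a hash of the entry before it, so altering an old record breaks every later link. This is a single-machine toy with hypothetical field names (sample_token, event). It leaves out distribution across a network, encryption of identities, and NFTs, and it is not how the de-bi app is implemented.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a new block that links back to the current end of the chain."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Recompute every hash and link; any edited block makes this False."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical events for one anonymized sample token.
ledger = []
append_block(ledger, {"sample_token": "token-001", "event": "sample stored"})
append_block(ledger, {"sample_token": "token-001", "event": "sent to study"})
print(is_valid(ledger))
```

In a real blockchain, many parties hold copies of the chain and agree on new blocks, so a single participant cannot quietly rewrite history. The sketch only shows why a rewrite would be noticed.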
“Healthcare is very data rich, but control of that data often does not lie with the patient,” said Julius Bogdan, vice president of analytics for North America at the Healthcare Information and Management Systems Society (HIMSS), a Chicago-based global technology nonprofit. “NFTs allow for the encapsulation of a patient’s data in a digital asset controlled by the patient.” He added that this technology enables a more secure and informed method of participating in clinical and research trials.
Without this technology, de-identification of patients’ samples during biomedical research had the unintended consequence of preventing them from discovering what researchers find -- even if that data could benefit their health. A solution was urgently needed, said Marielle Gross, assistant professor of obstetrics, gynecology and reproductive science and bioethics at the University of Pittsburgh School of Medicine.
“A researcher can learn something from your bio samples or medical records that could be life-saving information for you, and they have no way to let you or your doctor know,” said Gross, who is also an affiliate assistant professor at the Berman Institute. “There’s no good reason for that to stay the way that it is.”
For instance, blockchain could be used to notify people if cancer researchers discover that they have certain risk factors. Gross estimated that less than half of breast cancer patients are tested for mutations in BRCA1 and BRCA2 — tumor suppressor genes that are important in combating cancer. With normal function, these genes help prevent breast, ovarian and other cells from proliferating in an uncontrolled manner. If researchers find mutations, it’s relevant for a patient’s and family’s follow-up care — and that’s a prime example of how this newly designed app could play a life-saving role, she said.
Liz Burton was one of the first patients at the University of Pittsburgh to opt for the app -- called de-bi, which is short for decentralized biobank -- before undergoing a mastectomy for early-stage breast cancer in November, after it was diagnosed on a routine mammogram. She often takes part in medical research and looks forward to tracking her tissues.
“Anytime there’s a scientific experiment or study, I’m quick to participate -- to advance my own wellness as well as knowledge in general,” said Burton, 49, a life insurance service representative who lives in Carnegie, Pa. “It’s my way of contributing.”
The pilot program raises the issue of what investigators may owe study participants, especially since certain populations, such as Black and indigenous peoples, historically were not treated in an ethical manner for scientific purposes. “It’s a truly laudable effort,” Tamar Schiff, a postdoctoral fellow in medical ethics at New York University’s Grossman School of Medicine, said of the endeavor. “Research participants are beautifully altruistic.”
Lauren Sankary, a bioethicist and associate director of the neuroethics program at Cleveland Clinic, agrees that the pilot program provides increased transparency for study participants regarding how scientists use their tissues while acknowledging individuals’ contributions to research.
However, she added, “it may require researchers to develop a process for ongoing communication to be responsive to additional input from research participants.”
Peter H. Schwartz, professor of medicine and director of Indiana University’s Center for Bioethics in Indianapolis, said the program is promising, but he wonders what will happen if a patient has concerns about a particular research project involving their tissues.
“I can imagine a situation where a patient objects to their sample being used for some disease they’ve never heard about, or which carries some kind of stigma like a mental illness,” Schwartz said, noting that researchers would have to evaluate how to react. “There’s no simple answer to those questions, but the technology has to be assessed with an eye to the problems it could raise.”
As a result, researchers may need to factor in how much information to share with patients and how to explain it, Schiff said. There are also concerns that in tracking their samples, patients could tell others what they learned before researchers are ready to publicly release this information. However, Bogdan, the vice president of the HIMSS nonprofit, believes only a minimal study identifier would be stored in an NFT, not patient data, research results or any type of proprietary trial information.
Some patients may be confused by blockchain and reluctant to embrace it. “The complexity of NFTs may prevent the average citizen from capitalizing on their potential or vendors willing to participate in the blockchain network,” Bogdan said. “Blockchain technology is also quite costly in terms of computational power and energy consumption, contributing to greenhouse gas emissions and climate change.”
In addition, this nascent, groundbreaking technology is immature and vulnerable to data security flaws, disputes over intellectual property rights and privacy issues, though it does offer baseline protections to maintain confidentiality. To truly make a difference, blockchain must enable broad consent from patients, not just de-identification, said Robyn Shapiro, a bioethicist and founding attorney at Health Sciences Law Group near Milwaukee.
The Henrietta Lacks story is a prime example, Shapiro noted. During her treatment for cervical cancer at Johns Hopkins, Lacks’s tissue was de-identified (albeit not entirely, because her cell line, HeLa, bore her initials). After her death, those cells were replicated and distributed for important and lucrative research and product development purposes without her knowledge or consent.
Nonetheless, Shapiro thinks that the initiative by the University of Pittsburgh and Johns Hopkins has potential to solve some ethical challenges involved in research use of biospecimens. “Compared to the system that allowed Lacks’s cells to be used without her permission,” Shapiro said, “blockchain technology using nonfungible tokens that allow patients to follow their samples may enhance transparency, accountability and respect for persons who contribute their tissue and clinical data for research.”
New tech for prison reform spreads to 11 states
A new non-profit called Recidiviz is using data technology to reduce the size of the U.S. criminal justice system. The bi-coastal company (SF and NYC) is currently working with 11 states to improve their systems and, so far, has helped remove nearly 69,000 people from those systems: people left floundering in jail or on parole when they should have been released.
“The root cause is fragmentation,” says Clementine Jacoby, 31, a software engineer who worked at Google before co-founding Recidiviz in 2019. In the 1970s and 80s, the U.S. built a series of disconnected data systems, and this patchwork is still being used by criminal justice authorities today. It requires parole officers to manually calculate release dates, leading to errors in many cases. “[They] have done everything they need to do to earn their release, but they're still stuck in the system,” Jacoby says.
Recidiviz has built a platform that connects the different databases, with the goal of identifying people who are already qualified for release but remain behind bars or on supervision. “Think of Recidiviz like Google Maps,” says Jacoby, who worked on Maps when she was at the tech giant. Google Maps takes in data from different sources (satellite images, street maps, local business data) and organizes it into one easy view. “Recidiviz does something similar with criminal justice data,” Jacoby explains, “making it easy to identify people eligible to come home or to move to less intensive levels of supervision.”
People like Jacoby’s uncle. His experience with incarceration is what inspired her passion for criminal justice reform in the first place.
The problems are vast
The U.S. has the highest incarceration rate in the world, with 2 million people incarcerated according to the watchdog group Prison Policy Initiative, at a cost of $182 billion a year. The numbers could be a lot lower if not for an array of problems including inaccurate sentencing calculations, flawed algorithms and parole violation laws.
Sentencing miscalculations
To determine eligibility for release, the current system requires corrections officers to check 21 different requirements spread across five different databases for each of the 90 to 100 people under their supervision. These manual calculations are time prohibitive, says Jacoby, and fall victim to human error.
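The manual check described above can be pictured as a small program: pull one person's fields from each database into a single record, then run every requirement against it and list what is still missing. The sketch below uses three made-up databases and three made-up requirements in place of the five databases and 21 requirements mentioned in the article. All field names and rules are hypothetical and do not reflect any state's actual criteria or Recidiviz's software.

```python
def merge_records(person_id, sources):
    """Combine one person's fields from every source into a single dict."""
    merged = {}
    for source in sources:
        merged.update(source.get(person_id, {}))
    return merged

# Hypothetical requirements: each maps a name to a check on the merged record.
# A missing field counts as an unmet requirement.
REQUIREMENTS = {
    "served_minimum_term": lambda r: (
        r.get("days_served", 0) >= r.get("minimum_days", float("inf"))
    ),
    "no_open_violations": lambda r: r.get("open_violations") == 0,
    "paystub_on_file": lambda r: r.get("paystub_on_file", False) is True,
}

def unmet_requirements(record):
    """Return the names of requirements the record does not yet satisfy."""
    return [name for name, check in REQUIREMENTS.items() if not check(record)]

# Made-up example data, keyed by a made-up person ID.
sentencing_db = {
    "A1": {"days_served": 400, "minimum_days": 365},
    "B2": {"days_served": 400, "minimum_days": 365},
}
supervision_db = {"A1": {"open_violations": 0}, "B2": {"open_violations": 0}}
employment_db = {"A1": {"paystub_on_file": True}}

sources = [sentencing_db, supervision_db, employment_db]
for person_id in ("A1", "B2"):
    record = merge_records(person_id, sources)
    print(person_id, unmet_requirements(record))
```

Automating the check this way would not change the rules themselves. It would only remove the step where an officer has to look up each field by hand for each of the 90 to 100 people on a caseload.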
In addition, Recidiviz found that policies aimed at helping to reduce the prison population don’t always work correctly. A key example is “time off for good behavior” laws that allow inmates to earn one day off for every 30 days of good behavior. Some states' data systems are built to calculate time off as one day per month of good behavior, rather than per 30 days. Over the course of a decade-long sentence, Jacoby says, these miscalculations can lead to a huge discrepancy between the calculated release date and the actual release date.
Algorithms
Commercial algorithm-based software systems for risk assessment continue to be widely used in the criminal justice system, even though a 2018 study published in Science Advances exposed their limitations. After the study went viral, it took three years for the Justice Department to issue a report on its own flawed algorithms used to reduce the federal prison population as part of the 2018 First Step Act. The program, it was determined, overestimated the risk of putting inmates of color into early-release programs.
Despite its name, Recidiviz does not build these types of algorithms for predicting recidivism, or whether someone will commit another crime after being released from prison. Rather, Jacoby says the company’s “descriptive analytics” approach is specifically intended to weed out incarceration inequalities and avoid algorithmic pitfalls.
Parole violation laws
Research shows that 350,000 people a year, about a quarter of all prison admissions, are sent back not because they’ve committed another crime, but because they’ve broken a specific rule of their probation or parole. “Things that wouldn't send you or I to prison, but would send someone on parole,” such as crossing county lines or being in the presence of alcohol when they shouldn’t be, are inflating the prison population, says Jacoby.
It’s personal for the co-founder and CEO
“I grew up with an uncle who went into the prison system,” Jacoby says. At 19, he was sentenced to ten years in prison for a non-violent crime. A few months after being released from jail, he was sent back for a non-violent parole violation.
“For my family, the fact that one in four prison admissions are driven not by a crime but by someone who's broken a rule on probation and parole was really profound because that happened to my uncle,” Jacoby says. The experience led her to begin studying criminal justice in high school, then college. She continued her dive into how the criminal justice system works as part of her Passion Project while at Google, a program that allows employees to spend 20 percent of their time on pro-bono work. Two colleagues whose family members had also been stuck in the system joined her.
As part of the project, Jacoby interviewed hundreds of people involved in the criminal justice system. “Those on the right, those on the left, agreed that bad data was slowing down reform,” she says. Their research brought them to North Dakota where they began to understand the root of the problem. The corrections department is making “huge, consequential decisions every day [without] … the data,” Jacoby says. In a new video by Recidiviz not yet released, Jacoby recounts her exchange with the state’s director of corrections who told her, “‘It’s not that we have the data and we just don’t know how to make it public; we don’t have the information you think we have.’”
A mock-up (with fake data) of the types of dashboards and insights that Recidiviz provides to state governments.
Recidiviz
As a software engineer, Jacoby says the comment made no sense to her — until she witnessed it first-hand. “We spent a lot of time driving around in cars with corrections directors and parole officers watching them use these incredibly taxing, frankly terrible, old data systems,” Jacoby says.
As they weeded through thousands of files — some computerized, some on paper — they unearthed the consequences of bad data: Hundreds of people in prison well past their release date and thousands more whose release from parole was delayed because of minor paperwork issues. They found individuals stuck in parole because they hadn’t checked one last item off their eligibility list — like simply failing to provide their parole officer with a paystub. And, even when parolees advocated for themselves, the archaic system made it difficult for their parole officers to confirm their eligibility, so they remained in the system. Jacoby and her team also unpacked specific policies that drive racial disparities — such as fines and fees.
The Solution
It’s more than a trivial technical challenge to bring the incomplete, fragmented data onto a 21st century data platform. It takes months for Recidiviz to sift through a state’s information systems to connect databases “with the goal of tracking a person all the way through their journey and find out what’s working for 18- to 25-year-old men, what’s working for new mothers,” explains Jacoby in the video.
TED Talk: How bad data traps people in the U.S. justice system
TED Fellow Clementine Jacoby's TED Talk went live on Jan. 13. It describes how we can fix bad data in the criminal justice system, "bringing thousands of people home, reducing costs and improving public safety along the way."
Clementine Jacoby • TED2022
Ojmarrh Mitchell, an associate professor in the School of Criminology and Criminal Justice at Arizona State University, who is not involved with the company, says what Recidiviz is doing is “remarkable.” His perspective goes beyond academic analysis. In his pre-academic years, Mitchell was a probation officer, working within the framework of the “well known, but invisible” information sharing issues that plague criminal justice departments. The flexibility of Recidiviz’s approach is what makes it especially innovative, he says. “They identify the specific gaps in each jurisdiction and tailor a solution for that jurisdiction.”
On the downside, the process used by Recidiviz is “a bit opaque,” Mitchell says, with few details available on how Recidiviz designs its tools and tracks outcomes. By sharing more information about how its actions lead to progress in a given jurisdiction, Recidiviz could help reformers in other places figure out which programs have the best potential to work well.
The eleven states in which Recidiviz is working include California, Colorado, Maine, Michigan, Missouri, Pennsylvania and Tennessee. And a pilot program launched last year in Idaho could, if scaled nationally, reduce the number of people in the criminal justice system by a quarter of a million, Jacoby says. As part of the pilot, rather than relying on manual calculations, Recidiviz is equipping leaders and probation officers with actionable information that is available with a few clicks in an app that Recidiviz built.
Mitchell is disappointed that there’s even the need for Recidiviz. “This is a problem that government agencies have a responsibility to address,” he says. “But they haven’t.” For one company to come along and fill such a large gap is “remarkable.”