The Shiny -- and Potentially Dangerous -- New Tool for Predicting Human Behavior

[Editor's Note: This essay is in response to our current Big Question, which we posed to experts with different perspectives: "How should DNA tests for intelligence be used, if at all, by parents and educators?"]
Imagine a world in which pregnant women could go to the doctor and obtain a simple inexpensive genetic test of their unborn child that would allow them to predict how tall he or she would eventually be. The test might also tell them the child's risk for high blood pressure or heart disease.
Even more remarkable -- and more dangerous -- the test might predict how intelligent the child would be, or how far he or she could be expected to go in school. Or heading further out, it might predict whether he or she will be an alcoholic or a teetotaler, or straight or gay, or… you get the idea. Is this really possible? If it is, would it be a good idea? Answering these questions requires some background in a scientific field called behavior genetics.
Differences in human behavior -- intelligence, personality, mental illness, pretty much everything -- are related to genetic differences among people. Scientists have known this for 150 years, ever since Darwin's half-cousin Francis Galton first applied Shakespeare's phrase "nature and nurture" to the scientific investigation of human differences. We knew about the heritability of behavior before Mendel's laws of genetics were rediscovered at the turn of the 20th century, and long before the structure of DNA was discovered in the 1950s. How could discoveries about genetics be made before a science of genetics even existed?
The answer is that scientists developed clever research designs that allowed them to make inferences about genetics in the absence of biological knowledge about DNA. The best-known is the twin study: identical twins are essentially clones, sharing 100 percent of their DNA, while fraternal twins are essentially siblings, sharing half. To the extent that identical twins are more similar for some trait than fraternal twins, one can infer that heredity is playing a role. Adoption studies are even more straightforward. Is the personality of an adopted child more like the biological parents she has never seen, or the adoptive parents who raised her?
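The twin-study logic can be put into numbers with a classic back-of-the-envelope calculation usually credited to Falconer, which the essay does not name. Because identical twins share all of their DNA and fraternal twins about half, the gap between the two groups' trait correlations reflects roughly half of the genetic effect, so doubling it gives a crude heritability estimate. The sketch below is only an illustration: the function name and the correlation values are invented for the example and do not come from any study.

```python
def falconer_heritability(r_identical: float, r_fraternal: float) -> float:
    """Crude heritability estimate: h^2 = 2 * (r_identical - r_fraternal)."""
    for r in (r_identical, r_fraternal):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie between -1 and 1")
    return 2.0 * (r_identical - r_fraternal)


# Hypothetical trait correlations, chosen purely for illustration.
h2 = falconer_heritability(r_identical=0.75, r_fraternal=0.5)
print(h2)  # 0.5
```

An estimate like this says only that heredity plays some role in a population; it says nothing about which genes are involved or how they work.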
Twin and adoption studies played an important role in establishing beyond any reasonable doubt that genetic differences play a role in the development of differences in behavior, but they told us very little about how the genetics of behavior actually worked. When the human genome was finally sequenced in the early 2000s, and it became easier and cheaper to obtain actual DNA from large samples of people, scientists anticipated that we would soon find the genes for intelligence, mental illness, and all the other behaviors that were known to be "heritable" in a general way.
But to everyone's amazement, the genes weren't there. It turned out that there are thousands of genes related to any given behavior, so many that they can't be counted, and each one of them has such a tiny effect that it can't be tied to meaningful biological processes. The whole scientific enterprise of understanding the genetics of behavior seemed ready to collapse, until it was rescued -- sort of -- by a new method called polygenic scores, PGS for short. Polygenic scores abandon the old task of finding the genes for complex human behavior, replacing it with black-box prediction: can we use DNA not to understand, but to predict who is going to be intelligent or extraverted or mentally ill?
PGS are the shiny new toy of human genetics. From a technological standpoint they are truly amazing, and they are useful for some scientific applications that don't involve making decisions about individual people. We can obtain DNA from thousands of people, estimate the tiny relationships between individual bits of DNA and any outcome we want — height or weight or cardiac disease or IQ — and then add all those tiny effects together into a single bell-shaped score that can predict the outcome of interest. In theory, we could do this from the moment of conception.
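The "add all those tiny effects together" step amounts to a weighted sum. The sketch below assumes each variant is recorded as a count of 0, 1, or 2 copies of an allele and that each variant has a previously estimated effect size. The variant names, weights, and the choice to treat missing variants as zero are all hypothetical simplifications; real scores combine far more variants and handle missing data more carefully.

```python
from typing import Dict


def polygenic_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Add up per-variant effect sizes, each multiplied by the allele count."""
    score = 0.0
    for variant, weight in weights.items():
        # Assumption for this sketch: a variant missing from the genotype counts as 0.
        count = genotype.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {variant} must be 0, 1, or 2")
        score += weight * count
    return score


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.125}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(polygenic_score(person, weights))  # 0.75
```

Nothing in the sum says why a variant carries the weight it does, which is the black-box character the essay describes.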
Polygenic scores for height already work pretty well. Physicians are debating whether the PGS for heart disease are robust enough to be used in the clinic. For some behavioral traits -- the most data exist for educational attainment -- they work well enough to be scientifically interesting, if not practically useful. For traits like personality or sexual orientation, the prediction is statistically significant but nowhere close to practically meaningful. No one knows how much better any of these predictions are likely to get.
Without a doubt, PGS are an amazing feat of genomic technology, but the task they accomplish is something scientists have been able to do for a long time, and in fact it is something that our grandparents could have done pretty well. PGS are basically a new way to predict a trait in an individual by using the same trait in the individual's parents — a way of observing that the acorn doesn't fall far from the tree.
The children of tall people tend to be tall. Children of excellent athletes are athletic; children of smart people are smart; children of people with heart disease are at risk, themselves. Not every time, of course, but that is how imperfect prediction works: children of tall parents vary in their height like anyone else, but on average they are taller than the rest of us. Prediction from observing parents works better, and is far easier and cheaper, than anything we can do with DNA.
But wait a minute. Prediction from parents isn't strictly genetic. Smart parents not only pass on their genes to their kids, but they also raise them. Smart families are privileged in thousands of ways — they make more money and can send their kids to better schools. The same is true for PGS.
The ability of a genetic score to predict educational attainment depends not only on examining the relationship between certain genes and how far people go in school, but also on every personal and social characteristic that helps or hinders education: wealth, status, discrimination, you name it. The bottom line is that for any kind of prediction of human behavior, separation of genetic from environmental prediction is very difficult; ultimately it isn't possible.
This is a reminder that we really have no idea why either parents or PGS predict as well or as poorly as they do. It is easy to imagine that a PGS for educational attainment works because it is summarizing genes that code for efficient neurological development, bigger brains, and swifter problem solving, but we really don't know that. PGS could work because they are associated with being rich, or being motivated, or having light skin. It's the same for predicting from parents. We just don't know.
Still, experts are already discussing how to use PGS to make predictions for children, and even for embryos.
For example, maybe couples could fertilize multiple embryos in vitro, test their DNA, and select the one with the "best" PGS on some trait. This would be a bad idea for a lot of reasons. Such scores aren't effective enough to be very useful to parents, and to the extent they are effective, it is very difficult to know what other traits might be selected for when parents try to prioritize intelligence or attractiveness. People will no doubt try it anyway, and as a matter of reproductive freedom I can't think of any way to stop them. Fortunately, the practice probably won't have any great impact one way or another.
That brings us to the ethics of PGS, particularly in the schools. Imagine that when a child enrolls in a public school, an IQ test is given to her biological parents. Children with low-IQ parents are statistically more likely to have low IQs themselves, so they could be assigned to less demanding classrooms or vocational programs. Hopefully we agree that this would be unethical, but let's think through why.
First of all, it would be unethical because we don't know why the parents have low IQs, or why their IQs predict their children's. The parents could be from a marginalized ethnic group, recognizable by their skin color and passed on genetically to their children, so discriminating based on a parent's IQ would just be a proxy for discriminating based on skin color. Such a system would be no more than a social scientific gloss on an old-fashioned program for perpetuating economic and cognitive privilege via the educational system.
Assigning children to classrooms based on genetic testing would be no different, although it would have the slight ethical advantage of being less effective. The PGS for educational attainment could reflect brain efficiency, but it could also depend on skin color, or economic advantage, or personality, or literally anything that is related in any way to economic success. Privileging kids with higher genetic scores would be no different than privileging children with smart parents. If schools really believe that a psychological trait like IQ is important for school placement, the sensible thing is to give the children an actual IQ test, not a genetic test.
IQ testing has its own issues, of course, but at least it involves making decisions about individuals based on their own observable characteristics, rather than on characteristics of their parents or their genome. If decisions must be made, if resources must be apportioned, people deserve to be judged on the basis of their own behavior, the content of their character. Since it can't be denied that people differ in all sorts of relevant ways, this is what it means for all people to be created equal.
Massive benefits of AI come with environmental and human costs. Can AI itself be part of the solution?
Generative AI has a large carbon footprint and other drawbacks. But AI can help mitigate its own harms—by plowing through mountains of data on extreme weather and human displacement.
The recent explosion of generative artificial intelligence tools like ChatGPT and Dall-E enabled anyone with internet access to harness AI’s power for enhanced productivity, creativity, and problem-solving. With their ever-improving capabilities and expanding user base, these tools proved useful across disciplines, from the creative to the scientific.
But beneath the technological wonders of human-like conversation and creative expression lies a dirty secret: an alarming environmental and human cost. AI has an immense carbon footprint. Systems like ChatGPT take months to train in high-powered data centers, which demand huge amounts of electricity, much of it still generated with fossil fuels, as well as water for cooling. “One of the reasons why OpenAI needs investments [to the tune of] $10 billion from Microsoft is because they need to pay for all of that computation,” says Kentaro Toyama, a computer scientist at the University of Michigan. There’s also an ecological toll from mining rare minerals required for hardware and infrastructure. This environmental exploitation pollutes land, triggers natural disasters and causes large-scale human displacement. Finally, for the data labeling needed to train and correct AI algorithms, the Big Data industry relies on cheap, exploited labor, often from the Global South.
Generative AI tools are based on large language models (LLMs), the most well-known being the various versions of GPT. LLMs can perform natural language processing tasks, including translating, summarizing and answering questions. They are built on artificial neural networks, the machine-learning approach known as deep learning. Inspired by the human brain, neural networks are made of millions of artificial neurons. “The basic principles of neural networks were known even in the 1950s and 1960s,” Toyama says, “but it’s only now, with the tremendous amount of compute power that we have, as well as huge amounts of data, that it’s become possible to train generative AI models.”
In recent months, much attention has gone to the transformative benefits of these technologies. But it’s important to consider that these remarkable advances may come at a price.
AI’s carbon footprint
In their latest annual report, 2023 Landscape: Confronting Tech Power, the AI Now Institute, an independent policy research entity focusing on the concentration of power in the tech industry, says: “The constant push for scale in artificial intelligence has led Big Tech firms to develop hugely energy-intensive computational models that optimize for ‘accuracy’—through increasingly large datasets and computationally intensive model training—over more efficient and sustainable alternatives.”
Though there aren’t any official figures about the power consumption or emissions from data centers, experts estimate that they use one percent of global electricity, more than entire countries. In 2019, Emma Strubell, then a graduate researcher at the University of Massachusetts Amherst, estimated that training a single LLM resulted in over 280,000 kg of CO2 emissions, the equivalent of driving almost 1.2 million km in a gas-powered car. A couple of years later, David Patterson, a computer scientist at the University of California, Berkeley, and colleagues estimated GPT-3’s carbon footprint at over 550,000 kg of CO2. In 2022, the tech company Hugging Face estimated the carbon footprint of its own language model, BLOOM, at 25,000 kg of CO2 emissions. (BLOOM’s footprint is lower because Hugging Face uses renewable energy, but it doubled when other life-cycle processes like hardware manufacturing and use were added.)
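As a quick sanity check on the driving comparison, the two figures quoted for the 2019 estimate imply a per-kilometer emission rate for the car. The short calculation below uses only the numbers stated above; the variable names are illustrative, and applying the same rate to the GPT-3 figure is an extrapolation for scale, not something the cited researchers reported.

```python
# Figures quoted in the article for the 2019 estimate.
training_emissions_kg = 280_000
equivalent_distance_km = 1_200_000

# Emission rate implied by the comparison (kg of CO2 per km driven).
kg_per_km = training_emissions_kg / equivalent_distance_km
print(round(kg_per_km, 3))  # 0.233

# Extrapolation for scale only: the GPT-3 estimate expressed as driving distance.
gpt3_emissions_kg = 550_000
gpt3_equivalent_km = gpt3_emissions_kg / kg_per_km
print(round(gpt3_equivalent_km))
```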
Luckily, despite the growing size and numbers of data centers, their increasing energy demands and emissions have not kept pace proportionately—thanks to renewable energy sources and energy-efficient hardware.
But emissions don’t tell the full story.
AI’s hidden human cost
“If historical colonialism annexed territories, their resources, and the bodies that worked on them, data colonialism’s power grab is both simpler and deeper: the capture and control of human life itself through appropriating the data that can be extracted from it for profit.” So write Nick Couldry and Ulises Mejias, authors of the book The Costs of Connection.
Technologies we use daily inexorably gather our data. “Human experience, potentially every layer and aspect of it, is becoming the target of profitable extraction,” Couldry and Mejias say. This feeds data capitalism, the economic model built on the extraction and commodification of data. While we are being dispossessed of our data, Big Tech commodifies it for its own benefit. This results in consolidation of power structures that reinforce existing race, gender, class and other inequalities.
“The political economy around tech and tech companies, and the development in advances in AI contribute to massive displacement and pollution, and significantly changes the built environment,” says technologist and activist Yeshi Milner, who founded Data For Black Lives (D4BL) to create measurable change in Black people’s lives using data. The energy requirements, hardware manufacture and the cheap human labor behind AI systems disproportionately affect marginalized communities.
AI’s recent explosive growth spiked the demand for manual, behind-the-scenes tasks, creating an industry described by Mary Gray and Siddharth Suri as “ghost work” in their book of the same name. This invisible human workforce behind the “magic” of AI is overworked and underpaid, and very often based in the Global South. For example, workers in Kenya who made less than $2 an hour were behind the mechanism that trained ChatGPT to properly talk about violence, hate speech and sexual abuse. And, according to an article in Analytics India Magazine, in some cases these workers may not have been paid at all, a case of wage theft. An exposé by the Washington Post describes “digital sweatshops” in the Philippines, where thousands of workers experience low wages, delays in payment, and wage theft by Remotasks, a platform owned by Scale AI, a $7 billion American startup. Rights groups and labor researchers have flagged Scale AI as one company that flouts basic labor standards for workers abroad.
It is possible to draw a parallel with chattel slavery—the most significant economic event that continues to shape the modern world—to see the business structures that allow for the massive exploitation of people, Milner says. Back then, people got chocolate, sugar, cotton; today, they get generative AI tools. “What’s invisible through distance—because [tech companies] also control what we see—is the massive exploitation,” Milner says.
“At Data for Black Lives, we are less concerned with whether AI will become human…[W]e’re more concerned with the growing power of AI to decide who’s human and who’s not,” Milner says. As a decision-making force, AI becomes a “justifying factor for policies, practices, rules that not just reinforce, but are currently turning the clock back generations on people’s civil and human rights.”
Nuria Oliver, a computer scientist, and co-founder and vice-president of the European Laboratory of Learning and Intelligent Systems (ELLIS), says that instead of focusing on the hypothetical existential risks of today’s AI, we should talk about its real, tangible risks.
“Because AI is a transverse discipline that you can apply to any field [from education, journalism, medicine, to transportation and energy], it has a transformative power…and an exponential impact,” she says.
AI's accountability
“At the core of what we were arguing about data capitalism [is] a call to action to abolish Big Data,” says Milner. “Not to abolish data itself, but the power structures that concentrate [its] power in the hands of very few actors.”
A comprehensive AI Act currently being negotiated in the European Parliament aims to rein Big Tech in. It plans to introduce a rating of AI tools based on the harms they cause to humans, while being as technology-neutral as possible. It sets standards for safe, transparent, traceable, non-discriminatory, and environmentally friendly AI systems, overseen by people, not automation. The regulations also ask for transparency about the content used to train generative AIs, particularly copyrighted data, and for disclosure when content is AI-generated. “This European regulation is setting the example for other regions and countries in the world,” Oliver says. But, she adds, such transparency is hard to achieve.
Google, for example, recently updated its privacy policy to say that anything on the public internet will be used as training data. “Obviously, technology companies have to respond to their economic interests, so their decisions are not necessarily going to be the best for society and for the environment,” Oliver says. “And that’s why we need strong research institutions and civil society institutions to push for actions.” ELLIS also advocates for data centers to be built in locations where the energy can be produced sustainably.
Ironically, AI plays an important role in mitigating its own harms—by plowing through mountains of data about weather changes, extreme weather events and human displacement. “The only way to make sense of this data is using machine learning methods,” Oliver says.
Milner believes that the best way to expose AI-caused systemic inequalities is through people's stories. “In these last five years, so much of our work [at D4BL] has been creating new datasets, new data tools, bringing the data to life. To show the harms but also to continue to reclaim it as a tool for social change and for political change.” This change, she adds, will depend on whose hands it is in.
DNA gathered from animal poop helps protect wildlife
Alida de Flamingh and her team are collecting elephant dung. It holds a trove of information about animal health, diet and genetic diversity.
On the savannah near the Botswana-Zimbabwe border, elephants grazed contentedly. Nearby, postdoctoral researcher Alida de Flamingh watched and waited. As the herd moved away, she went into action, collecting samples of elephant dung that she and other wildlife conservationists would study in the months to come. She pulled on gloves, took a swab, and ran it all over the still-warm, round blob of elephant poop.
Sequencing DNA from fecal matter is a safe, non-invasive way to track and ultimately help protect over 42,000 species currently threatened by extinction. Scientists are using this DNA to gain insights into wildlife health, genetic diversity and even the broader environment. Applied to elephants, chimpanzees, toucans and other species, it helps scientists determine the genetic diversity of groups and linkages with other groups. Such analysis can show changes in rates of inbreeding. Populations with greater genetic diversity adapt better to changes and environmental stressors than those with less diversity, thus reducing their risks of extinction, explains de Flamingh, a postdoctoral researcher at the University of Illinois Urbana-Champaign.
Analyzing fecal DNA also reveals information about an animal’s diet and health, and even nearby flora that is eaten. That information gives scientists broader insights into the ecosystem, and the findings are informing conservation initiatives. Examples include restoring or maintaining genetic connections among groups, ensuring access to certain foraging areas or increasing diversity in captive breeding programs.
Approximately 27 percent of mammals and 28 percent of all assessed species are close to dying out. The IUCN Red List of Threatened Species, simply called the Red List, is the world’s most comprehensive record of animals’ extinction risk. The more information scientists gather, the better their chances of reducing those risks. In Africa, populations of vertebrates declined 69 percent between 1970 and 2022, according to the World Wildlife Fund (WWF).
“When people talk about species, they often talk about ecosystems, but they often overlook genetic diversity,” says Christina Hvilsom, senior geneticist at the Copenhagen Zoo. “It’s easy to count (individuals) to assess whether the population size is increasing or decreasing, but diversity isn’t something we can see with our bare eyes. Yet, it’s actually the foundation for the species and populations.” DNA analysis can provide this critical information.
Assessing elephants’ health
“Africa’s elephant populations are facing unprecedented threats,” says de Flamingh, the postdoc, who has studied them since 2009. Challenges include ivory poaching, habitat destruction and smaller, more fragmented habitats that result in smaller mating pools with less genetic diversity. Additionally, de Flamingh studies the microbial communities living on and in elephants – their microbiomes – looking for parasites or dangerous microbes.
Approximately 415,000 elephants inhabit Africa today, but de Flamingh says the number would be four times higher without these challenges. The IUCN Red List reports African savannah elephants are endangered and African forest elephants are critically endangered. Elephants support ecosystem biodiversity by clearing paths that help other species travel. Their very footprints create small puddles that can host smaller organisms such as tadpoles. Elephants are often described as ecosystems’ engineers, so if they disappear, the rest of the ecosystem will suffer too.
There’s a process to collecting elephant feces. “We put on sterile gloves (which we change for each sample) and use a sterile swab to collect wet mucus and materials from the outside of the dung ball,” says de Flamingh. They rub a sample about the size of a U.S. quarter onto a paper card embedded with DNA preservation technology. Each card is air dried and stored in a packet of desiccant to prevent mold growth. This way, samples can be stored at room temperature indefinitely without the DNA degrading.
Earlier methods required collecting dung in bags, which needed either refrigeration or the addition of preservatives, or the riskier alternative of tranquilizing the animals before approaching them to draw blood samples. The ability to collect and sequence the DNA made things much easier and safer.
“Our research provides a way to assess elephant health without having to physically interact with elephants,” de Flamingh emphasizes. “We also keep track of the GPS coordinates of each sample so that we can create a map of the sampling locations,” she adds. That helps researchers correlate elephants’ health with geographic areas and their conditions.
Although de Flamingh works with elephants in the wild, the contributions of zoos in the United States and collaborations in South Africa (notably with the late Professor Rudi van Aarde and the Conservation Ecology Research Unit at the University of Pretoria) were key to testing the method and confirming that it worked, she points out.
Protecting chimpanzees
Genetic work with chimpanzees began about a decade ago. Hvilsom and her group at the Copenhagen Zoo analyzed DNA from nearly 1,000 fecal samples collected between 2003 and 2018 by a team of international researchers. The goal was to assess the status of the West African subspecies, which is critically endangered after rapid population declines. Of the four subspecies of chimpanzees, the West African subspecies is considered the most at-risk.
In total, the WWF estimates the numbers of chimpanzees inhabiting Africa’s forests and savannah woodlands at between 173,000 and 300,000. Poaching, disease and human-caused changes to their lands are their major risks.
“One of the threats is mining near the Nimba Mountains in Guinea,” a stronghold for the West African subspecies, Hvilsom says. The Nimba Mountains are a UNESCO World Heritage Site, but they are rich in iron ore, which is used to make the steel that is vital to the Asian construction boom. As she and colleagues wrote in a recent paper, “Many extractive industries are currently developing projects in chimpanzee habitat.”
Analyzing DNA allows researchers to identify individual chimpanzees more accurately than simply observing them, she says. Normally, field researchers would install cameras and manually inspect each picture to determine how many chimpanzees were in an area. But, Hvilsom says, “That’s very tricky. Chimpanzees move a lot and are fast, so it’s difficult to get clear pictures. Often, they find and destroy the cameras. Also, they live in large areas, so you need a lot of cameras.”
By analyzing genetics obtained from fecal samples, Hvilsom estimated the chimpanzees’ population, ascertained their family relationships and mapped their migration routes based upon DNA comparisons with other chimpanzee groups. The mining companies and builders are using this information to locate future roads where they won’t disrupt migration – a more effective solution than trying to build artificial corridors for wildlife.
“The current route cuts off communities of chimpanzees,” Hvilsom elaborates. That effectively prevents young adult chimps from joining other groups when the time comes, eventually reducing the currently-high levels of genetic diversity.
“The mining company helped pay for the genetics work,” Hvilsom says, “as part of its obligation to assess and monitor biodiversity and the effect of the mining in the area.”
Identifying toucan families
Feces aren't the only substance researchers draw DNA samples from. Jeffrey Coleman, a Ph.D. candidate at the University of Texas at Austin, relies on blood tests for studying the genetic diversity of toucans, bird species native to Central America and nearby regions. They live in the jungles, where they hop among branches, snip fruit from trees, toss it in the air and catch it with their large beaks. “Toucans are beautiful, charismatic birds that are really important to the ecosystem,” says Coleman.
Of their 50 subspecies, 11 are threatened or near-threatened with extinction because of deforestation and poaching. “When people see these aesthetically pleasing birds, they’re motivated to care about conservation practices,” he points out.
Coleman works with the Dallas World Aquarium and its partner zoos to analyze DNA from blood draws, using it to identify which toucans are related and how closely. His goal is to use science to improve the genetic diversity among toucan offspring.
Specifically, he’s looking at sections of the genome of captive birds in which the nucleotides repeat multiple times, such as AGATAGATAGAT. Called microsatellites, these consecutively-repeating sections can be passed from parents to children, helping scientists identify parent-child and sibling-sibling relationships. “That allows you to make strategic decisions about how to pair (captive) individuals for mating...to avoid inbreeding,” Coleman says.
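To make the idea of a microsatellite concrete, the sketch below counts how many times a short motif repeats back to back in a DNA string, using the AGAT example from the text. It is a toy illustration, not the method Coleman's lab uses: the function name is hypothetical, and real genotyping works from sequencing or fragment-length data, not simple string matching.

```python
def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    # Try every start position and count back-to-back copies of the motif from there.
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


print(longest_tandem_repeat("AGATAGATAGAT", "AGAT"))  # 3
print(longest_tandem_repeat("CCAGATAGATTT", "AGAT"))  # 2
```

Because the number of repeats at a given location is inherited, comparing these counts across many locations is what lets researchers infer which birds are parents, offspring, or siblings.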
The alternative is to use a type of analysis that looks for a single DNA building block – a nucleotide – that differs in a given sequence. Called single nucleotide polymorphisms (SNPs, pronounced “snips”), they are very common and very accurate. Coleman says they are better than microsatellites for some uses. But scientists have already developed a large body of microsatellite data from multiple species, so microsatellites can shed more light on how individuals are related.
Regardless of whether conservation programs use SNPs or microsatellites to guide captive breeding efforts, the goal is to help them build genetically diverse populations that eventually may supplement endangered populations in the wild. “The hope is that the ecosystem will be stable enough and that the populations (once reintroduced into the wild) will be able to survive and thrive,” says Coleman. History knows some good examples of captive breeding success.
The California condor, which had a total population of 27 in 1987, when the last wild birds were captured, is one of them. A captive breeding program boosted their numbers to 561 by the end of 2022. Of those, 347 are in the wild, according to the National Park Service.
Conservationists hope that their work on animals’ genetic diversity will help preserve and restore endangered species in captivity and the wild. DNA analysis is crucial to both types of efforts. The ability to apply genome sequencing to wildlife conservation brings a new level of accuracy that helps protect species and gives fresh insights that observation alone can’t provide.
“A lot of species are threatened,” Coleman says. “I hope this research will be a resource people can use to get more information on longer-term genealogies and different populations.”