Imagine if nearly a third of your DNA was composed of ancient viral remnants—genetic hitchhikers from infections that occurred millions of years ago.
Now imagine that these viral fragments aren’t just dead weight, but have quietly evolved to help control the way your genes work, potentially shaping what it means to be human.
That’s the startling insight behind a new study led by researchers at Kyoto University’s Institute for the Advanced Study of Human Biology (ASHBi) and McGill University.
In a paper published in Science Advances, the team reports uncovering dozens of previously unrecognized subfamilies of ancient viral DNA embedded in our genomes, some of which appear to play active roles in gene regulation and may have significantly influenced primate evolution.
“By applying our approach across 53 simian-enriched LTR subfamilies, we defined 75 new subfamilies and found a novel annotation for a total of 3807 (30.0%) instances from 26 subfamilies,” researchers write.
Primate DNA’s Viral Past
Approximately 8% of the human genome originates from endogenous retroviruses (ERVs)—these are viral sequences that have become a permanent part of our genetic material after being passed down over generations following infections that occurred millions of years ago.
These ancient viral remnants often remain dormant, suppressed by the host’s defense mechanisms. However, in specific contexts, particularly during early embryonic development, some of these sequences become active, interacting with host transcription factors to regulate genes.
A key feature of these ERVs is their long terminal repeats, or “LTRs”—repetitive sequences that flank either end of the viral DNA. In their original viral form, LTRs helped control the virus’s ability to insert itself into host genomes and express its genes.
However, after integration into the genome, many LTRs stuck around and were co-opted by the host as regulatory elements. Today, they function much like genetic switches, turning nearby genes on or off depending on the cellular context.
These LTRs are crucial regulatory hotspots. However, according to researchers, current methods for annotating and classifying them are deeply flawed. Traditional tools, such as RepeatMasker, often misclassify these sequences or overlook the nuanced differences between viral subfamilies.
To address this, the team employed a phylogenetic approach that integrated evolutionary sequence analysis with conservation patterns across 53 primate species.
Their focus began with one family of particularly young LTRs called MER11A/B/C, which had previously been crudely lumped into just a few categories.
What they found upended decades of assumptions.
Unearthing a Hidden Viral Substructure
By analyzing thousands of MER11 sequences across the human genome and comparing them with those of chimpanzees and macaques, the researchers identified four entirely new MER11 subfamilies: MER11_G1 through MER11_G4. These new classifications accounted for nearly 20% of all MER11 instances and were far more accurate in predicting biological activity.
Each new subfamily exhibited distinct evolutionary patterns and regulatory behaviors, with some retaining or acquiring transcription factor binding motifs that enable them to function like genetic switches.
“The four new subfamilies of MER11 appear to resolve the epigenetic heterogeneity within MER11 instances, and we found that relative age was associated with distinct regulatory profiles,” researchers write.
Ancient Viruses, Modern Function
To test whether these newly identified viral fragments had any biologically meaningful effects, the researchers employed a cutting-edge technique known as a massively parallel reporter assay (MPRA). This approach enables the simultaneous testing of thousands of DNA sequences for their ability to drive gene expression.
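As a rough illustration of how MPRA readouts are commonly summarized, and not the study's actual pipeline, a sequence's activity is often scored as the log2 ratio of its RNA barcode counts (how much reporter transcript it drove) to its DNA barcode counts (how much of the construct was present). The sequence names, counts, and the `mpra_activity` helper below are invented for this sketch.

```python
import math

def mpra_activity(rna_count, dna_count, pseudocount=1.0):
    """Score a tested sequence as log2(RNA / DNA).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for sequences with no reads.
    """
    return math.log2((rna_count + pseudocount) / (dna_count + pseudocount))

# Hypothetical (RNA, DNA) barcode counts for three tested sequences.
counts = {
    "seq_A": (799, 99),  # far more RNA than DNA: strong enhancer-like activity
    "seq_B": (99, 99),   # RNA matches DNA: no activity above baseline
    "seq_C": (24, 99),   # less RNA than DNA: below baseline
}

scores = {name: mpra_activity(rna, dna) for name, (rna, dna) in counts.items()}

# Rank sequences from most to least active.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

Real analyses typically add replicate handling, normalization, and statistical testing on top of this basic ratio.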
The results revealed that MER11_G2 and MER11_G3, two of the newly classified subfamilies, showed high levels of enhancer activity in human induced pluripotent stem cells. These viral sequences weren’t just passive fossils—they were actively influencing gene expression.
Moreover, the team identified specific single-nucleotide changes, including a single deletion that led to the formation of binding sites for SOX-related transcription factors.
These motifs were strongly associated with higher enhancer activity, yet they were present only in humans and chimpanzees, not macaques, suggesting a recent, ape-specific evolutionary gain.
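To make the idea concrete, here is a toy sketch of how deleting a single nucleotide can create a binding motif that was not there before. The sequences are invented, and `ACAAAG` is used only as a simplified stand-in for a SOX-like core; real motif analyses score sequences against position weight matrices rather than exact strings.

```python
# Simplified, illustrative SOX-like core (an assumption of this sketch).
SOX_LIKE_CORE = "ACAAAG"

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_motif(seq, motif=SOX_LIKE_CORE):
    """Check both strands for an exact match to the motif."""
    return motif in seq or reverse_complement(motif) in seq

def delete_base(seq, index):
    """Return the sequence with the base at `index` removed."""
    return seq[:index] + seq[index + 1:]

ancestral = "GGACAGAAGTC"            # invented sequence without the motif
derived = delete_base(ancestral, 5)  # drop the single 'G' interrupting the motif

print("ancestral:", ancestral, "motif present:", has_motif(ancestral))
print("derived:  ", derived, "motif present:", has_motif(derived))
```

In this toy case, removing one base joins two flanking stretches into a complete motif, which is the general kind of change the study describes.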
What makes these findings significant is their potential to explain how gene regulation evolved differently across primate species. While all primates carry ERVs, the evolution of these ERVs and their subsequent functions differ widely.
For instance, the team found that the youngest new subfamily, MER11_G4, had gained SOX-related motifs through a single-nucleotide deletion found in humans and chimpanzees—but not macaques. This discovery could potentially shift our understanding of gene regulation and evolution.
“Thus, these ape-specific SOX motifs in MER11 subfamilies may influence the gene regulatory network during development in a lineage-specific manner,” researchers note.
Beyond MER11, the researchers extended their phylogenetic reannotation across 18 other LTR groups, uncovering a total of 75 new subfamilies. In one notable case, the LTR7 group—previously linked to pluripotency in embryonic stem cells—was found to contain 12 distinct new subfamilies, each with unique epigenetic signatures.
Their analysis revealed that these new subfamilies frequently exhibited specific associations with key regulatory proteins, such as CTCF, ZNF808, and TEAD4, or were enriched for chromatin accessibility in particular cell types. One subfamily was even tied to gene regulation in trophoblast cells, which are essential for placenta development.
These insights could prove vital for understanding how regulatory elements contribute to species-specific traits, developmental processes, and even disease susceptibility.
These recent findings open up new lines of inquiry into how ancient viruses helped shape our biology. The distinct regulatory profiles uncovered by the team could help explain the differences in gene expression between humans and other primates, and offer clues into how our genome has adapted and evolved over time.
The study also reinforces the idea that so-called “junk DNA” isn’t junk at all. Instead, it’s a dynamic archive of evolutionary innovation—a genomic scrapbook of battles between hosts and viruses that, over millions of years, has been repurposed into tools for gene regulation.
The authors emphasize that their method can serve as a promising template for future research into other cryptic ERV subfamilies, offering a new direction for genetic studies.
“With this refined annotation of simian-enriched LTRs, it will be possible to better understand the evolution in primate genomes and potentially identify critical roles for ERVs and their LTRs in the hosts,” researchers conclude.
A Molecular Arms Race
Lastly, the study touches on a deeper evolutionary arms race. Host genomes evolve proteins, such as KRAB zinc finger proteins, to silence rogue viral sequences. In turn, ERVs mutate to escape detection or become co-opted as regulatory tools. This tug-of-war leaves molecular fingerprints, including motifs, binding sites, and sequence divergence, that researchers can now decode with new precision.
Ultimately, these recent findings reveal that the viral ghosts in our DNA aren’t just relics of the past. They’re part of a dynamic system that continues to shape who we are—and perhaps who we might become.
“Our genome was sequenced long ago, but the function of many of its parts remain unknown,” co-author and professor of molecular biology at Kyoto University, Dr. Fumitaka Inoue, said in a press release. “Transposable elements are thought to play important roles in genome evolution, and their significance is expected to become clearer as research continues to advance.”
Tim McMillan is a retired law enforcement executive, investigative reporter and co-founder of The Debrief. His writing typically focuses on defense, national security, the Intelligence Community and topics related to psychology. You can follow Tim on Twitter: @LtTimMcMillan. Tim can be reached by email: tim@thedebrief.org or through encrypted email: LtTimMcMillan@protonmail.com
