Artificial intelligence (AI) researchers in New York have reached a new milestone: a synthetic ‘genome’ compression scheme that allows artificial neural networks to learn and evolve in a manner that mirrors living creatures, opening up new possibilities in AI research.
In nature, the fittest survive to pass on their traits and behaviors to the next generation through their DNA. At Cold Spring Harbor Laboratory (CSHL), Professors Anthony Zador and Alexei Koulakov modeled the extreme compression the human genome performs to develop a ‘nested’ AI that, like an animal acting on instinct, performs well from the moment it is initialized.
The Compressed Animal Potential
Shortly after entering the world, animals often perform amazing feats like swimming, flying, spinning webs, and countless other activities. To know instinctively how to perform these tasks, the genomes of these creatures must somehow encode the millions of neural connections required to carry them out. Yet the genome is orders of magnitude too small to store that information explicitly, connection by connection. Scientists have long puzzled over how this knowledge passes through the “genomic bottleneck,” a term used to describe the limit on how much information the genome can transfer.
Furthermore, there is no sharp line between these innate skills and what an animal learns. Learning builds on instinct, and the two play off each other to aid survival. AI researchers have long studied how humans and other animals learn, yet they have largely ignored instinct, a crucial element of behavior and learning. The CSHL team embraced the study of genomically transferred knowledge to push AI learning and data compression forward.
“What if the genome’s limited capacity is the very thing that makes us so smart?” Koulakov asked. “What if it’s a feature, not a bug?”
Storing The Cortex
Mapping out the cortex thoroughly would require five to six orders of magnitude more data storage than the human genome provides. This scarcity of bandwidth implies that the neural connections are not mapped explicitly; only the rules for building the neural structure are. The CSHL team looked at how complex neural networks can be reduced to simple connectivity rules that scale to arbitrarily large structures. According to the new paper, a rule like “connect to your four nearest neighbors” can be applied indefinitely, rather than spelling out every connection in the system it produces.
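To get a feel for how much a rule can compress, consider this rough sketch (our illustration in Python, not the team’s code): a rule that fits in a few bytes, such as “connect to your k nearest neighbors,” specifies a network of any size, whereas listing the connections explicitly grows without bound.

```python
# Illustration only: a connectivity rule, not an explicit wiring list, is what gets inherited.

def rule_connect(n, k=4):
    """Connect each of n neurons, arranged on a ring, to its k nearest neighbors."""
    edges = []
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            edges.append((i, (i + offset) % n))  # neighbor to the "right"
            edges.append((i, (i - offset) % n))  # neighbor to the "left"
    return edges

# The rule itself never changes, yet it specifies arbitrarily large networks:
for n in (10, 1_000, 100_000):
    print(f"{n} neurons -> {len(rule_connect(n))} connections from the same few-byte rule")
```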
While some of this compression is understood, exactly how it creates instinctual behavior is another matter. Such rules can build arbitrarily large and remarkably complex structures, but they cannot tell a spider how to spin a web or an animal what to fear. The team’s focus became discerning how these simple rules can transmit not just general structure but specific information.
“The brain’s cortical architecture can fit about 280 terabytes of information—32 years of high-definition video. Our genomes accommodate about one hour. This implies a 400,000-fold compression technology cannot yet match,” said Koulakov.
Seeking Simplicity in Complexity
Instead of neurons and synapses, an artificial neural network (ANN) consists of nodes linked by weighted connections, the digital analogues of the cortex’s cells and their wiring. The CSHL team took the metaphor a step further, developing a “genome” that automatically lays out the weights, allowing the ANN to perform far better at initialization, before any training, much as an animal behaves on instinct. In theory, this constraint leads to a more regularized network, turning the “bug” into a “feature.”
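As a rough illustration of that idea, and not the team’s actual method, the sketch below uses a simple low-rank factorization to stand in for the “genome”: a small set of inherited numbers expands into a full weight matrix when the network is “born.”

```python
# Simplified stand-in: the "genome" here is a low-rank factorization, not the
# paper's genome network; the names and sizes are our own choices.
import numpy as np

rng = np.random.default_rng(0)

n_pre, n_post, d = 1_000, 1_000, 8                 # a million weights in the full network...
genome = {
    "pre":  rng.normal(size=(n_pre, d)),           # ...but only ~16,000 inherited numbers
    "post": rng.normal(size=(n_post, d)),
}

def decode_weights(genome):
    """Expand the compact genome into the full weight matrix at initialization."""
    return genome["pre"] @ genome["post"].T        # shape (n_pre, n_post)

W_init = decode_weights(genome)
full_count = W_init.size
genome_count = genome["pre"].size + genome["post"].size
print(f"{full_count} weights generated from {genome_count} numbers "
      f"({full_count / genome_count:.0f}x compression)")
```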
The ANN the CSHL team devised from these premises consists of two nested loops connected by a genome that transfers information between them. AI networks usually grow organically into sprawling, convoluted systems. Here, the two loops are joined through a tightly constrained “genome” information bottleneck, so the learning algorithm can pass on only its tightest, most efficient solutions. The result is a highly regularized, optimized network that evolves like a natural species, passing on its best traits.
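The sketch below, again a simplified stand-in rather than the paper’s algorithm, shows the two nested loops: an inner loop that learns within a “lifetime,” and an outer loop that squeezes the learned weights through a small genome (here a truncated SVD) before the next generation starts.

```python
# Schematic of the nested loops under our own simplified assumptions:
# learn during a lifetime, compress through the genome, inherit, repeat.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, rank = 64, 64, 4                   # the genome keeps only rank*(n_in + n_out) numbers

W_task = rng.normal(size=(n_in, n_out))         # stand-in for what the environment demands

def inner_loop(W, steps=200, lr=0.1):
    """Learning within one lifetime: gradient steps on 0.5 * ||W - W_task||^2."""
    for _ in range(steps):
        W = W - lr * (W - W_task)
    return W

def genome_bottleneck(W):
    """Outer loop: compress the learned weights into the next generation's innate wiring."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

W = rng.normal(size=(n_in, n_out))              # generation 0: random wiring
for generation in range(5):
    innate_error = np.linalg.norm(W - W_task)   # performance at "birth", before any learning
    print(f"generation {generation}: innate error = {innate_error:.1f}")
    W = genome_bottleneck(inner_loop(W))        # learn, then compress and pass on
```

Even in this toy version, performance at “birth” improves sharply after a single generation, the same qualitative behavior the team reports for networks initialized from its genome.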
Genome Results
Thorough testing demonstrated initial performance very close to that of a fully trained network. Efforts to prune exceedingly convoluted networks have shown that many nodes carry few connections while a handful do most of the work. The team’s results demonstrate the value of pursuing better network initialization, which is often overlooked in favor of studying learning alone.
However, an essential caveat is that while scientists largely agree that some animal skills, such as swimming, are passed on genetically, there is considerably more debate over whether the same is true for the more complex human behaviors AI attempts to replicate, like language.
CSHL’s work is only a first step, and much work remains in testing other variants of the genomic constraint. Eventually, researchers could combine the technique with more advanced studies of learning to optimize the innate wiring alongside the learning rules.
The paper “Encoding Innate Ability Through a Genomic Bottleneck” appeared on September 12, 2024, in the Proceedings of the National Academy of Sciences.
Ryan Whalen covers science and technology for The Debrief. He holds a BA in History and a Master of Library and Information Science with a certificate in Data Science. He can be contacted at ryan@thedebrief.org and followed on Twitter @mdntwvlf.