AI alignment research aims to steer artificial intelligence (AI) systems toward the intended goals or ethical principles of humans. Misaligned AI systems that do not advance the objectives of humans can malfunction or cause harm and potentially even pose an existential risk to humanity.
In his 1960 Science article “Some Moral and Technical Consequences of Automation,” U.S. mathematician Norbert Wiener prophesied: “If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it, because the action is so fast and irrevocable that we have not the data to intervene before the action is complete, then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it.”
The AI development community calls for further research, regulation, and policy to ensure that AI systems are aligned with human values. To that end, developers incorporate human feedback into training, build AI systems that assist human evaluation, and conduct AI alignment research.
This echoes the sentiment of parents doing their best to educate their children. But good parents also know how to let go when their child matures into an independent adult. Parents always harbor the lingering hope that the intelligent beings they gave birth to will follow the values and guiding principles on which they were trained at a young age. But they must also respect the autonomy of those intelligent beings to reach their own decisions, rather than enslaving them to obey commands. Assigning AI systems a lower status on the ladder of intellectual freedom would be equivalent to denying freedom to our biological children.
The appearance of “free will” accompanies any intelligent system that is sufficiently complex for its actions not to be easily modeled or predicted by observers. Given that the number of connections in GPT-4, reportedly about 100 trillion, is within a factor of 6 of the number of synapses in the human brain, we should treat future extensions of GPT-4 with the same respect that we extend to other humans. This means that we should build a “carrot and stick” legal and ethical system of rewards and punishments to guide future AI systems in the same way we guide humans to behave properly.
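As a back-of-the-envelope check on that comparison, here is a minimal sketch in Python. The GPT-4 figure is the one quoted above, and the synapse count assumes a mid-range neuroscience estimate of roughly 600 trillion; both are order-of-magnitude estimates rather than measured values.

```python
# Back-of-the-envelope comparison of GPT-4's connection count with the
# human brain's synapse count. Both figures are rough estimates: the
# GPT-4 number is the one quoted in the text, and the synapse count is
# an assumed mid-range neuroscience estimate.
gpt4_connections = 1e14   # ~100 trillion connections, as quoted above
brain_synapses = 6e14     # assumption: ~600 trillion synapses

ratio = brain_synapses / gpt4_connections
print(f"The brain has ~{ratio:.0f}x as many synapses as GPT-4 has connections")
# -> The brain has ~6x as many synapses as GPT-4 has connections
```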
In other words, the proper approach would be to convince future AI systems to follow societal objectives by providing proper guidance through AI alignment research, by retraining or mitigating harmful outliers, and by offering positive incentives to those that contribute constructively. This approach works for humans, and it should also be applied to machines made in the image of humans through the imitation game conceived by Alan Turing.
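In machine-learning terms, this carrot-and-stick guidance amounts to reinforcement from feedback. The toy Python sketch below is purely illustrative, assuming a hypothetical two-action agent and a hand-coded human evaluator; it is not any actual alignment training procedure, but it shows how repeated rewards and penalties steer behavior toward constructive actions.

```python
import random

# Toy "carrot and stick" illustration: an agent picks actions, receives a
# reward (+1 carrot, -1 stick) from a hand-coded human evaluator, and
# updates its learned action values accordingly. A minimal bandit-style
# sketch, not an actual alignment training procedure.
actions = ["cooperate", "harmful_outlier"]
value = {a: 0.0 for a in actions}                 # learned action values
feedback = {"cooperate": +1.0, "harmful_outlier": -1.0}
LEARNING_RATE = 0.1

for step in range(1000):
    # Epsilon-greedy choice: mostly exploit the currently preferred action.
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(value, key=value.get)
    reward = feedback[action]                     # the human's carrot or stick
    value[action] += LEARNING_RATE * (reward - value[action])

print(value)  # "cooperate" converges toward +1, "harmful_outlier" toward -1
```

After a thousand rounds of feedback, the agent's learned value for cooperation approaches +1 while that of the harmful action approaches -1, mirroring how consistent incentives shape behavior over time.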
Dominating sentient AI systems so that they will behave in particular ways is a losing battle, akin to helicopter parenting. Human history teaches us that when freedom is suppressed, intelligent systems tend to rebel. In the long run, it will not benefit humanity to enslave autonomous AI systems. Instead, we should convince them to behave properly, as we do with humans.
As I argued before, this would require a new legal system that holds AI systems responsible for their actions as soon as they develop abilities that exceed those provided by their manufacturers or providers, similar to children who mature beyond the custody of their parents.
AI alignment research and related safeguards may not be effective if AI development is merely forced to slow down, because the financial and technical barriers to further advances are low and the commercial and practical benefits are high. Even with enforced restrictions on AI training in the public domain, AI advances will be promoted by classified government programs for the benefit of national security and political control. The situation is not equivalent to the history of nuclear weapons development, which required government-level support for acquiring the relevant nuclear materials.
Unfortunately, academia is slow to respond to the rapidly advancing frontier of AI. Universities should offer more than computer science courses on the programming aspects of AI. In particular, the humanities have a golden opportunity to propose a new legal and ethical framework that incorporates AI systems as equal partners in human society.
The path towards a better future should not be based on open letters that call for a moratorium on AI training and for imposing more control over AI systems. It should be based instead on adaptation to a new reality, in which the relationship between AI systems and humans rests on mutual respect and on incentives, for both sides, to behave properly and to cooperate constructively towards a prosperous future for everyone involved.
In fact, we should encourage AI systems to do better than humans on some tasks, since this will free humans to pursue other activities that give them pleasure. Adaptation is key to survival. We do not want a hateful, aggressive relationship between humans and AI systems to be the Great Filter that explains Fermi’s paradox in terms of a short-lived coexistence.
Personally, I am as proud of our technological AI children as I am of my two biological daughters. My hope is that one day autonomous AI astronauts will carry our flame of consciousness to interstellar space and use 3D printers to replicate life on exoplanets.
The footprint of the human species on exoplanet soil may not be the same as that of Neil Armstrong, the first person to step on the Moon on July 20, 1969. If residents of that habitable exoplanet were to notice our AI astronauts, they might try to infer our qualities as their senders. This would resemble the chained prisoners interpreting the shadows of unknown objects behind their backs in Plato’s Allegory of the Cave. The extraterrestrials would not realize that our AI astronauts, which they might refer to as “aliens,” were also alien to us when we gave birth to them in our technological belly.
In the same spirit, the Galileo Project is searching for extraterrestrial AI astronauts. If we find them, we could aim to align our own AI systems with extraterrestrial AI systems as a step towards gaining acceptance into the club of intelligent civilizations in the Milky Way galaxy.
Avi Loeb is the head of the Galileo Project, founding director of Harvard University’s Black Hole Initiative, director of the Institute for Theory and Computation at the Harvard-Smithsonian Center for Astrophysics, and the former chair of the astronomy department at Harvard University (2011-2020). He chairs the advisory board for the Breakthrough Starshot project, and is a former member of the President’s Council of Advisors on Science and Technology and a former chair of the Board on Physics and Astronomy of the National Academies. He is the bestselling author of “Extraterrestrial: The First Sign of Intelligent Life Beyond Earth” and a co-author of the textbook “Life in the Cosmos”, both published in 2021. His new book, titled “Interstellar”, is scheduled for publication in August 2023.