Over the decades, the art of spying, especially when carried out by government agencies, has evolved dramatically thanks to advances in technology. Gone are the days of Morse code and Enigma machines; today, a computer or even a smartphone can be used to spy on nearly anybody in the world.
For this reason, spying has turned into an evolutionary arms race as advanced technologies like AI and machine learning become more involved. Governments that use spies collect summaries of their findings in intelligence documents, which often contain state secrets or other classified information. And now, thanks to a new software tool, English speakers can locate text and speech in “documents” written in a variety of languages, with the results then summarized in English.
Background: Program Material
Within the U.S. government is the Intelligence Advanced Research Projects Activity (IARPA). Its mission is to “push the boundaries of science” to improve national security and help the intelligence community, according to its website. Its areas of interest range from machine learning and artificial intelligence to quantum computing and synthetic biology. IARPA also hosts various programs, split between the Office of Analysis and the Office of Collection, both working to improve data collection and analysis for the intelligence community.
One of the programs in the Office of Analysis is “Machine Translation for English Retrieval of Information in any Language” (MATERIAL). The program began in 2017, looking for a way to overcome the language barrier within intelligence documents. Its main goal was to build a “Cross-Language Information Retrieval (CLIR) system, that finds speech and text content in diverse lower-resource languages, using English search queries,” as its website states. Only recently has the MATERIAL program finally delivered this impressive tool.
Analysis: Queries and Keywords
The CLIR tool was developed by the MATERIAL program in collaboration with Raytheon BBN Technologies. It allows English speakers to enter a query in English, such as specific keywords, which the tool then uses to search through foreign-language documents and recordings for the most relevant results. Those results are then translated into English before being presented to the user. Because the tool uses a machine-learning algorithm, it needed input data to learn from. That data covered Kazakh, Somali, Swahili, Tagalog, and Pashto. The tool was then tested against Georgian, Lithuanian, Bulgarian, and Farsi. Because languages vary in tone, inflection, and many other factors, testing the tool against a wide variety of languages helped demonstrate its effectiveness.
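The query-search-translate flow described above can be illustrated with a toy sketch. This is not how the MATERIAL tool works internally; the real system relies on trained machine-learning models and also handles speech. Everything here is an assumption made for illustration: the four-word English-to-Swahili lexicon, the three sample documents, and the function names `translate_query`, `score`, `gloss`, and `search` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy English -> Swahili lexicon (illustrative only).
LEXICON = {
    "water": ["maji"],
    "school": ["shule"],
    "food": ["chakula"],
    "teacher": ["mwalimu"],
}

# Hypothetical foreign-language "documents" to search.
DOCUMENTS = {
    "doc1": "mwalimu anafundisha shule",
    "doc2": "maji na chakula",
    "doc3": "chakula kizuri",
}


def translate_query(query, lexicon):
    """Map each English query word to its foreign-language terms."""
    terms = []
    for word in query.lower().split():
        terms.extend(lexicon.get(word, []))  # unknown words are dropped
    return terms


def score(doc_text, terms):
    """Count how often the translated query terms occur in a document."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[term] for term in terms)


def gloss(doc_text, lexicon):
    """Produce a rough word-by-word English gloss of a document.

    Words missing from the lexicon are kept in [brackets].
    """
    reverse = {foreign: english
               for english, foreigns in lexicon.items()
               for foreign in foreigns}
    return " ".join(reverse.get(word, f"[{word}]")
                    for word in doc_text.lower().split())


def search(query, documents, lexicon):
    """Return (doc_id, score, english_gloss) for matches, best first."""
    terms = translate_query(query, lexicon)
    hits = []
    for doc_id, text in documents.items():
        s = score(text, terms)
        if s > 0:
            hits.append((doc_id, s, gloss(text, lexicon)))
    # Highest score first; ties broken by document id for stable output.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

Calling `search("water food", DOCUMENTS, LEXICON)` ranks `doc2` ahead of `doc3`, because `doc2` contains both translated terms. A dictionary lookup stands in for machine translation purely to keep the sketch short and self-contained.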
Outlook: A More Efficient Way to Study Intelligence Documents
“The tools and techniques developed under the program will boost our ability to find, examine, and analyze foreign language content without needing to learn the language,” explained IARPA MATERIAL Program Manager Carl Rubino. Removing the time needed to learn a specific language, or the need for a translator, gives the intelligence community faster and more efficient ways of working with intelligence documents. This new tool is another example of how advanced technology like machine learning is having a large impact on the field. No doubt technology like this will continue to be used as intelligence gathering and spying become more tech-focused.
Kenna Castleberry is a staff writer at The Debrief and the Science Communicator at JILA (a partnership between the University of Colorado Boulder and NIST). She focuses on deep tech, the metaverse, and quantum technology. You can find more of her work at her website: https://kennacastleberry.com/