DARPA and NASA’s Jet Propulsion Lab Unite to Combat File Format Dangers with ‘SafeDocs’


Welcome to this week’s installment of The Intelligence Brief… in recent days, DARPA provided an update on the agency’s unique efforts to make computing safer through its innovative SafeDocs program. This week, we look at 1) what SafeDocs is and how it aims to reduce risks associated with the transmission of malware and other dangers, 2) why NASA’s Jet Propulsion Lab is helping DARPA with its unique mission, and 3) how JPL researchers were able to construct a massive corpus in their work with DARPA on the SafeDocs program.

Quote of the Week

“Today, electronic data is the attack surface.” 

– Dr. Sergey Bratus, DARPA’s SafeDocs program manager

Latest Stories: Before getting into our analysis this week, a few of the stories we’re covering at The Debrief include Raytheon Technologies’ delivery of the first fully portable, combat-ready drone to the Air Force and how NASA just sent the fastest spacecraft ever built close enough to the Sun to detect the fine structure of “fast” solar winds near its surface. Also, as The Galileo Project’s expedition to retrieve fragments from an interstellar meteor is underway off the coast of Papua New Guinea, Avi Loeb has committed to sharing regular updates on the team’s progress with The Debrief in a special series, “Diary of an Interstellar Journey,” installments of which can be found in Part One, Part Two, and Part Three, with additional installments forthcoming. As always, you can get links to all our latest stories at the end of this week’s newsletter.

Podcasts: This week in podcasts from The Debrief, in the latest episode of The Debrief Weekly Report, the former CNN and Al Jazeera news anchor and host of History Channel’s The Proof is Out There joins Stephanie Gerk and MJ Banias to discuss NASA’s UAP investigation and other recent news. Meanwhile, this week on The Micah Hanks Program, we continue our analysis of the controversy surrounding the claims of UAP whistleblower David Grusch. You can subscribe to all of The Debrief’s podcasts, including audio editions of Rebelliously Curious, by heading over to our Podcasts Page.

Video News: On the latest installment of Rebelliously Curious, Chrissy Newton spoke with Leslie Kean and Ralph Blumenthal, who discuss their recent breaking news story with The Debrief involving the claims of a former intelligence community insider. Also, be sure to check out the latest episode of our all-new series “Ask Dr. Chance,” where the scientist also weighs in on the science behind UFOs. Be sure to watch these videos and other great content from The Debrief on our official YouTube Channel.

With all the housekeeping out of the way, it’s time to look at how DARPA has been getting a little help from NASA scientists to work toward making online activities and document access more secure.

DARPA Announces its New SafeDocs Program

This week, the Defense Advanced Research Projects Agency (DARPA) reported on its Safe Documents (SafeDocs) program, an innovative initiative that provides “new methods and tools that allow people to confidently open documents and trust what they see on their screens.”

DARPA is the DoD’s research and development agency, tasked with developing emerging technologies for the U.S. military. Its SafeDocs program, which formally began in 2018, aims to develop new verified programming capabilities and methods for assembling “high assurance parsers for extant electronic data formats,” allowing comprehensive verification of online documents.

(Credit: DARPA)

“SafeDocs will address the ambiguity and complexity obstacles that hinder the application of verified programming posed by extant electronic data formats,” said Dr. Sergey Bratus, SafeDocs program manager, in a statement issued by the agency.

Mitigating potential risks presented by online documents may appear to be a slightly different area of focus for DARPA, the agency best known for its production of stealth technologies, as well as intelligence, surveillance, and reconnaissance capabilities for the U.S. military. But DARPA isn’t the only agency stepping outside its normal routine in the unique effort to help make online activities safer: it is getting additional help from NASA’s Jet Propulsion Lab.

NASA’s JPL Joins the SafeDocs Effort

“As part of DARPA’s SafeDocs program, JPL data scientists have amassed 8 million PDFs that can now be used for further study in order to make the internet more secure,” read a JPL statement issued yesterday.

While JPL’s normal area of focus in recent years has included landing rovers like Perseverance on Mars, the Pasadena facility also devotes a significant amount of time and resources to its work related to the digital world.

“In support of a wider effort to make the internet more secure, JPL data scientists have created the largest single publicly available open-source archive, or corpus, of PDFs,” the NASA statement read. “By working with the nonprofit PDF Association, which seeks to establish open specifications and standards for the technology, JPL is helping to develop several tools to confront these challenges.”

(Credit: NASA/JPL)

JPL data scientist Tim Allison said that despite their ubiquity, PDF documents are complex carriers of online information and can therefore be easily compromised, allowing them to carry malicious code or to expose other information to compromise.

“To confront these and other challenges from PDFs, a large sample of real-world PDFs needs to be collected from the internet to create a shared, freely available resource for software experts,” Allison said in a statement.

Common Crawling

To construct the corpus, Allison and the JPL research team used Common Crawl, an open-source web-crawl data set, to identify PDFs they judged should be included.

Using special software to overcome the download size constraints of Common Crawl, Allison and the JPL scientists were able to fetch millions of PDFs and extract their metadata. They also used geolocation software to help identify the server location and online source of each PDF. The resulting data set clocks in at roughly 8 terabytes and is now the largest corpus of its kind ever made available to the public.
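For readers curious what that selection step might look like in practice, here is a minimal Python sketch. It is not the JPL team’s actual tooling, which the statements quoted here do not describe in detail. The sample records are invented, and the field names (“mime-detected,” “status,” “truncated”) are assumed to resemble those found in Common Crawl’s URL index. The idea is simply to keep successful PDF captures and set aside any that the crawl cut short, so they can be downloaded again in full.

```python
import json

# Hypothetical index lines, invented for illustration. The field names are
# assumed to resemble Common Crawl URL-index records.
SAMPLE_INDEX_LINES = [
    '{"url": "https://example.org/a.pdf", "mime-detected": "application/pdf", "status": "200"}',
    '{"url": "https://example.org/big.pdf", "mime-detected": "application/pdf", "status": "200", "truncated": "length"}',
    '{"url": "https://example.org/index.html", "mime-detected": "text/html", "status": "200"}',
    '{"url": "https://example.org/gone.pdf", "mime-detected": "application/pdf", "status": "404"}',
]


def select_pdfs(index_lines):
    """Split successful PDF records into complete captures and ones to refetch."""
    complete, refetch = [], []
    for line in index_lines:
        record = json.loads(line)
        # Skip anything that was not fetched successfully.
        if record.get("status") != "200":
            continue
        # Keep only records whose detected content type is PDF.
        if record.get("mime-detected") != "application/pdf":
            continue
        # A non-empty "truncated" field marks a capture that was cut short.
        if record.get("truncated"):
            refetch.append(record["url"])
        else:
            complete.append(record["url"])
    return complete, refetch


complete, refetch = select_pdfs(SAMPLE_INDEX_LINES)
print("usable as crawled:", complete)
print("needs a fresh download:", refetch)
```

In this toy example, the truncated file lands in the second list, which stands in for the set of PDFs that would have to be fetched again from their original servers.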

“PDF is one of the most important file types on the internet today, and this contribution of roughly 8 terabytes of data provides faculty, students, and corporations with up-to-date reference data that will power research for years to come,” said Simson Garfinkel, a former associate professor at the Naval Postgraduate School in Monterey, California, who conducted similar work in years past.

“DARPA and the PDF Association are helping standards organizations redefine software specifications and even standards development processes that could help mitigate billions of dollars in terms of loss of productivity caused by data breaches,” Dr. Sergey Bratus said in a statement.

“Through our collaborative efforts, we’ve shown the ability to eliminate the root cause of ambiguity, the place for the attackers to hide within the complexity of modern documents.” Additional information about the SafeDocs program can be found at DARPA’s website.

That concludes this week’s installment of The Intelligence Brief. You can read past editions of The Intelligence Brief at our website, or if you found this installment online, don’t forget to subscribe and get future email editions from us here. Also, if you have a tip or other information you’d like to send along directly to me, you can email me at micah [@] thedebrief [dot] org, or Tweet at me @MicahHanks.

Here are the top stories we’re covering right now…