AI Literacy

Think Your Student Can Pass an AI Literacy Test? A Concerning New Study Says Otherwise

In an era where artificial intelligence tools are becoming as common in classrooms as textbooks, a new study suggests most students don’t actually know how to use them well. 

Despite the widespread adoption of generative AI systems like ChatGPT, Gemini, and Claude, researchers have found that university students overestimate their ability to engage with these technologies—and that illusion of competence could have real-world consequences.

Set to be published in the December 2025 issue of Computers and Education: Artificial Intelligence, a paper by Monash University researchers introduces the Generative AI Literacy Assessment Test (GLAT). The exam is the first of its kind, designed to evaluate not only students’ ability to use generative AI tools but also their capacity to understand and apply them ethically.

While nearly all study participants claimed some level of proficiency with AI chatbots, only those with high GLAT scores were able to successfully navigate tasks that required analyzing complex data using AI support. In contrast, self-reported expertise had little to no predictive power.

This finding suggests a critical gap between perceived AI literacy and actual competency—one with implications not just for education but for national security, workforce preparedness, and the growing information warfare landscape.

At its core, the study exposes a new kind of digital divide—not between those with or without access to AI, but between those who understand how these systems function and those who are flying blind.

“GLAT offers a reliable and valid method for assessing GenAI literacy,” the study’s authors wrote. “[It has] the potential to inform educational practices and policy decisions that aim to enhance learners’ and educators’ GenAI literacy, ultimately equipping them to navigate an AI-enhanced future.”

GLAT stands apart from other literacy assessments because it doesn’t just ask students how confident they feel with AI tools. Instead, it requires them to demonstrate skills across four critical areas: 

  • Understanding foundational concepts like large language models and prompt engineering
  • Applying AI in real-world tasks
  • Evaluating the accuracy and trustworthiness of AI-generated outputs
  • Navigating ethical concerns like bias, privacy, and misinformation

Participants in the study answered a series of 20 carefully constructed multiple-choice questions designed to gauge their practical knowledge and decision-making skills. The most compelling aspect of the study, however, was its second phase, in which students completed a real-world assignment: using a GPT-4o-powered chatbot to analyze visual data from a simulated healthcare environment. Their performance was then compared to their GLAT scores and their self-assessed ChatGPT literacy.

The results revealed that students with high GLAT scores performed significantly better on AI-assisted tasks, even when controlling for domain-specific knowledge such as visual analytics. Meanwhile, those who rated themselves highly on self-report surveys but scored poorly on GLAT struggled to extract useful information or identify errors in chatbot responses.

In other words, confidence did not correlate with AI literacy or competence.

For institutions, this discrepancy matters. As generative AI becomes more deeply embedded in education, business, and government systems, the risks of misuse, misunderstanding, or blind trust in machine outputs also multiply. 

From biased hiring algorithms to AI-generated disinformation campaigns, the margin for error is shrinking—especially if users can’t distinguish between fact and fiction or hallucination and truth.

The Pentagon has already signaled its intent to incorporate large language models into battlefield decision-making. Agencies from the Department of Homeland Security to the CIA have publicly acknowledged trials using generative AI to streamline intelligence gathering and threat assessments. If the operators feeding prompts into these systems lack the literacy to critically evaluate AI output, the consequences could be catastrophic.

The researchers behind GLAT emphasize its potential not only for education but also for policy development and strategic workforce training. The test is based on rigorous psychometric foundations, including Classical Test Theory and Item Response Theory, both gold standards in educational measurement. It was tested across three groups totaling 355 higher education students, using a blend of scenario-based tasks and statistical modeling to validate both internal consistency and external predictive power.
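For readers curious what the Item Response Theory side of that validation looks like, the short Python sketch below works through a standard two-parameter logistic (2PL) model, the kind of model typically used in such analyses. The item parameters and the peak it reports are invented for illustration and are not taken from the GLAT paper; the point is simply that this machinery lets test designers see at which ability levels an exam measures most precisely.

```python
import numpy as np

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability that a test-taker
    with ability `theta` answers an item of discrimination `a` and
    difficulty `b` correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information the item contributes at ability `theta`;
    higher information means more precise measurement at that level."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

# Hypothetical parameters for a 20-item test (illustrative only, not the
# published GLAT values): items clustered slightly below average difficulty
# make the test most informative for below-average test-takers.
rng = np.random.default_rng(0)
a_params = rng.uniform(0.8, 2.0, size=20)   # discrimination
b_params = rng.normal(-0.3, 0.7, size=20)   # difficulty

abilities = np.linspace(-3, 3, 121)
test_info = sum(item_information(abilities, a, b)
                for a, b in zip(a_params, b_params))

peak = abilities[np.argmax(test_info)]
print(f"Test information peaks at ability = {peak:.2f}")
```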

GLAT’s performance was strongest in identifying those with lower-than-average proficiency, arguably the group most vulnerable to AI misuse. That precision just below the average is exactly what early intervention and curriculum design depend on.

Researchers also used confirmatory factor analysis and model comparison techniques to check the test’s reliability and unidimensionality, meaning it measures a single underlying construct, practical GenAI literacy, rather than a jumble of unrelated skills.
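To give a concrete sense of what those reliability and dimensionality checks involve, here is a minimal Python sketch, using synthetic responses rather than the study’s data, that computes Cronbach’s alpha (a Classical Test Theory consistency estimate) and a crude eigenvalue ratio. It is a simplified stand-in for the confirmatory factor analysis and model comparisons the researchers actually report, not a reproduction of their method.

```python
import numpy as np

def cronbach_alpha(scores):
    """Classical Test Theory internal-consistency estimate for a
    respondents-by-items matrix of 0/1 item scores."""
    n_items = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)

# Synthetic responses: 355 test-takers, 20 items, one dominant latent trait.
# (Illustrative numbers only; not the study's data.)
rng = np.random.default_rng(1)
theta = rng.normal(size=(355, 1))                  # latent ability
difficulty = rng.normal(scale=0.7, size=(1, 20))   # item difficulty
prob = 1.0 / (1.0 + np.exp(-(theta - difficulty)))
responses = (rng.uniform(size=prob.shape) < prob).astype(float)

alpha = cronbach_alpha(responses)

# Rough unidimensionality check: if one factor dominates, the first
# eigenvalue of the inter-item correlation matrix dwarfs the second.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False)))[::-1]
print(f"Cronbach's alpha: {alpha:.2f}")
print(f"First-to-second eigenvalue ratio: {eigvals[0] / eigvals[1]:.2f}")
```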

But the most damning comparison came when GLAT was pitted against the only other GenAI-specific literacy instrument available: the ChatGPT Literacy Scale developed by Lee and Park. Unlike GLAT, that tool is based on self-reports. And while it’s easier to deploy, it failed to predict students’ actual performance during the AI-assisted visual analysis task. GLAT, by contrast, had a statistically significant correlation with successful task execution.

“Greater proficiency in GenAI literacy is associated with enhanced performance in tasks supported by the GenAI chatbot,” the researchers found. “Self-reported proficiency with ChatGPT was not a significant factor in predicting their performance scores in GenAI-assisted tasks.”
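The contrast the researchers describe is easy to illustrate with a toy simulation: a score that actually tracks skill correlates with downstream task performance, while a score that mostly captures confidence does not. Everything below, including the variable names and effect sizes, is synthetic and purely illustrative; none of it comes from the study’s data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 355                                             # same sample size as the study; data made up
true_literacy = rng.normal(size=n)                  # unobserved actual skill
performance_test = true_literacy + rng.normal(scale=0.5, size=n)   # noisy measure of skill
self_report = rng.normal(size=n)                    # confidence, unrelated to skill here
task_score = 0.6 * true_literacy + rng.normal(scale=0.8, size=n)   # AI-assisted task outcome

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

print(f"Performance-based score vs task outcome: r = {pearson_r(performance_test, task_score):.2f}")
print(f"Self-reported score vs task outcome:     r = {pearson_r(self_report, task_score):.2f}")
```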

The implications of AI illiteracy are significant. If students and professionals alike continue to mistake familiarity with AI tools for actual competence, they risk becoming unwitting amplifiers of misinformation or, worse, blind spots in security systems that depend on human-AI collaboration.

There’s also a darker layer to this gap. Bad actors, including state-sponsored disinformation units, are already exploiting the public’s AI illiteracy. Whether through deepfakes, synthetic media, or emotionally persuasive AI-generated narratives, these tools can be weaponized with little resistance if users lack the skills to question, interpret, and verify what they see.

One area where the public’s misunderstanding of generative AI is evident is on the social media platform X (formerly Twitter), where users frequently appeal to Grok—an AI chatbot integrated into the platform—for fact-checking or insight. 

In the comments beneath virtually every viral post, users can be seen tagging Grok as though it were an all-knowing authority capable of instant verification or expert analysis. However, Grok is not some oracle of truth or an omniscient digital mind. Instead, it is a generative language model trained to predict plausible-sounding text based on patterns in X user posts.

Like other chatbots, Grok can fabricate facts, reflect biases, and misunderstand nuance, especially when prompted with emotionally charged or ambiguous content. Just recently, after a significant upgrade, the chatbot began spewing antisemitic hate speech. 

The fact that many social media users treat Grok as a definitive source underscores how little many people understand about how these systems actually function and how easily that blind trust can be exploited.

In that sense, GLAT might represent something more than an educational tool—it could be a prototype for digital defense training. A kind of AI war game where those on the front lines of media, policy, and strategy learn not just how to prompt a chatbot but how to detect when it’s wrong, biased, or being used to deceive.

Just days after Grok was caught churning out hate-filled messages, the U.S. Department of Defense announced it had awarded a $200 million contract to Elon Musk’s xAI to develop and implement artificial intelligence tools for the Pentagon. 

The timing of the multimillion-dollar contract is both alarming and deeply revealing. As governments move to integrate AI into military operations, intelligence analysis, and critical infrastructure, the stakes for misunderstanding these systems couldn’t be higher. The incident with Grok underscores the urgency of developing reliable AI literacy assessments—not just for students and educators but for policymakers, defense officials, and the public at large.

If GLAT—or a future iteration of it—were scaled beyond the classroom, it could lay the foundation for certifying AI operators or even preparing the broader public for life in an AI-driven world. While the study stops short of making such recommendations, it leaves the door clearly open.

“This study advocates for the integration of performance-based assessments in addition to traditional self-reported measures to evaluate GenAI literacy reliably,” the researchers write. “The findings highlight the need for continuous adaptation of assessment tools to keep pace with technological advancements, thereby equipping educators and students with the skills necessary to engage in an AI-driven future effectively.”

Tim McMillan is a retired law enforcement executive, investigative reporter and co-founder of The Debrief. His writing typically focuses on defense, national security, the Intelligence Community and topics related to psychology. You can follow Tim on Twitter: @LtTimMcMillan.  Tim can be reached by email: tim@thedebrief.org or through encrypted email: LtTimMcMillan@protonmail.com