
How AI Used Knee X-Rays To Accurately Predict Beer Drinking and Bean-Eating Habits and Why It’s Concerning

What if knee X-rays could reveal your dietary preferences? A recent study demonstrates that artificial intelligence (AI) models can seemingly deduce whether someone drinks beer or avoids refried beans by analyzing X-ray images of their knees.

While it sounds like a futuristic marvel, the findings, recently published in the Nature Portfolio journal Scientific Reports, raise serious concerns about the pitfalls of “shortcut learning” in medical AI.

“While AI has the potential to transform medical imaging, we must be cautious,” lead study author and orthopedic surgeon at Dartmouth Hitchcock Medical Center (DHMC), Dr. Peter L. Schilling, said in a statement. “These models can see patterns humans cannot, but not all patterns they identify are meaningful or reliable. It’s crucial to recognize these risks to prevent misleading conclusions and ensure scientific integrity.”

In the ever-expanding field of artificial intelligence, machine learning algorithms are tackling increasingly complex problems. 

AI is revolutionizing healthcare, from diagnosing rare diseases to personalizing treatments. However, a groundbreaking study led by researchers at Dartmouth Hitchcock Medical Center and the Geisel School of Medicine at Dartmouth reveals an unsettling phenomenon: AI models can detect seemingly unrelated factors, like dietary habits, from medical images.

This capability, though intriguing, could be a sign of algorithmic “shortcutting,” a tendency for AI to rely on irrelevant but detectable patterns in data.

Using the extensive Osteoarthritis Initiative (OAI) dataset, which includes over 25,000 knee X-rays, the research team trained convolutional neural networks (CNNs) to predict two implausible outcomes—whether a person drinks beer or avoids refried beans. 

Shockingly, the models achieved moderate accuracy, with an area under the curve (AUC) of 0.73 for beer consumption and 0.63 for refried bean avoidance.
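
To make the setup concrete, here is a rough sketch of what such a pipeline can look like: a standard convolutional network trained on labeled knee X-rays and scored with AUC. This is an illustration built on PyTorch and torchvision, not the study’s actual code; the file names, column names, backbone choice, and training schedule are all assumptions.

```python
# Illustrative sketch only -- not the study's code. Assumes CSV files listing
# X-ray image paths alongside a hypothetical binary "drinks_beer" label.
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from PIL import Image
from sklearn.metrics import roc_auc_score

class KneeXrayDataset(Dataset):
    def __init__(self, csv_path, transform):
        self.df = pd.read_csv(csv_path)      # columns: image_path, drinks_beer
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(row["image_path"]).convert("RGB")  # replicate grayscale to 3 channels
        return self.transform(img), torch.tensor(row["drinks_beer"], dtype=torch.float32)

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_loader = DataLoader(KneeXrayDataset("train.csv", transform), batch_size=32, shuffle=True)
val_loader = DataLoader(KneeXrayDataset("val.csv", transform), batch_size=32)

model = models.resnet18(weights=None)          # any standard CNN backbone would do
model.fc = nn.Linear(model.fc.in_features, 1)  # single logit for the binary label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):                         # short schedule, purely for illustration
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x).squeeze(1), y)
        loss.backward()
        optimizer.step()

# Score the held-out set with AUC, the metric reported in the study.
model.eval()
scores, labels = [], []
with torch.no_grad():
    for x, y in val_loader:
        scores.extend(torch.sigmoid(model(x).squeeze(1)).tolist())
        labels.extend(y.tolist())
print("AUC:", roc_auc_score(labels, scores))
```

Nothing in a pipeline like this tells the model to look at anatomy rather than incidental artifacts, which is exactly where shortcut learning creeps in.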

These results do not indicate a hidden truth about dietary preferences encoded in knee anatomy. Instead, they expose how AI models exploit confounding variables—hidden correlations in data that have little to do with the intended prediction task. 

Shortcut learning occurs when AI models identify patterns that provide quick answers rather than meaningful insights. In this study, the shortcuts included subtle differences linked to clinical sites, X-ray machine manufacturers, and imaging protocols. 

For example, saliency maps used to visualize the models’ decision-making showed that predictions relied on image artifacts such as laterality markers and the blacked-out regions used to obscure patient information.
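
Saliency maps of this kind can be generated with a simple input-gradient approach. The sketch below is a generic illustration (the study’s exact visualization method may differ) that assumes the trained model and preprocessing transform from the sketch above, plus a hypothetical example image file.

```python
# Minimal input-gradient saliency sketch (illustrative only).
# `model` and `transform` are assumed to come from the training sketch above.
import matplotlib.pyplot as plt
import torch
from PIL import Image

def saliency_map(model, image_tensor):
    model.eval()
    x = image_tensor.unsqueeze(0).requires_grad_(True)  # add batch dim, track gradients
    score = model(x).squeeze()                          # logit for the positive class
    score.backward()                                    # gradient of the score w.r.t. each pixel
    return x.grad.abs().max(dim=1).values.squeeze(0)    # strongest gradient across channels

img = Image.open("example_knee_xray.png").convert("RGB")  # hypothetical file name
sal = saliency_map(model, transform(img))

# Overlay the map on the X-ray: if the hot spots sit on laterality markers or
# blacked-out corners rather than on anatomy, the model is likely shortcutting.
plt.imshow(img.resize((224, 224)))
plt.imshow(sal.numpy(), cmap="hot", alpha=0.5)
plt.axis("off")
plt.savefig("saliency_overlay.png")
```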

The implications are profound. While CNNs can uncover non-obvious information in medical images, they may also learn misleading correlations, jeopardizing the validity of clinical findings. 

This phenomenon is not limited to dietary predictions. Previous studies have shown that AI can deduce patient race, age, and gender from chest X-rays and other medical images—often with startling accuracy. 

These capabilities highlight AI’s double-edged nature: its ability to detect patterns invisible to humans and its susceptibility to misinterpretation.

For instance, the study found that the CNNs trained to predict dietary preferences also retained knowledge useful for identifying patient demographics. When re-tasked, the models could predict gender, race, and clinical site with high accuracy, underscoring how intertwined latent variables can skew predictions.
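
The paper describes this re-tasking at a high level; one common way to probe what a trained network has absorbed is to freeze its feature extractor and fit a new classification head for a different variable, as in the hypothetical sketch below (the number of clinical sites and the `site_loader` data loader are assumptions).

```python
# Probe what the beer-prediction model learned by reusing its frozen features
# to classify a different variable (here, clinical site). Illustrative only;
# not necessarily the study's exact procedure.
import torch
import torch.nn as nn

n_sites = 5                                              # hypothetical number of clinical sites
backbone = nn.Sequential(*list(model.children())[:-1])   # drop the original prediction head
backbone.eval()                                          # keep batch-norm statistics frozen
for p in backbone.parameters():
    p.requires_grad = False                              # freeze the learned representation

site_head = nn.Linear(512, n_sites)                      # 512 = resnet18 feature width
optimizer = torch.optim.Adam(site_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# site_loader is assumed to yield (image_batch, integer_site_label_batch) pairs.
for x, site in site_loader:
    feats = backbone(x).flatten(1)                       # features from the frozen beer model
    loss = loss_fn(site_head(feats), site)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# If this probe classifies sites accurately, confounders are baked into the features.
```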

“This goes beyond bias from clues of race or gender,” Brandon G. Hill, a machine learning scientist at DHMC and study co-author, explained. “We found the algorithm could even learn to predict the year an X-ray was taken.” 

“It’s pernicious; when you prevent it from learning one of these elements, it will instead learn another it previously ignored. This danger can lead to some really dodgy claims, and researchers need to be aware of how readily this happens when using this technique.”

The findings underscore the need for caution when interpreting AI outputs in medicine. In the rush to harness AI’s potential, researchers and clinicians must ensure that models are not merely capturing superficial patterns. 

Shortcut learning can lead to erroneous conclusions, undermining the trust in AI-driven diagnostics and treatments. Moreover, the study challenges the notion that preprocessing or normalizing data is sufficient to mitigate biases. 

Despite efforts to standardize the images, the models still exploited latent variables to make predictions. This persistence of bias highlights the difficulty of addressing shortcut learning comprehensively.

As AI becomes more integrated into healthcare, understanding its limitations is critical. Models trained on medical images should undergo rigorous evaluation to ensure they learn meaningful patterns, not shortcuts. 

Techniques like saliency mapping should be used to understand model behavior and identify potential sources of bias. Accuracy metrics alone are insufficient. Researchers must explore whether a model’s predictions align with known medical principles. 

Additionally, regulatory bodies may need to establish guidelines for evaluating AI models in healthcare, focusing on mitigating risks associated with shortcut learning.

The study’s authors advocate for greater interdisciplinary collaboration to address these challenges. By combining the expertise of data scientists, clinicians, and ethicists, the medical community can develop robust AI systems that deliver on their promise without compromising reliability. This multidisciplinary approach is essential in addressing the double-edged nature of AI in medicine.

While its ability to uncover previously unseen patterns in medical images is transformative, it also poses risks when those patterns reflect irrelevant, misleading, or biased correlations.

Ultimately, the idea of a knee X-ray revealing dietary habits may evoke amusement, but it also serves as a sobering reminder of AI’s limitations. 

As researchers push the boundaries of what AI can achieve, they must guard against the dangers of shortcut learning to preserve the integrity and accuracy of AI-driven insights.

“Part of the problem is our own bias,” Hill explained. “It is incredibly easy to fall into the trap of presuming that the model ‘sees’ the same way we do. In the end, it doesn’t.” 

“It is almost like dealing with an alien intelligence. You want to say the model is ‘cheating,’ but that anthropomorphizes the technology. It learned a way to solve the task given to it, but not necessarily how a person would. It doesn’t have logic or reasoning as we typically understand it.”

Tim McMillan is a retired law enforcement executive, investigative reporter and co-founder of The Debrief. His writing typically focuses on defense, national security, the Intelligence Community and topics related to psychology. You can follow Tim on Twitter: @LtTimMcMillan.  Tim can be reached by email: tim@thedebrief.org or through encrypted email: LtTimMcMillan@protonmail.com