In Question: How Well Do We Identify Deepfakes Created by Artificial Intelligence?
In a groundbreaking study, researchers led by Kellogg assistant professor Matthew Groh have explored the ability of humans to distinguish real photos from AI-generated ones. The team's findings reveal that while humans can sometimes discern the difference, accuracy varies widely depending on the context and image characteristics.
The study involved a dataset of 149 real photographs and 450 AI-generated images created using tools like Midjourney, Firefly, and Stable Diffusion. More than 50,000 participants took part in the experiment, contributing nearly 750,000 observations. Participants could view and categorize as many images as they liked, with accuracy varying depending on factors such as image complexity, distortions, and viewing time.
Under ideal conditions, participants correctly distinguished real photos from AI-generated ones in about five out of every six images. However, they were least accurate when asked to categorize AI images containing functional implausibilities.
Longer viewing times significantly improved accuracy, which rose from 72 percent at one second to 80 percent at ten seconds. Accuracy also increased with more-complex images, such as photos of groups of people.
The team created a taxonomy of common issues associated with AI-generated images, spanning functional implausibilities and stylistic artifacts. Certain artifacts or inconsistencies often give synthetic images away: unnatural textures or garbled fine details; anomalies in eyes, teeth, or hair; inconsistent or distorted object parts; generative signals that survive compression; and semantic inconsistencies.
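To make the taxonomy concrete, here is a minimal Python sketch of how such categories might be encoded for annotating suspect images. The enum names paraphrase this article and are not the study's official labels; the image identifier is a placeholder.

```python
# Hypothetical encoding of the artifact taxonomy described above, for
# annotating suspect images. Category names paraphrase this article
# and are not the study's official labels.
from enum import Enum

class ArtifactCategory(Enum):
    UNNATURAL_TEXTURE = "unnatural textures or garbled fine details"
    ANATOMICAL_ANOMALY = "anomalies in eyes, teeth, or hair"
    DISTORTED_PARTS = "inconsistent or distorted object parts"
    GENERATIVE_SIGNAL = "generative signals that survive compression"
    SEMANTIC_INCONSISTENCY = "semantic inconsistencies"

# Example annotation: tag one suspect image with the issues spotted.
suspect_image = {
    "image_id": "img_042",  # placeholder identifier
    "issues": [
        ArtifactCategory.ANATOMICAL_ANOMALY,
        ArtifactCategory.SEMANTIC_INCONSISTENCY,
    ],
}
print([category.value for category in suspect_image["issues"]])
```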
While humans struggle especially with small or low-detail images, advanced AI-detection tools have recently improved substantially, achieving accuracies upward of 97–99 percent under controlled conditions. Tools like Originality.AI and Sapling AI report up to 99 percent accuracy on various datasets, including “in-the-wild” images. These tools are especially important in sensitive contexts such as social-media misinformation, journalism, and academic-integrity checks in higher education.
The experiment found that people are limited in their ability to detect AI-generated images, particularly as generative models improve. Detection is easier on higher-resolution, less-compressed images, because low resolution and lossy compression obscure the telltale artifacts that both detectors and people rely on.
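To illustrate why compression matters, here is a small, self-contained Python sketch (not from the study) that round-trips an image through JPEG at decreasing quality and measures how much high-frequency detail survives; the file path is a placeholder.

```python
# A minimal sketch (not from the study) of why lossy compression hides
# generative artifacts: re-encode an image as JPEG at decreasing
# quality and measure how much high-frequency detail survives.
from io import BytesIO

import numpy as np
from PIL import Image

def high_freq_energy(img: Image.Image) -> float:
    """Rough proxy for fine detail: spectral energy outside the
    low-frequency center of the image's 2D Fourier spectrum."""
    gray = np.asarray(img.convert("L"), dtype=np.float32)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    h, w = spectrum.shape
    ch, cw = h // 4, w // 4
    low = spectrum[h // 2 - ch : h // 2 + ch, w // 2 - cw : w // 2 + cw]
    return float(spectrum.sum() - low.sum())

def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    """Re-encode the image as JPEG at the given quality and reload it."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

original = Image.open("suspect.png")  # placeholder path
for quality in (95, 50, 10):
    energy = high_freq_energy(jpeg_roundtrip(original, quality))
    print(f"quality={quality:3d}  high-frequency energy={energy:,.0f}")
# The energy typically drops as quality falls, which is why heavy
# compression obscures the subtle signals that detectors rely on.
```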
New frameworks like CO-SPY and VERITAS combine semantic analysis with artifact detection to improve reliability and to provide human-readable explanations of why an image is flagged as AI-generated. These advances address past limitations, such as artifact detectors failing under image compression and semantic detectors overfitting to known styles.
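As an illustration only, the sketch below shows the general idea of fusing a semantic score with an artifact score and attaching a plain-language explanation; the function, weights, and scores are hypothetical stand-ins, not the actual CO-SPY or VERITAS interfaces.

```python
# Hypothetical score fusion in the spirit of hybrid detectors; not the
# CO-SPY or VERITAS APIs. Both input scores are assumed to lie in [0, 1].
from dataclasses import dataclass

@dataclass
class Verdict:
    is_ai: bool
    score: float
    explanation: str

def fused_verdict(semantic_score: float,
                  artifact_score: float,
                  threshold: float = 0.5) -> Verdict:
    """Average two detector scores and attach a plain-language note
    about which signal was stronger."""
    score = 0.5 * semantic_score + 0.5 * artifact_score
    dominant = ("semantic inconsistencies"
                if semantic_score >= artifact_score
                else "low-level generative artifacts")
    return Verdict(
        is_ai=score >= threshold,
        score=score,
        explanation=(f"Combined score {score:.2f}; the stronger signal "
                     f"was {dominant}."),
    )

print(fused_verdict(semantic_score=0.82, artifact_score=0.31))
```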
The combined use of AI detectors and human review remains the most effective strategy for authenticity verification in 2025. The study's findings underscore the importance of developing a critical eye when browsing social media or consuming digital content.
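One common way to structure such a human-in-the-loop pipeline, sketched here as an assumption rather than a workflow from the article, is to trust the automated detector only at the extremes of its confidence and escalate everything in between.

```python
# A minimal triage sketch, assuming a generic detector score in [0, 1];
# an illustration, not a workflow from the article or study.
def triage(detector_score: float,
           low: float = 0.2,
           high: float = 0.8) -> str:
    """Trust the automated detector only at the extremes of its
    confidence; escalate everything in between to a person."""
    if detector_score >= high:
        return "auto-flag as likely AI-generated"
    if detector_score <= low:
        return "auto-accept as likely authentic"
    return "send to human reviewer"

for score in (0.05, 0.50, 0.93):
    print(f"{score:.2f} -> {triage(score)}")
```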
The AI-generated images in the study were produced with tools such as Midjourney, Firefly, and Stable Diffusion. Although participants could often distinguish real photos from AI-generated ones under ideal conditions, their accuracy varied significantly with image complexity, viewing time, and the presence of functional implausibilities or stylistic artifacts.