AI’s Quirky Benchmarks: From Will Smith Eating Spaghetti to Minecraft Marvels
When a company unveils a new AI video generator, it’s not long before the internet challenges it with a peculiar test: creating a video of actor Will Smith enjoying a bowl of spaghetti. This quirky benchmark has become both a meme and a rite of passage for new AI technology. In February, Smith himself joined the fun with an Instagram post humorously nodding to the trend.
Other offbeat tests have caught on as well:
- A 16-year-old developer built an app that lets AI autonomously construct buildings in Minecraft, as a test of its creative capabilities.
- A British programmer launched a platform where AI competes in games like Pictionary and Connect 4, putting its strategic skills on display.
“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless.”
Ethan Mollick, Wharton professor
While traditional benchmarks exist to evaluate AI’s performance on academic tasks, such as solving Math Olympiad problems or tackling Ph.D.-level questions, they often leave the average person scratching their head. Most of us engage with AI for more mundane tasks like responding to emails or conducting basic research.
Platforms like Chatbot Arena strive to democratize AI evaluation by letting anyone rate how well AI handles tasks like app creation or image generation. However, these raters typically come from tech circles, so their subjective preferences may not reflect the needs of general users.
Mollick highlights a significant gap in the industry: the lack of diverse benchmarks across fields like medicine and law. Without benchmarks that compare AI with human performance in these fields, it’s hard to gauge AI’s real-world impact.
Despite their limitations, quirky benchmarks like Will Smith eating spaghetti persist because they’re both amusing and accessible. Watching AI build castles in Minecraft or play Connect 4 isn’t just entertaining; it’s a simple way to engage with an otherwise complex technology. As we look to the future, one can’t help but wonder what odd new challenges will capture our imaginations in 2025.