Inside the AI Race: Google’s Gemini vs. Anthropic’s Claude
In the high-stakes world of artificial intelligence, companies are constantly striving to outperform their competitors. Recently, TechCrunch uncovered intriguing internal correspondence that sheds light on how contractors working to enhance Google’s Gemini AI are comparing its responses to those from Anthropic’s rival model, Claude.
The question of whether Google obtained permission to use Claude in this way remains unanswered. When TechCrunch reached out for clarification, Google declined to say whether it had sought approval from Anthropic, which is noteworthy given Google’s significant investment in the company.
- Companies typically evaluate model performance against industry benchmarks.
- In this case, contractors score Gemini’s answers against Claude’s on criteria such as truthfulness and verbosity.
- They get up to 30 minutes per prompt to decide which model’s response is superior.
Interestingly, contractors began noticing references to Anthropic’s Claude within the internal Google platform they use for these comparisons. At least one output stated outright, “I am Claude, created by Anthropic.”
“Claude’s safety settings are the strictest,” noted a contractor, highlighting how Claude often prioritizes safety over response completeness, refusing to engage in prompts considered unsafe.
{TechCrunch Correspondence}
This cautious approach contrasts starkly with instances in which Gemini’s responses were flagged for significant safety violations. Such findings raise questions about how AI models balance safety and completeness in their responses.
Anthropic’s terms of service explicitly prohibit using Claude for training competing AI models without their consent. Despite Google’s stake in Anthropic, it remains unclear if formal permission was granted for these evaluations. Shira McNamara from Google DeepMind confirmed that while they compare model outputs for evaluation purposes, they do not train Gemini on Anthropic models.
“Any suggestion that we have used Anthropic models to train Gemini is inaccurate,” McNamara stated.
{Shira McNamara, Google DeepMind}
Furthermore, TechCrunch reported that contractors are now asked to rate Gemini’s responses in areas outside their expertise, raising concerns about potential inaccuracies in sensitive fields such as healthcare.
For more insights like this, you can subscribe to TechCrunch’s AI-focused newsletter and stay updated every Wednesday!