Unisami AI News

DeepSeek’s new AI model appears to be one of the best ‘open’ challengers yet

December 26, 2024 | by AI


DeepSeek V3: A New Milestone in Open AI Development

In a notable development from China, AI firm DeepSeek has released DeepSeek V3, one of the most capable open AI models to date. The model is available under a permissive license, so developers are free to modify it and use it for a wide range of applications, including commercial ones. It handles a variety of text-based tasks, such as coding, translation, and writing essays and emails from a prompt.

  • Outperforms both open and closed AI models, according to DeepSeek’s internal benchmarks.
  • Outscores major models such as Meta’s Llama 3.1 and OpenAI’s GPT-4o in Codeforces programming competitions.
  • Leads on the Aider Polyglot test, which measures whether a model can write new code that integrates with existing code.

“DeepSeek V3 was trained on a massive dataset of 14.8 trillion tokens, making it a giant in the field with 685 billion parameters.”

{DeepSeek Data Sheet}

Parameter count is a rough proxy for capability: models with more parameters typically, though not always, perform better. Larger models also need more powerful hardware to run. DeepSeek V3 is no exception; running it at a reasonable speed requires a bank of high-end GPUs.
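To make the hardware point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at the 685 billion parameter count quoted above. The storage precisions and the 80 GB per-GPU figure are assumptions for illustration, not details from DeepSeek, and the estimate ignores activations, caches, and other runtime overhead.

```python
import math

# Parameter count quoted in the article.
PARAMS = 685_000_000_000

# Assumed storage precisions in bytes per parameter (not stated in the article).
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8/int8": 1}

# Assumed memory per accelerator in GB (hypothetical value for illustration).
GPU_MEMORY_GB = 80


def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def min_gpus(n_params, bytes_per_param, gpu_memory_gb=GPU_MEMORY_GB):
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(n_params, bytes_per_param) / gpu_memory_gb)


for name, size in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, size)
    print(f"{name}: {gb:,.0f} GB of weights, at least {min_gpus(PARAMS, size)} GPUs")
```

Even under the most compact of these assumed precisions, the weights would not fit on a single accelerator of that size, which is why a model of this scale needs a multi-GPU setup.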

Despite the model’s size, DeepSeek says it trained V3 on Nvidia H800 GPUs in about two months, a notable feat given recent US restrictions on Chinese companies acquiring such hardware. The company also claims the training cost just $5.5 million, a fraction of what larger developers spend on comparable models. There is a catch, however: the model’s responses to politically sensitive topics are filtered to align with China’s regulatory standards.

DeepSeek is backed by High-Flyer Capital Management, a Chinese hedge fund that uses AI to inform its trading strategies. Competition from DeepSeek has prompted tech giants such as ByteDance and Alibaba to cut prices or even make their models free. High-Flyer invests heavily in infrastructure, including server clusters equipped with thousands of Nvidia GPUs, to support this work.

“Open sourcing is a cultural act,” says Liang Wenfeng, founder of High-Flyer Capital Management. “Closed-source approaches won’t hold back progress forever.”

{Interview with TechCrunch}

The unveiling of DeepSeek-R1 highlights DeepSeek’s ambition to push further into AI reasoning. As the race for AI supremacy heats up, DeepSeek’s efforts are compelling competitors to rethink their strategies and pricing models.


Image Credit: Dave Tombi on Pexels
