DeepSeek

DeepSeek: A Game-Changer in AI

Imagine a world where artificial intelligence (AI) is not just a tool but a revolution. Enter DeepSeek, a Chinese company that has been making waves with its open-source large language models. Founded by Liang Wenfeng, the co-founder of hedge fund High-Flyer, DeepSeek aims to democratize AI and challenge the dominance of American tech giants.

The Birth of DeepSeek

DeepSeek was founded in 2023 with a mission to provide high-quality AI at a fraction of the usual cost. Its flagship model, DeepSeek-R1, has been described as ‘upending AI’ and ushering in a new era of competition. After releasing its first free chatbot app in January 2025, DeepSeek quickly surpassed ChatGPT as the most-downloaded free app on the iOS App Store.

Open-Source vs. Censorship

While DeepSeek makes its generative AI algorithms open-source, some applications are restricted due to Chinese government censorship policies. This raises questions about the balance between innovation and control. Control of the company itself is concentrated: as of May 2024, Liang Wenfeng held an 84% stake in DeepSeek through two shell corporations.

DeepSeek’s V2 Model: A Price War Starter

In May 2024, DeepSeek released its V2 model, offering strong performance at a low price. This move started the AI model price war in China and demonstrated DeepSeek’s commitment to affordability without compromising on quality. Despite lower prices, the company became profitable, outperforming rivals that were losing money.

Research Focus: No Commercialization Plans

DeepSeek focuses on research and has not detailed any plans for commercialization, which allows it to avoid the most stringent provisions of China’s AI regulations. When hiring, the company values technical ability over work experience, recruiting recent graduates and developers with less established careers. This approach not only keeps costs down but also brings fresh perspectives to the table.

The Training Framework

DeepSeek trains on two computing clusters built by High-Flyer: Fire-Flyer (200 million yuan) and Fire-Flyer 2 (1 billion yuan). These clusters use a co-designed software and hardware architecture, including the 3FS file system, the hfreduce library, and the hfai.nn software library. In 2022, Fire-Flyer 2 ran at over 96% capacity utilization, totaling 56.74 million GPU hours.

Development and Release History

The DeepSeek Coder series was released on November 2, 2023, in both base and instruction-finetuned versions. The DeepSeek LLM series followed in late November 2023, offering 7B and 67B parameter models in Base and Chat forms. Both sizes have a vocabulary of 102,400 tokens (byte-level BPE) and a context length of 4,096 tokens.
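To make "byte-level BPE" concrete, here is a minimal toy sketch in Python. It is not DeepSeek's tokenizer; the function names (train_byte_bpe, merge_pair, decode) and the tiny vocabulary size are hypothetical and chosen only for illustration. The idea it shows is general: the base vocabulary is the 256 possible byte values, and each training step adds one new token for the most frequent adjacent pair until the target vocabulary size is reached.

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text, vocab_size):
    """Toy byte-level BPE trainer (illustrative only, not DeepSeek's code)."""
    ids = list(text.encode("utf-8"))  # base tokens are the raw bytes 0..255
    merges = {}                       # (left_id, right_id) -> new token id
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break                     # no pair repeats, nothing worth merging
        merges[pair] = next_id
        ids = merge_pair(ids, pair, next_id)
        next_id += 1
    return merges, ids


def decode(ids, merges):
    """Expand token ids back to text by replaying the merges in order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts keep insertion order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    sample = "low lower lowest 低 低"
    merges, ids = train_byte_bpe(sample, vocab_size=300)
    print(len(sample.encode("utf-8")), "bytes ->", len(ids), "tokens")
    print(decode(ids, merges) == sample)
```

Because every string is first reduced to bytes, text in any language can be encoded without an "unknown token". Under the same scheme, and assuming no special tokens are reserved, a 102,400-token vocabulary would correspond to at most 102,400 - 256 = 102,144 learned merges.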

DeepSeek-V2: A New Era

In May 2024, DeepSeek released the V2 series, including the smaller V2-Lite models with 16B parameters. The models were pretrained on a dataset of 8.1T tokens, and their context length was then extended from 4K to 128K using YaRN. The V2-Lite models were trained similarly, with supervised finetuning followed by direct preference optimization (DPO).
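The 4K to 128K extension corresponds to a scale factor of 128K / 4K = 32. As a rough sketch of how YaRN applies such a factor to rotary position embeddings, the Python below follows the "NTK-by-parts" rule from the YaRN paper: high-frequency dimensions are left unchanged, low-frequency dimensions are divided by the scale, and dimensions in between are blended. The head dimension, RoPE base, and the alpha and beta thresholds are assumptions taken from the paper's LLaMA defaults, not DeepSeek's published settings, and the function names are hypothetical.

```python
import math


def yarn_frequencies(head_dim=128, base=10000.0, orig_ctx=4096,
                     scale=32.0, alpha=1.0, beta=32.0):
    """Per-dimension RoPE frequencies after YaRN-style scaling.

    All default values are illustrative assumptions, not DeepSeek's settings.
    """
    freqs = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)        # original RoPE frequency
        wavelength = 2 * math.pi / theta
        rotations = orig_ctx / wavelength      # full turns inside the old context
        # Ramp: 0 -> interpolate fully, 1 -> leave the frequency untouched.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


def yarn_attention_scale(scale):
    """Factor the YaRN paper suggests applying to the rotary embeddings."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    scale = 131072 / 4096                      # 128K / 4K
    freqs = yarn_frequencies(scale=scale)
    print("scale factor:", scale)
    print("fastest dimension:", freqs[0])
    print("slowest dimension:", freqs[-1])
    print("attention scale:", round(yarn_attention_scale(scale), 4))
```

The design choice worth noting is that only the slow-rotating dimensions are stretched. Those are the ones that would otherwise see position values outside the range covered during pretraining, while the fast-rotating dimensions that encode local order are preserved.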

DeepSeek-V3: Further Advancements

In December 2024, DeepSeek released the V3 series with a base model and a chat model. The architecture remained similar to V2 but added multi-token prediction. Pretraining was done on 14.8T tokens of a multilingual corpus, mostly English and Chinese. The context length was then extended in two stages using YaRN, first from 4K to 32K and then to 128K.
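Multi-token prediction means that each position is trained to predict several upcoming tokens rather than only the next one. The short Python sketch below shows only how such training targets can be laid out. It is a generic illustration under that assumption, not a description of how V3's prediction modules are built, and the function name multi_token_targets and the depth parameter are hypothetical.

```python
def multi_token_targets(tokens, depth):
    """Pair each position's token with the next `depth` tokens as targets.

    Generic illustration of a multi-token objective; positions too close to
    the end of the sequence to have `depth` future tokens are dropped.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    last = len(tokens) - depth
    return [(tokens[t], tokens[t + 1:t + 1 + depth]) for t in range(last)]


if __name__ == "__main__":
    sequence = [10, 11, 12, 13, 14]
    for current, future in multi_token_targets(sequence, depth=2):
        print(current, "->", future)
```

With depth set to 1 this reduces to ordinary next-token prediction, which is one way to see the technique as an extension of the standard objective rather than a replacement for it.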

Performance and Impact

DeepSeek claimed that DeepSeek-R1-Lite-Preview exceeded the performance of OpenAI o1 on benchmarks such as AIME and MATH. However, The Wall Street Journal reported that the o1 model reached solutions faster than DeepSeek-R1-Lite-Preview. This highlights the ongoing competition in AI development.

Challenges and Concerns

DeepSeek’s success has not been without challenges. The company limited new user registration to mainland China after a cyberattack disrupted its servers, and the censorship mechanisms integrated into the R1 model can only be partially removed in its open-source version. These issues raise concerns about data privacy and potential misuse of AI technology.

Conclusion

DeepSeek’s journey from a small startup to a major player in the AI market is nothing short of remarkable. By focusing on research, affordability, and innovation, DeepSeek has challenged the global dominance of American AI models. As we continue to navigate the complex landscape of AI development, companies like DeepSeek remind us that there are multiple paths to progress.
