Sanctions Played a Crucial Role in DeepSeek. The Company Had to Be Resourceful to Break Out of AI’s ‘Bigger Is Better’ Approach

  • According to DeepSeek, the infrastructure for training its AI model includes 2,048 Nvidia chips.

  • According to the company, training the model, which has 671 billion parameters, cost $5.6 million.


Juan Carlos López

Senior Writer

An engineer by training. A science and tech journalist by passion, vocation, and conviction. I've been writing professionally for over two decades, and I suspect I still have a long way to go. At Xataka, I write about many topics, but I mainly enjoy covering nuclear fusion, quantum physics, quantum computers, microprocessors and TVs.

Microsoft CEO Satya Nadella recently commented on DeepSeek’s new AI model. “To see the DeepSeek new model, it’s super impressive in terms of both how they have really effectively done an open source model that does this inference-time compute, and is super-compute efficient. We should take the developments out of China very, very seriously… As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of,” he shared in a LinkedIn post on Monday.

In his statement, Nadella acknowledges the Chinese company’s significant technological achievement. It’s commendable that he does so unambiguously, especially considering that Microsoft is a competitor in the AI industry. Additionally, the American company recently experienced a sharp decline in stock market value following DeepSeek R1’s breakthrough. One thing is clear: DeepSeek’s AI model is, to a large extent, a response to the pressure exerted by U.S. sanctions on Chinese companies.

Nvidia CEO Jensen Huang talked about this possibility at Computex in late May 2023. He said, “The amount of resources that has been dedicated to this area in China… is quite massive, so you can’t underestimate them.” His remark was a warning to the U.S. government about the potential consequences of sanctions that aim to inhibit China’s technological progress. While Huang was specifically referring to Chinese GPU designers, his statement also applies to Chinese companies developing AI models. After all, GPUs and large language models are closely intertwined in this field.

The U.S. Will Continue to Lead in AI Technology

A significant portion of the sanctions imposed by the Biden administration since Oct. 7, 2022, aims to hinder the development of China’s semiconductor industry and its AI technology. Integrated circuits and AI are closely linked, which is why these bans prevent companies such as Nvidia, AMD, and Intel from selling their most advanced GPUs to Chinese customers. This restriction likely underlies DeepSeek’s major achievements.

According to DeepSeek, the infrastructure used to train the DeepSeek R1 model comprises 2,048 Nvidia chips, and training cost around $5.6 million. This aligns with Nadella’s recent remarks about the model’s compute efficiency. However, some analysts argue that DeepSeek’s infrastructure may actually include as many as 50,000 GPUs obtained through intermediaries, a claim that remains speculative at this stage.

According to what DeepSeek told the Financial Times, its choice of Nvidia GPUs for its training infrastructure was shaped by U.S. sanctions that restricted access to more powerful chips. Since Nov. 16, 2023, bans have prevented Nvidia from supplying certain GPUs to Chinese customers, but DeepSeek is presumed to have established its infrastructure before then. What stands out is that the Chinese company has achieved remarkable results with relatively modest AI chip resources.

DeepSeek’s success represents a significant achievement for China, but it’s only a partial victory. For now, the U.S. is ahead in the AI race. Its advantage stems from an undeniable fact: the U.S. controls the majority of GPU manufacturers and many of the companies developing AI models. These companies also have unrestricted access to the most advanced GPUs produced by Nvidia and others.

For its part, China has Huawei GPUs, which seem to be quite competitive in inference processing. The Asian nation also has access to GPUs from companies including Moore Threads, MetaX, Biren Technology, Innosilicon, Zhaoxin, Iluvatar CoreX, DenglinAI, Vast AI Tech, and others. At present, however, China is at a clear disadvantage. Even so, the confrontation between the two countries is likely to be a long one, so it’s premature to conclude which will ultimately prevail in the AI arena.

Image | Vishnu Mohanan
