The Secret to DeepSeek’s Extreme Efficiency Is Out: It Bypassed Nvidia’s CUDA Standard

  • DeepSeek engineers used PTX to maximize the H800 GPUs’ performance.

  • One strategy was to use only 20 SMs of each GPU for inter-server communication.


Juan Carlos López

Senior Writer


The release of DeepSeek's V3 model as open source has been a blessing: the strategy its engineers devised to develop such an efficient AI model is gradually coming to light. Before continuing, it's essential to remember that DeepSeek claims to have trained the model using only 2,048 Nvidia H800 GPUs.

Some analysts say its infrastructure actually consists of 50,000 H100 GPUs purchased through intermediaries, though this remains conjecture. The H100 is more powerful than the H800, but it's entirely plausible that DeepSeek had to settle for the latter because U.S. government sanctions prevent Chinese companies from accessing the H100. Since November 2023, Nvidia has also been barred from shipping the H800 to Chinese customers.

One of the Keys to DeepSeek’s Success: PTX

Nvidia's GPUs aren't the only factor behind the company's rapid growth over the past five years. Its Compute Unified Device Architecture (CUDA) has played a crucial role. Most AI projects today rely on CUDA, which unifies the compilers and development tools programmers use to write software for Nvidia GPUs, so replacing it in an ongoing project is challenging.

Huawei, seeking a significant share of China's AI market, has developed its own computing architecture for neural networks as an alternative to CUDA. For now, though, CUDA dominates. Nvidia's platform provides a high-level language that gives programmers approachable access to GPU hardware. However, DeepSeek engineers bypassed CUDA and instead used Parallel Thread Execution (PTX).
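To make the contrast concrete, here's a minimal sketch of the high-level CUDA style, in which the programmer writes C++-like code and Nvidia's nvcc compiler handles the mapping onto the GPU. The kernel is purely illustrative and isn't taken from any real AI codebase:

```cuda
#include <cstdio>

// High-level CUDA C++: each thread scales one array element. The compiler,
// not the programmer, decides how this maps to machine instructions.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    // Launch enough 256-thread blocks to cover all n elements.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```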

DeepSeek engineers used PTX to maximize the performance of the H800 GPUs in their possession.

PTX, similar to assembly language, is the low-level language Nvidia recommends for developers who need to implement optimizations directly on its GPUs. Programming in PTX is more complex and time-consuming than using CUDA, but it allows developers to write more efficient code that makes better use of GPU resources.
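By way of contrast with the CUDA sketch above, here's a minimal sketch of what PTX-level work can look like, assuming the common pattern of embedding hand-written PTX in a CUDA C++ kernel through inline assembly. The instruction shown is illustrative; it isn't DeepSeek's code:

```cuda
#include <cstdio>

// The asm statement embeds a hand-written PTX instruction. In plain CUDA C++
// this would simply be `y = x + 1.0f`; writing the PTX directly gives the
// programmer instruction-level control over what the GPU executes.
__global__ void add_one(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = data[i];
    float y;
    // add.f32: 32-bit float add. 0F3F800000 is the PTX hex literal for 1.0f.
    asm volatile("add.f32 %0, %1, 0F3F800000;" : "=f"(y) : "f"(x));
    data[i] = y;
}

int main() {
    const int n = 256;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    add_one<<<1, n>>>(d);

    float h[n];
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);  // expected: 1.000000
    cudaFree(d);
    return 0;
}
```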

Presumably, DeepSeek engineers used PTX to maximize the H800 GPUs’ performance. One of their stratagems was using only 20 streaming multiprocessors (SMs) per GPU for server-to-server communication, leaving the remaining 112 SMs on each chip for computation. Essentially, Chinese engineers built DeepSeek from the ground up with such optimizations, largely explaining the AI model’s efficiency.
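How DeepSeek pins work to specific SMs isn't public, but one common way to approximate such a split is a persistent kernel that launches one block per SM and dedicates a fixed subset of blocks to communication. The sketch below is a conceptual illustration under that assumption, with hypothetical placeholder functions standing in for the real communication and compute loops:

```cuda
// Conceptual sketch only; not DeepSeek's actual code.
constexpr int kCommBlocks  = 20;  // reported: 20 SMs for inter-server traffic
constexpr int kTotalBlocks = 132; // the H800 exposes 132 SMs in total

__device__ void run_communication(int slot) {
    // Hypothetical placeholder: a real system would drive RDMA/NVLink
    // transfers here, e.g. polling queues of outgoing activation chunks.
    (void)slot;
}

__device__ void run_compute(int slot) {
    // Hypothetical placeholder: matrix-multiply or attention work.
    (void)slot;
}

__global__ void persistent_kernel() {
    if (blockIdx.x < kCommBlocks) {
        run_communication(blockIdx.x);          // 20 blocks ≈ 20 SMs
    } else {
        run_compute(blockIdx.x - kCommBlocks);  // remaining 112 blocks/SMs
    }
}

int main() {
    // With exactly one resident block per SM, the block-to-SM mapping stays
    // stable for the kernel's lifetime, which is what makes the reservation
    // behave like dedicating whole SMs to communication.
    persistent_kernel<<<kTotalBlocks, 256>>>();
    cudaDeviceSynchronize();
    return 0;
}
```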

DeepSeek’s programmers have achieved an engineering feat likely to influence how AI model developers approach their projects. It’s tangible proof that China has successfully adapted to the GPU shortage caused by U.S. sanctions.

Image | Nvidia

Related | Downloading and Installing DeepSeek on Your Computer: How to Use It Locally on Windows, macOS, and Linux
