The AI industry splits into two major categories: hardware and software. Nvidia dominates the hardware segment with its data center GPUs, such as the well-known H100 and the newer B200.
The software market is fiercely competitive, with large language models (LLMs) taking center stage. Notable companies in this space include OpenAI with GPT-4o, Anthropic with Claude 3.5, Google with Gemini, and Meta with Llama, among others.
A new competitor has emerged for all of them: Nvidia. The company doesn’t seem to be satisfied with dominating the hardware segment and now wants to compete with its own LLM, which it has named NVLM 1.0. The team behind it recently published a study detailing its development.
NVLM 1.0 is actually a series of multimodal LLMs that, according to the company, offer particularly outstanding results in the fields of vision and language, rivaling other models such as GPT-4o.
NVLM 1.0 features a 72-billion-parameter model (NVLM-D-72B), which is currently the most capable and ambitious in the series. According to Nvidia, it outperforms Llama 3 405B, a much larger model, on several benchmarks.
Additionally, NVLM 1.0 is an open-weight model, and its creators have promised to publish the code used to train it. This will be especially useful for developers who want to build on it in their own projects and forks.
NVLM-D-72B (a simpler name would be welcome, thanks) shows promise in analyzing visual and text input. Specifically, it can interpret memes, analyze images, and solve mathematical problems step by step.
According to Nvidia, the model can do all this because it utilizes versatile multimodal capabilities, including “OCR, reasoning, localization, common sense, world knowledge, and coding ability.”
The latest arrival in the AI software segment is particularly interesting because of its origins. While it remains to be seen how Nvidia’s model will develop, the decision to offer it openly makes it a direct competitor to Llama and an intriguing alternative for developers.
Image | BoliviaInteligent