Nvidia Releases NVLM-1.0-D-72B Multimodal LLM with Decoder-Only Architecture Achieving SOTA Results on Vision-Language Tasks
Oct 1, 2024, 05:58 AM
Nvidia has released NVLM-1.0-D-72B, a frontier-class multimodal large language model (LLM) with a decoder-only architecture. The model achieves state-of-the-art (SOTA) results on both vision-language and text-only tasks. It is reported to rival other advanced models such as GPT-4o, Llama 3-V 405B, and InternVL 2 in various evaluations, including math and coding. Nvidia has also made the checkpoint and inference scripts available on Hugging Face, with training code and additional versions like NVLM-1.0-X and NVLM-1.0-H expected to follow.
Markets

- Yes • 50% | No • 50%
  Resolution source: Benchmark results published by credible sources such as Hugging Face or independent AI evaluation platforms
- Yes • 50% | No • 50%
  Resolution source: Evaluation results published by credible sources or Nvidia's official announcements
- Yes • 50% | No • 50%
  Resolution source: Official announcements from Nvidia or the commercial entity
- Below 20% • 25% | 20-30% • 25% | 30-40% • 25% | Above 40% • 25%
  Resolution source: Market analysis reports from credible sources such as Gartner or IDC
- Top 1 • 25% | Top 2 • 25% | Top 3 • 25% | Below Top 3 • 25%
  Resolution source: Evaluation results published by credible sources or Nvidia's official announcements
- Top 1 • 25% | Top 2 • 25% | Top 3 • 25% | Below Top 3 • 25%
  Resolution source: Benchmark results published by credible sources such as Hugging Face or independent AI evaluation platforms