What will be the rank of NVLM-1.0-D-72B in vision-language tasks by end of 2024?
Top 1 • 25%
Top 2 • 25%
Top 3 • 25%
Below Top 3 • 25%
Resolution source: benchmark results published by credible sources such as Hugging Face or independent AI evaluation platforms.
Nvidia Releases NVLM-1.0-D-72B Multimodal LLM with Decoder-Only Architecture Achieving SOTA Results on Vision-Language Tasks
Oct 1, 2024, 05:58 AM
Nvidia has released NVLM-1.0-D-72B, a frontier-class multimodal large language model (LLM) with a decoder-only architecture. The model achieves state-of-the-art (SOTA) results on both vision-language and text-only tasks, and is reported to rival other leading models such as GPT-4o, Llama 3-V 405B, and InternVL 2 across a range of evaluations, including math and coding. Nvidia has made the model checkpoint and inference scripts available on Hugging Face, with training code and additional variants, NVLM-1.0-X and NVLM-1.0-H, expected to follow.