MistralAI Releases New 25.38 GB Pixtral-12B Vision-Language Model
Sep 11, 2024, 07:01 AM
MistralAI has released Pixtral-12B, a new vision-language multimodal model distributed via a magnet link and weighing in at 25.38 GB. Its key architectural parameters are a model dimension of 5120, 40 layers, a head dimension of 128, a hidden dimension of 14336, 32 attention heads, 8 key-value heads, a RoPE theta of 1000000000.0, a normalization epsilon of 1e-05, and a vocabulary size of 131072. The vision adapter uses GeLU activation and 2D RoPE, the tokenizer adds three new tokens, and the vision encoder has a hidden size of 1024.
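For readers who want the reported numbers in one place, here is a minimal Python sketch that collects the announced hyperparameters into a dictionary. The field names loosely follow Mistral-style params.json conventions and are assumptions for illustration, not the official release format.

```python
import json

# Pixtral-12B hyperparameters as reported in the release announcement.
# Key names follow common Mistral-style params.json conventions; they are
# illustrative assumptions, not the official file layout.
pixtral_12b_params = {
    "dim": 5120,                    # model (embedding) dimension
    "n_layers": 40,                 # transformer layers
    "head_dim": 128,                # per-head dimension
    "hidden_dim": 14336,            # feed-forward hidden dimension
    "n_heads": 32,                  # attention heads
    "n_kv_heads": 8,                # key-value heads (grouped-query attention)
    "rope_theta": 1_000_000_000.0,  # RoPE base frequency
    "norm_eps": 1e-05,              # normalization epsilon
    "vocab_size": 131072,           # tokenizer vocabulary size
    # Vision adapter/encoder details from the announcement; hidden size
    # assumed from the shared configuration.
    "vision_encoder": {
        "hidden_size": 1024,
        "adapter_activation": "gelu",
        "rope_2d": True,
    },
}

if __name__ == "__main__":
    # Print the collected configuration for inspection.
    print(json.dumps(pixtral_12b_params, indent=2))
```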