MistralAI Releases New 25.38 GB Pixtral-12B Vision-Language Model
Sep 11, 2024, 07:01 AM
MistralAI has released Pixtral-12B, a new multimodal vision-language model. The model is distributed via a magnet link and is 25.38 GB in size. Its key architectural parameters are a model dimension of 5120, 40 layers, 32 attention heads with a head dimension of 128, 8 key-value heads, a hidden dimension of 14336, a RoPE theta of 1000000000.0, a normalization epsilon of 1e-05, and a vocabulary size of 131072. The model also uses GeLU and 2D RoPE for the vision adapter, and its tokenizer includes three new tokens. The vision encoder's hidden size is also notable.
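To put the reported numbers in context, the sketch below collects them in a small Python dataclass and derives a few quantities from them. The field names are illustrative rather than taken from the release, and the parameter estimate rests on two assumptions the story does not confirm: a gated feed-forward block with three weight matrices per layer, and separate (untied) input and output embeddings. It covers only the language decoder and ignores normalization weights and the vision encoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    # Values as reported in the story; field names are hypothetical.
    dim: int = 5120
    n_layers: int = 40
    head_dim: int = 128
    hidden_dim: int = 14336
    n_heads: int = 32
    n_kv_heads: int = 8
    rope_theta: float = 1_000_000_000.0
    norm_eps: float = 1e-05
    vocab_size: int = 131072


def queries_per_kv_head(cfg: DecoderConfig) -> int:
    # With fewer key-value heads than query heads, several query heads
    # share each key-value head (grouped-query attention).
    return cfg.n_heads // cfg.n_kv_heads


def estimate_decoder_params(cfg: DecoderConfig) -> int:
    q_width = cfg.n_heads * cfg.head_dim      # width of the query projection
    kv_width = cfg.n_kv_heads * cfg.head_dim  # width of each key/value projection
    # Query and output projections, plus key and value projections.
    attention = 2 * cfg.dim * q_width + 2 * cfg.dim * kv_width
    # ASSUMPTION: gated feed-forward block with three dim x hidden_dim matrices.
    feed_forward = 3 * cfg.dim * cfg.hidden_dim
    # ASSUMPTION: input and output embeddings are separate (untied).
    embeddings = 2 * cfg.vocab_size * cfg.dim
    return cfg.n_layers * (attention + feed_forward) + embeddings


cfg = DecoderConfig()
print("query heads per key-value head:", queries_per_kv_head(cfg))
print("estimated decoder parameters:", estimate_decoder_params(cfg))
```

Under these assumptions the estimate comes to roughly 12 billion parameters, which is in line with the "12B" in the model's name.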