Which AI model will have the highest usage on SambaNova Cloud by mid-2024?
Llama 3.1 8B • 25%
Llama 3.1 70B • 25%
Llama 3.1 405B • 25%
Other • 25%
SambaNova usage statistics
SambaNova Launches Fastest AI Platform with Record 132 Tokens/Sec for Llama 3.1 405B
Sep 10, 2024, 02:48 PM
SambaNova has launched its new cloud inference platform, SambaNova Cloud, giving developers access to the Llama 3.1 models (8B, 70B, and 405B) running on its custom AI chips. The platform sets a new record for inference speed, achieving 132 tokens per second for Llama 3.1 405B at full precision and 570 tokens per second for Llama 3.1 70B, roughly 10 times faster than traditional GPUs. The API is free to use with no waitlist, enabling developers to build advanced AI applications. Separately, Llama 3.1 405B reaches 100 tokens per second on the TogetherCompute API, with a 128k long-context version coming soon.
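To put the quoted throughput figures in perspective, here is a minimal back-of-envelope sketch (in Python) of how long each service would take to generate a fixed number of output tokens. The rates come straight from the article; the calculation assumes a steady decode rate and ignores prompt processing and network latency.

```python
# Throughput figures quoted in the article (tokens per second).
RATES = {
    "Llama 3.1 405B (SambaNova Cloud)": 132.0,
    "Llama 3.1 70B (SambaNova Cloud)": 570.0,
    "Llama 3.1 405B (TogetherCompute)": 100.0,
}

def generation_time(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to emit num_tokens at a steady decode rate
    (ignores prompt processing and network latency)."""
    return num_tokens / tokens_per_sec

for name, rate in RATES.items():
    print(f"{name}: {generation_time(1000, rate):.1f} s per 1,000 tokens")
```

At these rates, a 1,000-token response takes about 7.6 s on the 405B model and under 2 s on the 70B model, which is what makes interactive use of a 405B-parameter model notable.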