Groq Inc. Achieves 40,792 Tokens/s on Llama3 70B Model in AI Breakthrough
Jun 6, 2024, 03:57 PM
Groq Inc. has made significant advances in AI language-model inference, particularly with the Llama3 models. The company reports an input rate of 40,792 tokens per second on the Llama3 70B model, using FP16 multiply with FP32 accumulate operations; this follows its previous milestone of 30,000 tokens per second on the Llama3 8B model. The gains are attributed to an approach that eliminates MatMul operations in favor of addition and negation, a method that has maintained strong performance at billion-parameter scales while reducing memory usage by up to 61%. Groq's technology also demonstrates fast, high-precision inference, processing roughly 8,000 tokens in 0.2 seconds with lossless precision, and has sustained over 1,200 tokens per second on Llama3 8B at 13 W, moving LLMs closer to brain-like efficiency.
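The idea of replacing MatMul with addition and negation can be illustrated with a minimal sketch: if each weight is constrained to {-1, 0, +1} (a ternary scheme), every output element becomes a running sum of added, subtracted, or skipped inputs, so no multiplications are needed. The function below is a hypothetical illustration of that principle, not Groq's actual implementation; the higher-precision accumulator mirrors the FP32-accumulate pattern mentioned above.

```python
def ternary_matvec(weights, x):
    """Matrix-vector product without multiplications.

    weights: list of rows, each entry in {-1, 0, 1}.
    x: input vector of floats.
    Each multiply-by-weight is replaced by an add, a negate-and-add,
    or a skip, so the whole product is multiplication-free.
    """
    out = []
    for row in weights:
        acc = 0.0  # accumulate in full precision (analogous to FP32 accumulate)
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # +1 weight: plain addition
            elif w == -1:
                acc -= v      # -1 weight: negation, then addition
            # w == 0: contribution is zero, skip entirely
        out.append(acc)
    return out

W = [[1, -1, 0],
     [0, 1, 1]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-1.0, 7.0]
```

The memory savings reported for this family of methods come from the same constraint: a ternary weight needs about 1.6 bits of information rather than 16, which is consistent with reductions on the order of tens of percent once activations and other state are included.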