Will BitNet b1.58 reach 10 tokens/second on a single CPU by March 2025?
Yes • 50%
No • 50%
Benchmark results published by Microsoft or third-party tech reviewers
Microsoft Open-Sources BitNet Framework for 1-Bit LLMs Including BitNet b1.58
Oct 20, 2024, 01:37 AM
Microsoft has open-sourced bitnet.cpp, the official inference framework for 1-bit large language models (LLMs) such as BitNet b1.58 and BitNet Llama8B. The framework provides a suite of optimized kernels for fast, lossless inference of 1.58-bit models on CPUs, with NPU and GPU support planned for the future. By representing weights with roughly 1.58 bits (ternary values) rather than the 4 or more bits used by conventional quantization, BitNet substantially reduces the compute and memory required for inference. As a result, a 100-billion-parameter model quantized with BitNet b1.58 can run on a local device at 5-7 tokens per second using a single CPU. Exo had previously open-sourced the first ternary model implementation for Apple Silicon in March, and BitNet b1.58 can now run on Apple M2 CPUs.
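The 1.58-bit representation mentioned above comes from constraining each weight to one of three values, {-1, 0, +1} (log2(3) ≈ 1.58 bits). A minimal sketch of this idea, assuming the per-tensor absmean round-and-clip scheme described for BitNet b1.58 (the scale variable and function names here are illustrative, not bitnet.cpp's actual API):

```python
import numpy as np

def quantize_ternary(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to ternary values {-1, 0, +1}.

    Scale by the mean absolute value of the tensor (absmean), then
    round to the nearest integer and clip to the ternary range.
    """
    scale = np.abs(w).mean() + eps            # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes
    return q.astype(np.int8), float(scale)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from ternary codes and scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_ternary(w)
assert set(np.unique(q)).issubset({-1, 0, 1})
```

Because multiplying by a ternary weight reduces to an add, a subtract, or a skip, a matrix-vector product needs no floating-point multiplications, which is why such models run efficiently on plain CPUs.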
x86 • 25%
ARM • 25%
RISC-V • 25%
Other • 25%
Less than 1,000 TPS • 25%
Between 1,000 and 1,500 TPS • 25%
Between 1,500 and 2,000 TPS • 25%
More than 2,000 TPS • 25%
Remains stable • 25%
Decreases • 25%
Increases significantly • 25%
Increases slightly • 25%
Research and academia • 25%
Consumer devices • 25%
Other • 25%
Enterprise solutions • 25%