Anthropic Launches Prompt Caching with 90% Cost and 80% Latency Reductions
Aug 14, 2024, 04:49 PM
Anthropic has introduced prompt caching, a new API feature currently available in beta, that significantly reduces the cost and latency of AI model responses. By storing and reusing context across requests, prompt caching can cut API input costs by up to 90% and reduce latency by up to 80%. The feature is particularly beneficial for applications built around long, static instructions, since that shared context no longer has to be reprocessed on every call. It is designed to improve the performance of large language model (LLM) applications and is expected to have a substantial impact on workloads such as Retrieval-Augmented Generation (RAG). Anthropic's pricing model charges for cache writes, and cached content has a five-minute lifetime that refreshes each time it is used. The feature supports Claude 3 Haiku, Claude 3 Opus, and Claude 3.5 Sonnet.
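For context, a prompt-caching request might look something like the sketch below, which calls the Anthropic Messages API directly with Python's requests library. The beta header value, model ID, and the placeholder manual text are assumptions drawn from Anthropic's launch documentation rather than details given in this story.

```python
# Minimal sketch of a prompt-caching request against the Anthropic Messages API.
# Assumptions: the beta header name, cache_control block format, and model ID
# follow Anthropic's launch docs; the "manual" text is an illustrative placeholder.
import os
import requests

API_URL = "https://api.anthropic.com/v1/messages"

headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    # Beta flag enabling prompt caching at launch (assumed from Anthropic's docs).
    "anthropic-beta": "prompt-caching-2024-07-31",
    "content-type": "application/json",
}

payload = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You answer questions about the attached product manual.",
        },
        {
            "type": "text",
            # A long, static instruction or document goes here. Marking it
            # "ephemeral" asks the API to cache it for about five minutes,
            # so follow-up requests reuse it at the discounted cache-read rate.
            "text": "<very long product manual, pasted verbatim>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "How do I reset the device to factory settings?"}
    ],
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
body = response.json()

# The usage block reports cache activity: the first call pays for the cache
# write, while calls inside the five-minute window are billed as cache reads,
# which is where the up-to-90% input-cost reduction comes from.
print(body["usage"])
print(body["content"][0]["text"])
```

Because the cache lifetime refreshes on every hit, an application that keeps querying the same cached document within five-minute intervals effectively keeps paying the cheaper cache-read price for the bulk of its input tokens.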