Which model will have the highest citation F1 score in a major independent benchmark by end of 2024?
ContextCite • 25%
LongCite-8B • 25%
LongCite-9B • 25%
GPT-4o • 25%
Results from major independent benchmarks and research papers
MIT Researchers Introduce ContextCite, LongCite-8B and LongCite-9B for Enhanced Language Model Attribution
Sep 4, 2024, 09:53 PM
Researchers from MIT, including B. Cohen-Wang, H. Shah, K. Georgiev, and A. Madry, have introduced ContextCite, a method for attributing a language model's generation to specific parts of the provided context. It learns a surrogate model that approximates how the response changes when each part of the context is included or excluded. A related effort, LongCite, addresses fine-grained in-line citations in long-context scenarios, which current long-context language models struggle with. It uses off-the-shelf language models to synthesize a large-scale supervised fine-tuning (SFT) dataset for citation generation in long-context question answering (QA). According to the reported results, the resulting models, LongCite-8B and LongCite-9B, outperform GPT-4o by 6.4% and 3.6% in citation F1 score, respectively, and offer 2x finer citation granularity than proprietary models. A 7-9% improvement in response correctness is also reported.
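The surrogate idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_response_score` is a hypothetical stand-in for a real language model score (for example, the log-probability of the original response given a partially ablated context), and the fit uses ordinary least squares for brevity, whereas a sparse regression would be a natural choice when there are many context sources.

```python
import numpy as np


def toy_response_score(mask: np.ndarray) -> float:
    """Hypothetical stand-in for a language model call.

    mask[i] == 1 means context source i is kept, 0 means it is ablated.
    A real system would score the original response under the ablated
    context; here each source is assumed to have a fixed additive effect.
    """
    true_effects = np.array([0.0, 2.0, 0.0, 0.5])  # assumed, for illustration
    return float(mask @ true_effects) - 1.0


def fit_linear_surrogate(score_fn, num_sources, num_ablations=64, seed=0):
    """Fit score ~ weights . mask + bias from randomly ablated contexts."""
    rng = np.random.default_rng(seed)
    # Each row is a random include/exclude pattern over the context sources.
    masks = rng.integers(0, 2, size=(num_ablations, num_sources)).astype(float)
    scores = np.array([score_fn(m) for m in masks])
    # Append a column of ones so the last coefficient acts as the bias.
    design = np.hstack([masks, np.ones((num_ablations, 1))])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef[:-1], coef[-1]


weights, bias = fit_linear_surrogate(toy_response_score, num_sources=4)
# Sources with the largest weights are the ones the response depends on most.
ranking = np.argsort(-weights)
print("attribution weights:", np.round(weights, 3))
print("most influential source index:", int(ranking[0]))
```

Each learned weight estimates how much including that source shifts the score, so sorting by weight gives a candidate list of sources to cite. With a real model, every ablation costs one scoring pass, which is why the number of ablations is kept small relative to the number of possible subsets.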