Will ContextCite outperform GPT-4o by more than 10% in citation F1 score in an independent benchmark by end of 2024?
Yes • 50%
No • 50%
Independent benchmarks and research papers comparing language models
MIT Researchers Introduce ContextCite, LongCite-8B and LongCite-9B for Enhanced Language Model Attribution
Sep 4, 2024, 09:53 PM
Researchers from MIT, including Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, and Aleksander Madry, have introduced ContextCite, a method for attributing a language model's generations to specific parts of the provided context. The method learns a surrogate that approximates how the language model's response changes when each part of the context is included or excluded, addressing fine-grained in-line citation in long-context settings, where current long-context language models struggle. To improve citation generation in long-context question answering (QA), a large-scale supervised fine-tuning (SFT) dataset is synthesized with off-the-shelf language models. The resulting models, LongCite-8B and LongCite-9B, outperform GPT-4o in citation F1 score by 6.4% and 3.6%, respectively, offer 2x finer citation granularity than proprietary models, and improve response correctness by 7-9%.
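The story does not spell out the surrogate's exact form, but the idea it describes can be sketched as follows: randomly ablate parts of the context, re-score the model's fixed response under each ablated context, and fit a sparse linear surrogate whose weights serve as attribution scores. The sketch below is a minimal, hypothetical illustration of that loop under those assumptions; `attribute_context`, `score_fn`, and all parameters are illustrative names, not ContextCite's actual API.

```python
# A minimal, hypothetical sketch of the ablate-and-refit idea described
# above; names here are illustrative, not ContextCite's actual API.
import numpy as np
from sklearn.linear_model import Lasso

def attribute_context(score_fn, num_parts, num_ablations=64, alpha=0.01, seed=0):
    """Estimate how much each context part contributes to a fixed response.

    score_fn(mask) -> float: the log-probability the language model assigns
    to its original response when context parts with mask[i] == 0 are
    removed (assumed to be supplied by the caller).
    """
    rng = np.random.default_rng(seed)
    # Random keep/drop patterns over the context parts.
    masks = rng.integers(0, 2, size=(num_ablations, num_parts))
    scores = np.array([score_fn(mask) for mask in masks])
    # Sparse linear surrogate: coefficient i approximates how much
    # including part i raises the response's score, i.e., its attribution.
    surrogate = Lasso(alpha=alpha).fit(masks, scores)
    return surrogate.coef_

if __name__ == "__main__":
    # Toy stand-in for a real scoring function: part 2 of a five-part
    # context drives the response, part 0 helps slightly.
    toy_score = lambda mask: 3.0 * mask[2] + 0.1 * mask[0]
    print(attribute_context(toy_score, num_parts=5))  # peak weight at index 2
```

On the toy scorer, the surrogate's largest coefficient lands on part 2, the part the response actually depends on; with a real model, `score_fn` would rescore the generated answer under each ablated context.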