DeepNewz Markets

Market

Which LLM will be in the top 3 for Math domain on SEAL Leaderboards as of November 30, 2024?

Scale AI•SEAL Leaderboards•Spanish

Resolution / Starting Odds

Model X • 33%

Model Y • 33%

Model Z • 33%

SEAL Leaderboards on Scale AI's official website

Story

Scale AI Launches SEAL Leaderboards for LLMs in Coding, Math, Instruction, and Spanish

May 29, 2024, 05:42 PM

Scale AI has launched the SEAL Leaderboards, a new evaluation platform for large language models (LLMs). The leaderboards use private datasets and expert evaluations to rank LLMs in several domains, including coding, math, instruction following, and Spanish. The initiative aims to address common problems in model evaluation, such as data contamination and rater quality. The leaderboards are continuously updated with new data and models, and Scale AI is inviting model developers to participate. The platform also extends GSM1K to additional domains and reports Elo-scale rankings computed with the Bradley-Terry method. The launch has been well received, with industry experts highlighting its potential to provide more trustworthy and accurate assessments of LLMs.
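
To make the "Elo-scale rankings via the Bradley-Terry method" idea concrete, here is a minimal sketch of how pairwise preference counts can be turned into Elo-style ratings. It is a generic illustration, not Scale AI's implementation: the function names, the pairwise win counts, and the 1000-point anchor are all assumptions made for the example, and the fit uses a standard iterative update for the Bradley-Terry likelihood.

```python
import math


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is how many times model i was preferred over model j.
    Assumes every model has at least one win and one comparison;
    otherwise the estimate degenerates (zero strength or zero denominator).
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Strengths are only defined up to a constant factor, so rescale.
        norm = sum(updated)
        strengths = [s * n / norm for s in updated]
    return strengths


def to_elo_scale(strengths, anchor=1000.0):
    """Map strengths to an Elo-like scale (400 points per factor of 10).

    The anchor is a hypothetical choice: the average rating equals it.
    """
    logs = [math.log10(s) for s in strengths]
    mean_log = sum(logs) / len(logs)
    return [anchor + 400.0 * (value - mean_log) for value in logs]


# Hypothetical pairwise results for three placeholder models.
names = ["Model X", "Model Y", "Model Z"]
wins = [
    [0, 7, 9],  # Model X preferred over Y 7 times, over Z 9 times
    [3, 0, 6],  # Model Y preferred over X 3 times, over Z 6 times
    [1, 4, 0],  # Model Z preferred over X once, over Y 4 times
]

ratings = to_elo_scale(fit_bradley_terry(wins))
for name, rating in sorted(zip(names, ratings), key=lambda pair: -pair[1]):
    print(f"{name}: {rating:.0f}")
```

Under this model, the probability that model i is preferred over model j is s_i / (s_i + s_j), so a rating gap of 400 points corresponds to 10:1 odds.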

Similar markets

Which LLM will be ranked highest on Scale AI SEAL Leaderboards by end of 2024?

GPT-4 • 25%

PaLM 2 • 25%

Claude • 25%

LLaMA • 25%

Will Scale AI SEAL Leaderboards include more than 50 LLMs by September 30, 2024?

Yes • 50%

No • 50%

Top multilingual model on Open LLM Leaderboard by end of 2024?

Qwen2 • 25%

Llama 3 • 25%

GLM 4 • 25%

Other • 25%

Which major tech company will be the first to submit a model to SEAL Leaderboards by end of 2024?

Google • 25%

OpenAI • 25%

Meta • 25%

Microsoft • 25%

How many LLMs will be evaluated by SEAL Leaderboards by end of 2024?

Less than 20 • 33%

20-40 • 33%

More than 40 • 33%

Most Improved LLM on GSM1k by End of 2024

Phi • 20%

Mistral • 20%

GPT-4 • 20%

Claude • 20%

Gemini • 20%

Qwen2 72B model remains top on Open LLM Leaderboard by end of 2024?

Yes • 50%

No • 50%

Which LLM will be the top performer on the GSM8K benchmark by the end of 2024?

Llama8B • 25%

GPT-5 • 25%

Claude • 25%

Gemini • 25%

Best Performing LLM on GSM1k by End of 2024

Phi • 20%

Mistral • 20%

GPT-4 • 20%

Claude • 20%

Gemini • 20%

Which open-source LLM will be the most adopted by top tech companies by the end of 2024?

Llama8B • 25%

Qwen 2 • 25%

Nemotron • 25%

Other • 25%

Will Scale AI announce a major partnership related to SEAL Leaderboards by end of 2024?

Yes • 50%

No • 50%

Will Scale AI SEAL Leaderboards surpass lmsys.org in market adoption by end of 2024?

Yes • 50%

No • 50%

Markets based on same story

Will an LLM achieve top rank on SEAL Leaderboards in Coding by August 31, 2024?

No • 50%

Yes • 50%

Will Scale AI introduce a new domain to SEAL Leaderboards by December 31, 2024?

No • 50%

Yes • 50%

Will Scale AI partner with a major tech company for SEAL Leaderboards by October 31, 2024?

No • 50%

Yes • 50%

Which LLM will be in the top 3 for Instruction following domain on SEAL Leaderboards as of September 30, 2024?

Model A • 33%

Model C • 33%

Model B • 33%