Which LLM will be in the top 3 for the Math domain on the SEAL Leaderboards as of November 30, 2024?
Model X • 33%
Model Y • 33%
Model Z • 33%
SEAL Leaderboards on Scale AI's official website
Scale AI Launches SEAL Leaderboards for LLMs in Coding, Math, Instruction, and Spanish
May 29, 2024, 05:42 PM
Scale AI has launched the SEAL Leaderboards, a new evaluation platform for large language models (LLMs). The leaderboards use private datasets and expert evaluations to rank LLMs in several domains: coding, math, instruction following, and Spanish. The initiative aims to address two common problems in model evaluation, data contamination and inconsistent rater quality. The SEAL Leaderboards are continuously updated with new data and models, and Scale AI is inviting model developers to participate. The platform also extends GSM1K to various domains and reports Elo-scale rankings computed with the Bradley-Terry method. The launch has been well received, with industry experts highlighting its potential to provide more trustworthy and accurate assessments of LLMs.
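For readers unfamiliar with how pairwise expert preferences can become an Elo-scale ranking, the following is a minimal sketch of a Bradley-Terry fit. It is not Scale AI's implementation, which the story does not describe. The function names, the model names, the win counts, and the base rating of 1000 are all hypothetical, and the fit uses a standard iterative update that assumes every model wins at least one comparison.

```python
import math
from collections import defaultdict


def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of comparisons in which a was preferred to b.
    Assumption: every model has at least one win; otherwise its strength
    collapses to 0 and cannot be placed on a log (Elo) scale.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}

    total_wins = defaultdict(float)
    for (winner, _loser), count in wins.items():
        total_wins[winner] += count

    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                # Number of head-to-head comparisons between i and j.
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom if denom else strength[i]
        # Strengths are only defined up to a constant factor, so rescale
        # them to average 1 after each pass.
        scale = len(models) / sum(updated.values())
        strength = {m: s * scale for m, s in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0):
    """Map strengths to Elo-style ratings centred on `base` (hypothetical).

    A 400-point gap corresponds to 10:1 odds of being preferred.
    """
    raw = {m: 400.0 * math.log10(s) for m, s in strength.items()}
    mean = sum(raw.values()) / len(raw)
    return {m: base + r - mean for m, r in raw.items()}


# Made-up preference counts for three hypothetical models.
example_wins = {
    ("model_a", "model_b"): 7, ("model_b", "model_a"): 3,
    ("model_b", "model_c"): 6, ("model_c", "model_b"): 4,
    ("model_a", "model_c"): 8, ("model_c", "model_a"): 2,
}
example_ratings = to_elo_scale(fit_bradley_terry(example_wins))
```

Under this model, the probability that one model is preferred to another depends only on the difference between their ratings, which is why the result can be read like a chess-style Elo table.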