How many ARC-AGI tasks will the top 3 AI entries solve by November 10, 2024?
790-793 • 25%
794-796 • 25%
797-799 • 25%
800 • 25%
Official ARC-AGI competition results
Independent NYU Study Finds 98.7% of ARC-AGI Tasks Solvable by Humans Ahead of November 10 Competition Deadline
Sep 4, 2024, 05:53 PM
Researchers at New York University (NYU) conducted an independent study of the ARC-AGI tasks and found that 98.7% of the public tasks are solvable by humans: 790 of the 800 tasks were completed by at least one Mechanical Turk worker. The finding underscores the gap between human and AI performance on these tasks. The ARC-AGI competition, which challenges participants to develop AI capable of solving the tasks, ends on November 10, 2024. The researchers aim for future iterations to reach 100% solvability and to establish human baselines on the private test set. Many of the competition's current high-scoring entries rely on basic brute-force program search, sketched below.
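For readers unfamiliar with brute-force program search, here is a minimal, hypothetical sketch in Python of the general idea: enumerate compositions of a small set of grid transforms up to a depth limit and return the first program consistent with every training pair. The five primitives, the grid encoding, and the depth limit are all illustrative assumptions, not the actual competition code, whose DSLs are far richer.

```python
from itertools import product

# Grids are tuples of tuples of ints (ARC colors 0-9).
# These five primitives are a toy DSL chosen for illustration.
def identity(g):  return g
def rotate90(g):  return tuple(zip(*g[::-1]))           # rotate clockwise
def flip_h(g):    return tuple(row[::-1] for row in g)  # mirror left-right
def flip_v(g):    return g[::-1]                        # mirror top-bottom
def transpose(g): return tuple(zip(*g))

PRIMITIVES = [identity, rotate90, flip_h, flip_v, transpose]

def search(train_pairs, max_depth=3):
    """Enumerate compositions of primitives up to max_depth and
    return the first program consistent with every training pair."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def apply(g, prog=program):
                for step in prog:
                    g = step(g)
                return g
            if all(apply(x) == y for x, y in train_pairs):
                return apply
    return None

# Toy task: the output is the input rotated 180 degrees.
train = [
    (((1, 2), (3, 4)), ((4, 3), (2, 1))),
    (((5, 0), (0, 5)), ((5, 0), (0, 5))),
]
solver = search(train)
if solver is not None:
    print(solver(((7, 8), (9, 0))))  # -> ((0, 9), (8, 7))
```

The search here finds rotate90 composed with itself at depth 2. Real entries scale the same enumerate-and-check loop to thousands of primitives, which is why it is described as brute force: nothing guides the search toward a solution beyond consistency with the training pairs.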