DeepNewz Markets

AI Safety Institute Releases AgentHarm on October 14, 2024, to Measure LLM Agent Harmfulness

Oct 15, 2024, 02:22 PM

The AI Safety Institute, in collaboration with GraySwanAI, announced the release of AgentHarm on October 14, 2024. AgentHarm is a new dataset designed to measure the harmfulness of large language model (LLM) agents. The benchmark focuses on harms specific to AI agents that have access to external tools, a gap in current safety evaluations. Its authors describe it as comprehensive, reliable, and easy to run, which is intended to support widespread use. The release underscores the need for robust safety mechanisms as LLM agents become more integrated with external systems. It also reports that jailbreaks transfer to LLM agents without degrading the agents' capabilities. The dataset is partly public.

Markets

Will AgentHarm be adopted as a standard benchmark by three major AI companies by March 31, 2025?

AI Safety Institute•GraySwanAI•AgentHarm

Resolution / Starting Odds

Yes • 50%

No • 50%

Resolution source: Official announcements or press releases from AI companies

Will the AgentHarm dataset receive a major update by June 30, 2025?

Resolution / Starting Odds

No • 50%

Yes • 50%

Resolution source: Official announcements from the AI Safety Institute or GraySwanAI

Will a significant vulnerability be discovered in AgentHarm's methodology by December 31, 2024?

Resolution / Starting Odds

No • 50%

Yes • 50%

Resolution source: Public reports or publications identifying vulnerabilities

Which sector will be first to report a significant impact from AgentHarm by May 31, 2025?

Resolution / Starting Odds

Technology • 25%

Healthcare • 25%

Other • 25%

Finance • 25%

Resolution source: Industry reports or news articles detailing impacts

What will be the primary focus of the next AI Safety Institute project by April 30, 2025?

Resolution / Starting Odds

Other • 25%

Transparency and Explainability • 25%

Bias and Fairness • 25%

Robustness and Reliability • 25%

Resolution source: Official announcements from the AI Safety Institute

Which AI company will be first to integrate AgentHarm into its safety evaluations by July 31, 2025?

Resolution / Starting Odds

Other • 25%

OpenAI • 25%

Google DeepMind • 25%

Anthropic • 25%

Resolution source: Official announcements or press releases from AI companies