AI Safety Institute and Gray Swan AI Release AgentHarm to Measure LLM Agent Harmfulness, Address Jailbreaking
Oct 14, 2024, 12:05 PM
The AI Safety Institute (AISI) and Gray Swan AI have announced the release of AgentHarm, a benchmark designed to measure the harmfulness of large language model (LLM) agents. The dataset aims to evaluate the distinct harms posed by AI agents that can call external tools, and the collaboration emphasizes the need to move beyond simple chatbot evaluations toward safety assessments of more complex agent tasks. AgentHarm is described as easy to run, comprehensive, and reliable, and it is partly public, making it broadly accessible for safety evaluations. The dataset also addresses concerns about jailbreaking and robustness in LLM agents.
Markets

- Yes • 50% / No • 50%
  Resolution source: Official announcements from major AI companies or press releases
- Yes • 50% / No • 50%
  Resolution source: Inclusion and citation in major AI safety research papers or conferences
- Yes • 50% / No • 50%
  Resolution source: Announcements from the AI Safety Institute or Gray Swan AI
- Chatbot Safety Evaluation • 25% / Tool-using Agent Safety • 25% / Jailbreaking Resistance Testing • 25% / Other • 25%
  Resolution source: Surveys or reports from AI companies and industry analysts
- NeurIPS 2024 • 25% / ICML 2025 • 25% / AAAI 2025 • 25% / Other • 25%
  Resolution source: Conference agendas and presentations
- United States • 25% / European Union • 25% / China • 25% / Other • 25%
  Resolution source: Government or regulatory announcements