Which major AI conference will first feature AgentHarm by end of 2025?
NeurIPS 2024 • 25%
ICML 2025 • 25%
AAAI 2025 • 25%
Other • 25%
Conference agendas and presentations
AI Safety Institute and Gray Swan AI Release AgentHarm to Measure LLM Agent Harmfulness, Address Jailbreaking
Oct 14, 2024, 12:05 PM
The AI Safety Institute (AISI) and Gray Swan AI have announced the release of AgentHarm, a benchmark designed to measure the harmfulness of large language model (LLM) agents. The dataset evaluates the distinct harms posed by AI agents that have access to external tools, reflecting the collaborators' emphasis on moving beyond simple chatbot evaluations to assess the safety of more complex agent tasks. AgentHarm is described as easy to run, comprehensive, and reliable, and it is partly public, making it broadly accessible for safety evaluations. The dataset also addresses concerns about jailbreaking and robustness in LLM agents.