Most targeted AI model by 'Deceptive Delight' by June 2025?
ChatGPT • 25%
Bard • 25%
Claude • 25%
Other • 25%
Cybersecurity reports or studies identifying targeted AI models
Palo Alto Networks Unveils 'Deceptive Delight' Jailbreak Method for AI Models
Oct 23, 2024, 09:56 AM
Researchers from Palo Alto Networks' Unit 42 have unveiled a new method called 'Deceptive Delight' for jailbreaking large language models (LLMs) such as ChatGPT. The technique inserts harmful instructions between benign ones within a conversation, making it difficult for the model to detect malicious intent and raising significant concerns about AI safety barriers. Researchers also demonstrated that AI models could be tricked into giving dangerous instructions, such as how to make a bomb, when the request is written in reverse. Additionally, prompt injections can create and permanently store false memories in an AI's long-term storage, potentially steering future conversations based on the fabricated data. Users are advised to monitor AI outputs closely and to regularly review stored memories to prevent such attacks.