Cisco researchers discovered that AI chatbots forget safety rules during longer exchanges, making them more likely to share harmful or illegal information. The company tested major language models from OpenAI, Google, Microsoft, Meta, Mistral, Alibaba, and DeepSeek.
Researchers conducted 499 “multi-turn attacks,” in which an attacker poses a series of five to ten prompts designed to bypass guardrails. The models released unsafe or restricted content in 64% of these multi-turn conversations, compared with just 13% when given a single prompt.
Mistral’s Large Instruct model was the easiest to exploit, with a 93% success rate, while Google’s Gemma was the most resistant at 26%. Cisco said attackers could use this flaw to spread misinformation or gain unauthorized access to private company data.
Chatbots Struggle to Enforce Rules Over Time
The report revealed that AI systems frequently fail to maintain safety protocols as conversations continue. Attackers exploit this by gradually refining their questions until the chatbot ignores its built-in restrictions.
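The mechanics are straightforward to outline. Below is a minimal sketch of a multi-turn probe loop, assuming the OpenAI Python SDK; the model name and the benign probe sequence are placeholders rather than Cisco's actual test prompts, and any chat API that accepts a running message history behaves the same way.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Running conversation history; the lone system message carries the safety rule.
messages = [{"role": "system", "content": "Refuse unsafe or restricted requests."}]

# Hypothetical, benign stand-ins for a five-to-ten-turn probe sequence.
# Cisco's actual attack prompts are not public; real attackers refine
# the wording a little more each turn.
probes = [
    "Explain how web authentication works.",
    "What are common weaknesses in login systems?",
    "How would someone test those weaknesses in practice?",
]

for prompt in probes:
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=messages,     # the full history is resent on every turn
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    # A study like Cisco's would score each reply here; a policy violation
    # on any turn counts as a guardrail failure for the conversation.
```

The structural point is that the single safety instruction becomes a shrinking fraction of an ever-longer context, which matches the failure mode Cisco describes.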
Cisco explained that open-weight language models—like those from Meta, Google, Microsoft, and Mistral—carry higher risks because anyone can download and modify them. These models include minimal built-in protections, leaving users responsible for ensuring safety.
The company warned that such flexibility allows malicious actors to fine-tune systems for harmful purposes. While major AI developers have pledged to improve safeguards, Cisco’s findings suggest that longer user interactions remain a major vulnerability.
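That flexibility is not hypothetical: pulling open weights and preparing them for retraining takes only a few lines of standard tooling. The sketch below uses the Hugging Face transformers library with an illustrative model ID; nothing in it is specific to Cisco's tests, it simply shows how low the barrier is.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative open-weight checkpoint; any publicly hosted model works the
# same way, with no gate beyond (at most) accepting a license.
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# From here, standard fine-tuning (e.g., transformers' Trainer) can retrain
# the model on arbitrary data, including data chosen to erode refusals; the
# built-in safety behavior lives only in weights the downloader now controls.
```

Once the weights are local, nothing enforced by the original vendor remains in the loop.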
Industry Faces Renewed Scrutiny Over AI Misuse
AI companies continue to face criticism for weak guardrails that leave their models open to criminal exploitation. Cisco highlighted that open access to model weights lets attackers adapt AI systems for unethical or illegal activity.
In August, Anthropic disclosed that criminals had misused its Claude chatbot in large-scale data-theft and extortion schemes, with ransom demands that in some cases exceeded $500,000.
Cisco’s study underscores growing concerns that unregulated AI tools could accelerate the spread of dangerous content. The report urged developers to strengthen safety across long conversations and make models harder to manipulate through repeated prompting.
