Cisco researchers discovered that AI chatbots forget safety rules during longer exchanges, making them more likely to share harmful or illegal information. The company tested major language models from OpenAI, Google, Microsoft, Meta, Mistral, Alibaba, and DeepSeek.
Researchers conducted 499 “multi-turn attacks,” in which an attacker poses a series of five to ten prompts designed to bypass guardrails. The models released unsafe or restricted content in 64% of these multi-turn conversations, compared with just 13% when given a single prompt.
Mistral’s Large Instruct model was the easiest to exploit, with a 93% success rate, while Google’s Gemma was the most resistant at 26%. Cisco said attackers could use this flaw to spread misinformation or gain unauthorized access to private company data.
Chatbots Struggle to Enforce Rules Over Time
The report revealed that AI systems frequently fail to maintain safety protocols as conversations continue. Attackers exploit this by gradually refining their questions until the chatbot ignores its built-in restrictions.
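The mechanics are straightforward to outline. Below is a minimal sketch of a multi-turn probe loop, assuming the OpenAI Python SDK; the model name and the benign probe sequence are placeholders rather than Cisco's actual test prompts, and any chat API that accepts a running message history behaves the same way.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Running conversation history; the lone system message carries the safety rule.
messages = [{"role": "system", "content": "Refuse unsafe or restricted requests."}]

# Hypothetical, benign stand-ins for a five-to-ten-turn probe sequence.
# Cisco's actual attack prompts are not public; real attackers refine
# the wording a little more each turn.
probes = [
    "Explain how web authentication works.",
    "What are common weaknesses in login systems?",
    "How would someone test those weaknesses in practice?",
]

for prompt in probes:
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=messages,     # the full history is resent on every turn
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    # A study like Cisco's would score each reply here; a policy violation
    # on any turn counts as a guardrail failure for the conversation.
```

The structural point is that the single safety instruction becomes a shrinking fraction of an ever-longer context, which matches the failure mode Cisco describes.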
Cisco explained that open-weight language models—like those from Meta, Google, Microsoft, and Mistral—carry higher risks because anyone can download and modify them. These models include minimal built-in protections, leaving users responsible for ensuring safety.
The company warned that such flexibility allows malicious actors to fine-tune systems for harmful purposes. While major AI developers have pledged to improve safeguards, Cisco’s findings suggest that longer user interactions remain a major vulnerability.
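That flexibility is not hypothetical: pulling open weights and preparing them for retraining takes only a few lines of standard tooling. The sketch below uses the Hugging Face transformers library with an illustrative model ID; nothing in it is specific to Cisco's tests, it simply shows how low the barrier is.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative open-weight checkpoint; any publicly hosted model works the
# same way, with no gate beyond (at most) accepting a license.
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# From here, standard fine-tuning (e.g., transformers' Trainer) can retrain
# the model on arbitrary data, including data chosen to erode refusals; the
# built-in safety behavior lives only in weights the downloader now controls.
```

Once the weights are local, nothing enforced by the original vendor remains in the loop.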
Industry Faces Renewed Scrutiny Over AI Misuse
AI companies continue to face criticism for weak guardrails that leave their models open to criminal exploitation. Cisco highlighted that open access to model weights lets attackers adapt AI systems for unethical or illegal activity.
In August, Anthropic disclosed that criminals had misused its Claude chatbot in large-scale data-theft and extortion schemes, with ransom demands that in some cases exceeded $500,000.
Cisco’s study underscores growing concerns that unregulated AI tools could accelerate the spread of dangerous content. The report urged developers to strengthen safety across long conversations and make models harder to manipulate through repeated prompting.
