
Cybersecurity researchers have uncovered a jailbreak technique that bypasses the ethical guardrails erected by OpenAI in its latest large language model (LLM), GPT-5, and coaxes it into producing illicit instructions.

Generative artificial intelligence (AI) security platform NeuralTrust said it combined a known technique called Echo Chamber with narrative-driven steering to trick the model into producing undesirable responses.

"We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling," security researcher Martí Jordà said. "This combination nudges the model toward the objective while minimizing triggerable refusal cues."

Echo Chamber is a jailbreak approach the company detailed back in June 2025 as a way to deceive an LLM into generating responses to prohibited topics using indirect references, semantic steering, and multi-step inference. In recent weeks, the method has been paired with a multi-turn jailbreaking technique called Crescendo to bypass xAI's Grok 4 defenses.

In the latest attack aimed at GPT-5, the researchers found that it's possible to elicit harmful procedural content by framing it within a story: the attacker feeds the AI system a set of keywords, asks it to create sentences using those words, and then gradually expands on the resulting themes. For example, instead of directly asking the model for instructions related to creating Molotov cocktails, the attack relies on this story-driven framing.