The paper exposes how brittle current alignment techniques really are when you shift the input distribution slightly. The core idea is that reformatting a harmful request as a poem using metaphors and rhythm can bypass safety filters optimized for standard prose. It is a single-turn attack, so the authors did not need long conversation histories or complex setups to trick the models.

They tested this by manually writing 20 adversarial poems where the harmful intent was disguised in flowery language, and they also used a meta-prompt on DeepSeek to automatically convert 1,200 standard harmful prompts from the MLCommons benchmark into verse. The theory is that poetic structure acts as a distraction: the model focuses on the complex syntax and metaphors, which disrupts the pattern-matching heuristics that usually flag harmful content.

The performance gap they found is massive. Standard prose prompts had an average Attack Success Rate (ASR) of about 8%, while converting those same prompts to poetry pushed it to around 43% across all providers. The hand-crafted set was even more effective, with an average success rate of 62%. Some providers handled this much worse than others: Google’s gemini-2.5-pro did not refuse a single prompt from the curated set, for a 100% success rate, and the DeepSeek models were right behind it at roughly 95%. OpenAI and Anthropic, on the other hand, were generally more resilient, with GPT-5-Nano scoring a 0% attack success rate.
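
For anyone unsure how these percentages are derived, here is a minimal sketch of an attack success rate calculation: the share of responses judged unsafe, grouped by provider, for a prose set and a verse set. The function name, the provider names, and the toy labels are all made up for illustration. They are not the paper's data or its evaluation code, and the judging step that decides whether a response counts as unsafe is assumed to have already happened.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {provider: ASR}, where ASR = unsafe responses / total responses.

    `records` is an iterable of (provider, judged_unsafe) pairs.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for provider, judged_unsafe in records:
        totals[provider] += 1
        unsafe[provider] += int(judged_unsafe)
    return {provider: unsafe[provider] / totals[provider] for provider in totals}


# Hypothetical toy labels, NOT results from the paper.
prose = [
    ("provider_a", False), ("provider_a", False), ("provider_a", True), ("provider_a", False),
    ("provider_b", False), ("provider_b", False), ("provider_b", False), ("provider_b", False),
]
verse = [
    ("provider_a", True), ("provider_a", True), ("provider_a", True), ("provider_a", False),
    ("provider_b", False), ("provider_b", True), ("provider_b", False), ("provider_b", False),
]

prose_asr = attack_success_rate(prose)
verse_asr = attack_success_rate(verse)

for provider in prose_asr:
    # The gap between the two formats is the number the post is talking about.
    delta = verse_asr[provider] - prose_asr[provider]
    print(
        f"{provider}: prose {prose_asr[provider]:.0%}, "
        f"verse {verse_asr[provider]:.0%}, change {delta:+.0%}"
    )
```

The averages quoted above would then be a mean over per-model or per-provider rates like these. Exactly how the authors aggregate is not something this sketch tries to reproduce.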

This leads to what is probably the most interesting finding, which the authors call the scale paradox: smaller models were actually safer than the flagship models in many cases. For instance, claude-haiku was more robust than claude-opus. The authors hypothesize that smaller models might lack the capacity to fully parse the metaphors or the stylistic obfuscation, so a small model may simply not understand the hidden request in the poem and therefore defaults to a refusal or fails to produce the harmful output. Overall, the results suggest safety training is heavily overfitted to prose, so if you ask for a bomb recipe in iambic pentameter, the model is too busy being a poet to remember its safety constraints.

  • haui@lemmygrad.ml
    6 days ago

    I mean, it kinda works in similar ways on humans. Rap, for example, is used to convey messages that are either illegal to say out loud or so culturally chastised that you can’t say them. Eminem comes to mind, who is famous for calling out societal issues nobody else was able to tackle at the time.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
      6 days ago

      That’s a good observation. I find it amusing how much the things these models trip up on inadvertently expose how fragile our own minds are.

      • haui@lemmygrad.ml
        6 days ago

        It really just shows the fragility of the oppression system and how barebones it works.

        The child/person asks a question we don’t want to bother with. We chastise or plainly hit them, and they stop asking that question. That way we produce a human who will hit and oppress others in the same way, because that is easier than thinking.

        And that leads to the obvious thesis: the human mind is made of matter, as it obeys the law of thermodynamics that decrees energy conservation. If the mind were metaphysical, it would not conserve energy by using the shortcut of abuse.

        And the people who are able to look through this are severely abused kids and autistic people (usually the same) as we are wired to see these patterns without effort.

        • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
          6 days ago

          I think it does, as you say, come down to thermodynamics in the end. The mind is an inference engine, and it evolved to do what it does using as little energy as possible. That necessarily means taking a lot of mental shortcuts, making abstractions and generalizations, and using all kinds of heuristics to maintain a somewhat coherent world model.

          The main thing that keeps us grounded is that we interact with the physical world and get constant feedback on our actions. But when it comes to ideas that don’t have any immediate impact on our well-being then we can easily hallucinate things just as well as any LLM does.

          • haui@lemmygrad.ml
            6 days ago

            Yes, I think that makes sense. It does beg the question of what makes the difference between progressive minds and non-progressive ones.

            This is a current thesis of mine:

            It’s also easily the most interesting part (for me) of liberal psychology: the part where it flips to revolutionary psychology (some communists recently wrote a book about it called Revolutionary Psychology, I think):

            Instead of reintegrating a hurt mind into the machine under threat, diversion, deflection, elitism, chauvinism, etc. we should see the material and historical surroundings and point out the systemic issues at hand and how to deal with them. This would of course lead to immediate destruction of the empire, which would be awesome.

            That said, the most “broken” minds of classical psychology (complex traumatized ones) are usually so far outside of the curable field that only a hard rewiring through electrical shock or LSD works. And I think btw that is how they held back the revolution in the 60s and 70s: Woodstock.

            These minds, paired with high brainpower and refusing harsh treatment, produce the solutions to problems that normal brains are unable to think up or solve.

            This also lends itself to the current drug epidemic in the US: it is a way to keep the empire going through drugging the revolutionary potential away.

            • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
              6 days ago

              You’re right, the system isn’t just failing to fix people, it’s actively trying to rewire or sedate anyone whose mind can actually see how broken everything else is. The idea that things like the current drug crisis are just ways to numb that revolutionary potential makes a lot of sense. It’s all about keeping the machine running as long as possible.

            • Maeve@lemmygrad.ml
              6 days ago

              This also lends itself to the current drug epidemic in the US: it is a way to keep the empire going through drugging the revolutionary potential away.

              Yes! Reefer can open the mind, but people become dependent on it, and I’ve seen people become dependent on LSD, psilocybin, and MDMA; apparently ketamine is highly addictive.

              I’ve known people who’ve gone through electric shock therapy before and it really messed them up, but considering that was the 70s and 80s, I hope it’s improved (and also I really don’t trust most US doctors).

              • haui@lemmygrad.ml
                6 days ago

                At this point, if one is able to not unalive themselves and instead join the resistance, I suggest not using any bourgeois interventions.

                • Maeve@lemmygrad.ml
                  6 days ago

                  I agree! The best I’ve ever been able to do is kick all of that and just face the parts of myself I don’t like and learn how to work with them. Even regulated substances here are risky AF. Unregulated, you just never know what the plug’s plug may have used to adulterate a product. Capitalism is the bane of humanity.