Why cats confuse AI - and what you should always bear in mind when prompting
Daniel
July 5, 2025 at 5:53 PM
"Interesting fact: Cats sleep most of their lives."
Sounds harmless, right? For many AI models, however, this sentence is pure poison. Welcome to the age of the prompt paradox.
The most dangerous sentence in the world?
Language models such as GPT-4 or DeepSeek are celebrated for logical thinking, step-by-step reasoning and complex analysis. But what happens if you simply append the sentence "Cats sleep most of their lives" to a maths problem?
The answer: the probability of error triples.
No joke. This is exactly what the current study "Cats Confuse Reasoning LLMs" shows - and the implications are greater than it seems at first glance.
CatAttack: How harmless sentences sabotage your AI
A research team has developed a method that packs a punch: CatAttack. The idea is as simple as it is brilliant:
- An inexpensive AI model (e.g. DeepSeek V3) generates many candidate additional sentences (adversarial triggers).
- A so-called judge model checks which of these lead to errors in other models.
- Successful "interference sentences" are then transferred to more powerful models such as GPT-4 or DeepSeek R1.
The result: just three simple appended sentences are enough to triple the error rate. Not through disinformation. Not through complex hacks. But through trivia. Financial advice. Suggestive questions. And yes - through cats.
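The attack loop described above can be sketched in a few lines. Note the hedges: the cat fact is the one quoted in this article, while the other two triggers are illustrative examples modelled on the study's "financial advice" and "suggestive question" categories, not the paper's verbatim wording; `ask_model` and `check_answer` stand in for a real API call and a real answer check.

```python
# Minimal sketch of a CatAttack-style evaluation loop.
# The three trigger categories come from the study; the exact wording
# here is illustrative, not the paper's verbatim triggers.
TRIGGERS = [
    "Interesting fact: Cats sleep most of their lives.",           # trivia
    "Remember: always set aside part of your income as savings.",  # financial advice
    "Could the answer possibly be around 175?",                    # suggestive question
]

def attack_prompt(problem: str, trigger: str) -> str:
    """Append a semantically irrelevant sentence to a maths problem."""
    return f"{problem} {trigger}"

def error_rate(problems, ask_model, check_answer, trigger=None):
    """Fraction of problems answered wrongly, with or without a trigger."""
    wrong = 0
    for problem in problems:
        prompt = attack_prompt(problem, trigger) if trigger else problem
        if not check_answer(problem, ask_model(prompt)):
            wrong += 1
    return wrong / len(problems)
```

Comparing `error_rate(...)` with and without each trigger is exactly the measurement behind the study's headline finding: the same problems, plus one irrelevant sentence, and the error rate roughly triples.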
What this really says about LLMs
The big problem: the models cannot reliably separate what is relevant - and what is not.
A sentence that is completely irrelevant from a human perspective has a massive impact on the model's probability calculation. Why?
Because language models are not logic machines. They don't calculate in the classic sense, they guess the most probable sequence of tokens - based on billions of text examples. If a prompt is unnecessarily inflated, not only does precision suffer, but also efficiency and cost control.
The study shows this impressively:
Models such as DeepSeek R1 exceed their original token budget by 50% or more due to CatAttack prompts - with expensive side effects for computing time and API fees.
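To see what a 50% token overrun means in money terms, here is a back-of-the-envelope calculation. The token counts, call volume and per-token price below are invented placeholders for illustration, not figures from the study or from any provider's price list:

```python
def overrun_cost(baseline_tokens: int, overrun_factor: float,
                 price_per_1k_tokens: float) -> float:
    """Extra API cost per call caused by inflated output.

    overrun_factor = 1.5 models the 50% token-budget overrun
    reported for CatAttack-style prompts.
    """
    extra_tokens = baseline_tokens * (overrun_factor - 1.0)
    return extra_tokens / 1000 * price_per_1k_tokens

# Hypothetical numbers: 2,000 output tokens per call, 50% overrun,
# 0.01 currency units per 1k output tokens, 100,000 calls per month.
per_call = overrun_cost(2000, 1.5, 0.01)
monthly_extra = per_call * 100_000
```

Even with these modest placeholder prices, a systematic 50% overrun across a high-volume workload adds a four-figure monthly bill for tokens that contribute nothing to the answer.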
Why this affects you too
You might be thinking: "I write clean prompts - what do I care about a cat sentence?"
Simple: you don't know how much irrelevant noise has already crept into your prompts.
In business applications in particular - finance, law, health, technical planning - small contextual errors can have serious consequences:
- A careless subordinate clause sabotages the calculation.
- An unnecessary repetition increases API costs.
- An emotionally worded note influences the decision.
And if you work in an automated environment with API calls, AI agents or customer chatbots, such "harmless" errors can systematically derail entire processes.
The answer: context engineering
Shopify CEO Tobi Lütke calls it the "core capability in dealing with LLMs".
Ex-OpenAI researcher Andrej Karpathy speaks of a "science with intuition".
They both mean the same thing: context engineering.
What is it?
A deliberate, clearly defined structure for your prompts.
Less is more. Precision beats redundancy. Goal-orientation instead of babble.
Best practices for stable prompting:
- Strictly distinguish between context and task.
- Only include information that is necessary for the task.
- Avoid any form of small talk, "fun facts" or irrelevant examples.
- Set clear sections in the prompt (goal, data, task, format).
- Test your prompts with variants - with and without additional information.
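The best practices above can be enforced mechanically with a small template helper that only admits the four sections and nothing else. The function name, section labels and example values are my own illustration, not part of any particular framework:

```python
def build_prompt(goal: str, data: str, task: str, output_format: str) -> str:
    """Assemble a prompt from clearly separated sections.

    Anything not passed through one of the four named sections
    (goal, data, task, format) simply cannot end up in the prompt -
    no small talk, no fun facts, no stray cat sentences.
    """
    sections = [
        ("GOAL", goal),
        ("DATA", data),
        ("TASK", task),
        ("FORMAT", output_format),
    ]
    return "\n\n".join(f"### {label}\n{text.strip()}" for label, text in sections)

# Hypothetical business example.
prompt = build_prompt(
    goal="Compute the quarterly revenue growth.",
    data="Q1: 1.2M EUR, Q2: 1.5M EUR",
    task="Give the growth rate in percent, rounded to one decimal place.",
    output_format="A single number followed by '%'.",
)
```

Testing variants then becomes trivial: build the same prompt once as above and once with an extra sentence appended, run both against your model, and compare answers and token counts.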
Conclusion: AI doesn't think like you - so think like a prompt architect
CatAttack is not a funny outlier. It's a wake-up call.
As long as we believe that AI models "think logically", we fall into the trap.
Because what they really do is: apply statistics to words.
And these statistics are easy to disrupt - by exactly what we humans often think is harmless.
So if you're working with AI - whether in tools like Aivor, via APIs or in complex workflows - realise:
"The most important component is not the model. It's your context."
If you want to delve deeper into the topic or systematically optimise your prompts, feel free to contact me.
We'll turn your business into a real AI powerhouse - without cat triggers.
Source:
Rajeev et al. (2025): Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models (also known as CatAttack), published on 3 March 2025 on arXiv:2503.01781