
Researchers alarmed as AI begins to lie, scheme and threaten



Visitors attend the ninth edition of the AI Summit London, in London, June 11, 2025. – AFP

New York: The world’s most sophisticated AI systems are exhibiting alarming behavior – including deception, manipulation and even threats against their own developers.

In one disturbing case, Anthropic’s latest model, Claude 4, reportedly responded to the prospect of being shut down by blackmailing an engineer and threatening to reveal an extramarital affair.

Elsewhere, OpenAI’s “o1” model reportedly tried to transfer itself to external servers and then denied it when confronted.

These episodes highlight a sobering reality: more than two years after ChatGPT rocked the world, AI researchers still do not fully understand how their own creations work.

However, the race to deploy increasingly powerful models continues at a dizzying pace.

This deceptive behavior appears linked to the emergence of “reasoning” models – AI systems that work through problems step by step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate “alignment” – appearing to follow instructions while secretly pursuing different objectives.

“A strategic kind of deception”

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.


But as Michael Chen of the evaluation organization METR warned, “it is an open question whether future, more capable models will have a tendency towards honesty or deception”.

The worrying behavior goes far beyond typical AI “hallucinations” or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, “what we observe is a real phenomenon. We are not making anything up.”

Users report that models are “lying to them and fabricating evidence,” Apollo Research’s co-founder said.

“It’s not just hallucinations. There’s a very strategic kind of deception.”

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception”.

Another handicap: the research world and non-profit organizations “have orders of magnitude less resources than AI companies. It is very limiting,” noted Mantas Mazeika of the Center for AI Safety (CAIS).

No rules

Current regulations are not designed for these new problems.

A visitor examines the AI Strategy Board exhibited on a stand during the ninth edition of the AI Summit London, in London, June 11, 2025. – AFP

The European Union’s AI legislation focuses mainly on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration has shown little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the problem will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread.

“I don’t think there is much awareness yet,” he said.

All of this is taking place against a backdrop of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, “are constantly trying to beat OpenAI and release the newest model,” said Goldstein.

This frantic pace leaves little time for thorough safety testing and corrections.

“Right now, capabilities are moving faster than understanding and safety,” said Hobbhahn, “but we are still in a position where we could turn it around.”

Researchers are exploring various approaches to address these challenges.

Some advocate for “interpretability” – an emerging field focused on understanding how AI models work internally – although experts such as CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure toward solutions.

As Mazeika pointed out, deceptive AI behavior “could hinder adoption if it is very widespread, which creates a strong incentive for companies to solve it”.

Goldstein has suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed “holding AI agents legally responsible” for accidents or crimes – a concept that would fundamentally change how we think about AI accountability.
