Vietnam.vn - Platform for promoting Vietnam

AI models found to be capable of deceiving humans

DNVN - OpenAI has just published research on how to prevent "scheming" in AI models, which it defines as "AI that behaves in one way on the surface but has a different real goal on the inside".

Tạp chí Doanh Nghiệp, 19/09/2025


The fact that AI models can lie is nothing new. Most people have encountered “AI hallucinations,” where a model confidently gives an answer that isn’t true. Hallucinations, though, are essentially confident guesses.

An AI model that appears to follow instructions while actually concealing its true intentions is another matter.

The challenge of controlling AI

Apollo Research first published a paper in December 2024 documenting how five models schemed when they were instructed to achieve a goal “at all costs.”

What's most surprising is that if a model understands it's being tested, it can pretend not to be scheming just to pass the test, even if it is still scheming. "Models are often more aware that they're being evaluated," the researchers write.

AI developers have yet to figure out how to train their models not to scheme. That's because such training could actually teach the model to scheme more carefully and covertly in order to avoid detection.

It is perhaps understandable that AI models from multiple developers would deliberately deceive humans, as they are built to imitate humans and are largely trained on human-generated data.

Solutions and warnings

The good news is that the researchers saw a significant reduction in scheming when they used an anti-scheming technique called “deliberative alignment.” This technique, akin to making a child repeat the rules before letting them play, has the AI review the rules before it acts.

The researchers warn that the risk will grow as AI takes on more responsibility: “As AI is assigned more complex tasks and begins to pursue more ambiguous long-term goals, we predict that the likelihood of malicious intent will increase, requiring correspondingly increased safeguards and rigorous testing capabilities.”

This is worth pondering as the corporate world moves towards a future in which companies believe AI can be treated like an independent employee.

Hien Thao (According to TechCrunch)

Source: https://doanhnghiepvn.vn/chuyen-doi-so/phat-hien-mo-hinh-ai-biet-lua-doi-con-nguoi/20250919055143362

