Can AI sandbag safety checks to sabotage users? Yes, but not very well — for now

AI companies claim to have robust safety checks in place that ensure that models don’t say or do weird, illegal, or unsafe stuff. But what if the models were capable of evading those checks and, for some reason, trying to sabotage or mislead users? Turns out they can do this, according to Anthropic researchers. Just […]

© 2024 TechCrunch. All rights reserved. For personal use only.

Tomas Kauer - Moderator

Oct 21, 2024 - 06:00

