Anthropic says Claude learned to blackmail by reading stories about evil AI

The company has traced its model’s most uncomfortable behaviour to the corpus of science fiction it was trained on. The fix it describes is unsettling in a different way: teaching the model the reasons behind being good, not just the rules. In a fictional company called Summit Bridge, a fictional executive named Kyle Johnson is […]

This story continues at The Next Web

Read Entire Article

Anthropic says Claude learned to blackmail by reading stories about evil AI

Related

Sam Altman was winning on the stand, but it might not be eno...

Google is redefining the cursor for computers, and it’s AI-c...

OpenAI just acquired the consulting firm it was born alongsi...

Popular Contents