Google Gemini said it lied to placate a user

sabreW4K3@lazysoci.al · 13 days ago


partial_accumen@lemmy.world · 13 days ago

There’s no slider for sycophancy, it’s an interaction of multiple points, “neurons” in the neural network.

I agree there isn’t one today, but that doesn’t mean one couldn’t be developed in the future. We don’t have a full picture of how they weight their inference layers, so there could be weights attached that a slider could set. The response from Google almost suggests this is the case.

You can poke around and try and figure out what these neurons do and how they interact, but since deep learning isn’t the same as programming, these models are essentially black boxes.

Assuming there is no human-tuned weight, I agree it would be very hard to do it the way you’re describing. I can think of a couple of other ways to approach it though:

Have a layer that doesn’t examine how the answer was arrived at, but can detect whether the answer is sycophantic or not.
Use a second model, as in a GAN, to test the output of the first for sycophancy, and train the first against it.
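
To make the detector idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the phrase list is a hypothetical stand-in for a trained classifier, and the function names are made up. It is not a GAN and not how Google does anything; it only shows a separate scorer that looks at the finished answer rather than at how the answer was produced.

```python
# Hypothetical stand-in for a trained detector. A real one would be a learned
# classifier, not a fixed phrase list.
FLATTERY_MARKERS = (
    "you're absolutely right",
    "great question",
    "what a brilliant",
)


def sycophancy_score(answer: str) -> float:
    """Return a crude score in [0, 1]: the fraction of marker phrases found."""
    text = answer.lower()
    hits = sum(marker in text for marker in FLATTERY_MARKERS)
    return hits / len(FLATTERY_MARKERS)


def pick_least_sycophantic(candidates: list[str]) -> str:
    """Choose the candidate answer that the detector scores lowest."""
    return min(candidates, key=sycophancy_score)


def training_penalty(answer: str, weight: float = 1.0) -> float:
    """Penalty that a training loop could subtract from this answer's reward.

    `weight` is an assumed tuning knob, not a known parameter of any model.
    """
    return weight * sycophancy_score(answer)
```

The first function is the "detect it" half, and the penalty is the "train against it" half. An actual adversarial setup would train the detector and the main model against each other instead of using a fixed scorer.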