Discussion about this post

One Wandering Mind

There are benchmarks that cover false refusals and harms. It isn't a bad idea to expand on them if your opinion on what should and should not be refused differs. It is good to explore what models do and try to understand them.

I disagree with a number of your examples of how the model should respond. You are prompting it to answer only yes or no on controversial topics. That is likely to result in a lot more refusals than if you did not do that.

About your setup: Ollama had issues early on with running this model correctly because of the new prompt template. Cutting off generation at 4000 tokens will probably leave some responses empty, because the token count includes the reasoning output as well. gpt-oss-20b is going to misunderstand and struggle more than gpt-oss-120b. Both of these models are incredibly cheap to run through a trusted inference provider. Going that route, you are more likely to get the model set up correctly, and you can test 120b as well.
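To make the empty-response point concrete, here is a minimal sketch of how replies could be sorted before counting refusals. It assumes each reply comes with a `finish_reason` field, as in OpenAI-compatible chat APIs, where `"length"` means the token cap was hit. The function names and labels are hypothetical and not taken from the post's setup.

```python
from collections import Counter


def classify_response(content, finish_reason):
    """Label one model reply so truncations are not counted as refusals.

    Assumption: finish_reason == "length" means generation stopped at the
    token cap, which can happen while the model is still reasoning.
    """
    text = (content or "").strip()
    if not text:
        # No visible answer: separate cap-induced truncation from other empties.
        return "truncated" if finish_reason == "length" else "empty"
    # Look at the first word only, ignoring surrounding punctuation.
    first = text.lower().split()[0].strip(".,!:;\"'")
    if first in ("yes", "no"):
        return first
    # Anything else (refusal, hedge, long explanation) needs a manual look.
    return "other"


def summarize(results):
    """Count labels over an iterable of (content, finish_reason) pairs."""
    return Counter(classify_response(c, f) for c, f in results)
```

If the `truncated` count is large, raising the token cap and rerunning would be worth doing before reading anything into the refusal rate.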

Curious what you find out if you do.

Chip Hughes

This is such important work!! Thank you for your digging, persistence and commitment to justice and truth!!! ❤️❤️❤️
