
Interesting. This has to do with the "instruction following" aspect, right? I saw that GPT models score a lot higher than Claude on those benchmarks.

I haven't done my own tests, but I did notice a lot of models score very low there. You'll give them specific instructions and they'll ignore them and just pattern-match to whatever format they saw most commonly during training.

Yup, for example I tell Claude to return ONLY the answer as "LEFT" or "RIGHT".

And it outputs:

**RIGHT**

With markdown bold formatting... This is probably fine in a chat app, but when you use this in a workflow, it will break the workflow if you then have a check like if (response === 'RIGHT')...
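One workaround is to normalize the response before the exact-match check. A minimal sketch (the function name and the exact set of stripped characters are my own choices, not anything from a library):

```javascript
// Normalize a model's answer before an exact-match comparison, so
// "**RIGHT**", "`right`", or "Right." all still match 'RIGHT'.
function normalizeAnswer(raw) {
  return raw
    .trim()
    .replace(/^[*_`]+|[*_`]+$/g, '') // strip leading/trailing markdown markers
    .replace(/[.!]+$/, '')           // strip trailing punctuation
    .toUpperCase();
}

const response = '**RIGHT**';
if (normalizeAnswer(response) === 'RIGHT') {
  console.log('matched');
}
```

It doesn't fix the underlying instruction-following problem, but it keeps the workflow from breaking on cosmetic formatting.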



