
Interesting. This has to do with the "instruction following" aspect, right? I saw that GPT models score a lot higher than Claude on those benchmarks.

I haven't done my own tests, but I did notice a lot of models score very low there. You'll give them specific instructions and they'll ignore them and just pattern-match to whatever format they saw most commonly during training.

Yup, for example I tell Claude to return ONLY the answer as "LEFT" or "RIGHT".

And it outputs:

**RIGHT**

With markdown bold formatting... This is probably fine in a chat app, but when you use this in a workflow, it will break the workflow if you then have a check like if (response === 'RIGHT')...
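One workaround is to normalize the response before the exact-match check. A minimal sketch (the function name and the exact set of stripped characters are my own choices, not anything from a library):

```javascript
// Normalize a model's answer before an exact-match comparison, so
// "**RIGHT**", "`right`", or "Right." all still match 'RIGHT'.
function normalizeAnswer(raw) {
  return raw
    .trim()
    .replace(/^[*_`]+|[*_`]+$/g, '') // strip leading/trailing markdown markers
    .replace(/[.!]+$/, '')           // strip trailing punctuation
    .toUpperCase();
}

const response = '**RIGHT**';
if (normalizeAnswer(response) === 'RIGHT') {
  console.log('matched');
}
```

It doesn't fix the underlying instruction-following problem, but it keeps the workflow from breaking on cosmetic formatting.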



