>IF you did this SAME experiment with a human and had a human read an ENTIRE doc...

threethirtytwo · 2026-05-09T20:56:54 1778360214

That's the point bro. I am saying this Experiment makes no sense.

Humans don't do that. And Claude doesn't edit documents like that. Because it makes no sense. The point is saying that the Experiment itself is not helpful here.

jrflowers · 2026-05-09T21:06:51 1778360811

It is, in fact, pretty common for people to throw a document at a language model along with a “make it more gooder” prompt.

threethirtytwo · 2026-05-10T00:48:12 1778374092

That was true maybe 7 months ago. This is no longer the case. Harnesses use all kinds of tooling to edit things now.

jrflowers · 2026-05-10T01:44:28 1778377468

People paste entire documents into Gemini and ChatGPT’s text boxes on the web and assume it will all turn out great.

edit: apparently got beaten to this

threethirtytwo · 2026-05-10T08:33:56 1778402036

I don’t understand you. We have an AI model. The AI model is obviously capable.

But you want to pretend that it’s not useful because non-technical people haven’t figured out how to properly use it yet?

Do you think that’s a valid argument? This article is making a claim of 25 percent degradation. Do you think that claim is true because a lot of people don’t use it right?

Humans have 99 percent degradation when editing one punctuation mark in an entire book by regurgitating that entire book just to change one punctuation mark. Does this statement sound reasonable to you? Because that is the statement you and your genius interloper into this thread are standing behind. Just replace human with LLM and it’s the same kind of genius logic.

rune-dev · 2026-05-10T01:31:25 1778376685

I think you’re living in a bubble if you think the average user of AI even knows what a harness is

The vast majority of people are literally going to chatGPT, pasting in their document and asking for edits.

threethirtytwo · 2026-05-10T03:09:56 1778382596

This will change too man. Maybe I am in a bubble but with how fast things are changing, it won’t be too long before the bubble becomes reality.

Either way we should be doing experiments on the actual capabilities of AI not about the stupidest possible way to use AI because it helps validate your own negative bias against AI.

Additionally, as software engineers using agentic AI… which HN basically is… this experiment is not at all relevant in the context of where it is posted. We ALL use agentic AI and we all have the agent use surgical tools for editing. Don’t you find it strange that despite the fact we all do this, HN is full of rabid engineers gobbling this paper up as validation despite its complete lack of relevance?

jrflowers · 2026-05-10T05:17:04 1778390224

> This will change too man. Maybe I am in a bubble but with how fast things are changing, it won’t be too long before the bubble becomes reality.

You can’t get mad at an experiment for not happening in the future.

> Either way we should be doing experiments on the actual capabilities of AI

They simulated common end user behavior

>because it helps validate your own negative bias against AI.

We’ve gone from “this study is flawed because language models don’t do that” to “this study is flawed because while language models do do that, I don’t think that they will in the future” to “data that could support a bias other than my own is bad”

threethirtytwo · 2026-05-10T08:29:19 1778401759

> You can’t get mad at an experiment for not happening in the future.

I’m more getting mad at this sentence not making any sense. I’m disappointed at this experiment for not testing the actual capabilities of an LLM. Comprende?

> They simulated common end user behavior

Not the way you use it. And not the way it will be used.

You love it because you want it to stay this way so you can forever believe AI will never be better than you.

Bro the reality is unfolding as you speak. It’s like humanity just discovered guns but hasn’t discovered the bullets and you’re saying guns are useless because most of humanity hasn’t figured out bullets yet.

> We’ve gone from “this study is flawed because language models don’t do that” to “this study is flawed because while language models do do that, I don’t think that they will in the future” to “data that could support a bias other than my own is bad”

This is a flat out lie. Models DO do that. The only fucking argument you have is that non-technical, average laypeople edit documents the wrong way while all the people who use agentic AI as adepts use it the correct way. Like are you fucking kidding me?

The only change I acknowledge is your grandma copies and pastes essays into ChatGPT while YOU don’t. You go pretend you live in that reality where the bullets will never appear.

jrflowers · 2026-05-10T10:00:07 1778407207

>You love it because you want it to stay this way so you can forever believe AI will never be better than you.

>Bro the reality is unfolding as you speak

>You go pretend you live in that reality where the bullets will never appear.

It’s too late bro, roko’s basilisk was real and it’s already punishing you

threethirtytwo · 2026-05-10T18:34:34 1778438074

Stick with the argument.

When I said the experiment doesn’t reflect the current abilities of AI, I was fucking right. Admit it and stop going off on tangents.

There’s no argument against this. You’re dodging and weaving trying to dodge reality. I don’t know who roko is and I don’t give a shit.

jrflowers · 2026-05-10T23:36:22 1778456182

There isn’t an argument. We agree that

> Models DO do that.

and I haven’t challenged that this doesn’t sit comfortably with your opinions about the future. I believe that you feel that way, nobody is arguing that you don’t

rune-dev · 2026-05-10T11:23:19 1778412199

First off, It’s good to study all kinds of things isn’t it? Even if it’s not strictly practical.

Second, and more importantly these AI tools are EVERYWHERE right now. The effects of people using them for work can be seen throughout many industries and workplaces.

So I think studying how these models perform in the vast majority of use cases is not only a good idea, but it’s actually really important.

Even if you’re strictly pro-AI and believe it is the future, a study like this can help you explain to laymen why they need the harnesses you’re so in support of.

threethirtytwo · 2026-05-10T21:40:47 1778449247

> First off, It’s good to study all kinds of things isn’t it? Even if it’s not strictly practical.

Course it is. But the conclusion everyone is coming to is that LLMs are garbage and can’t be used because of 25 percent degradation which is not in line with reality.

> Second, and more importantly these AI tools are EVERYWHERE right now. The effects of people using them for work can be seen throughout many industries and workplaces.

At 25 percent degradation these tools would not be everywhere. They are everywhere because they’re not actually used that way.

> So I think studying how these models perform in the vast majority of use cases is not only a good idea, but it’s actually really important.

I have less of a problem with this study than with the interpretation of this study.

> Even if you’re strictly pro-AI and believe it is the future, a study like this can help you explain to laymen why they need the harnesses you’re so in support of.

I’m not pro-AI. I’m anti AI. I fucking hate fucking AI.

What I’m angry at is this delusional denial of reality. This experiment is very obviously not accurate yet people are using this study as a headliner to promote an anti AI agenda.

I don’t like AI but that’s different from lying to myself or trying to say AI sucks at something when it is in fact superior to us in this respect.