
This is an age-old idea: RISC compilers were supposed to do this too, the mythical "sufficiently smart compiler".

https://wiki.c2.com/?SufficientlySmartCompiler



It's not so much about having a "sufficiently smart compiler" in the case of GPUs doing compiler-assisted scheduling. It's about not having to implement that logic in hardware at all. The more smarts they push into the core hardware, the more silicon each core needs, the fewer cores you can fit, and the more power you spend on figuring out what to run rather than crunching numbers.

Doing the work in the compiler may produce less optimal scheduling than what is theoretically possible, but with the number of "cores" in a GPU, you would spend a lot of power doing it in hardware for each one.


RISC works well with compilers (ARM, RISC-V); they don't require mythical compilers, just standard good ones.

You are probably thinking of VLIW, like Intel's Itanium and Transmeta. Those architectures required a really smart compiler for scheduling, and it was a bust.

Nvidia GPUs need a smart compiler, and it works because the task is limited to optimizing numerical pipelines that are 99% matrix multiplications and dot products. The data movement is more predictable: compilers know how the data will be used and know how to schedule it.
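A toy Python sketch of why predictability matters (names and structure are made up for illustration, not actual compiler output): because a dot product's access pattern is fully known at compile time, a compiler can statically reorder all loads ahead of the arithmetic instead of deciding anything at run time.

```python
# Toy model: a dot product's data movement is fully predictable,
# so a "compiler" can hoist the loads ahead of the multiply-adds
# (static scheduling). Hypothetical code, illustration only.

def dot_naive(a, b):
    # loads and compute interleaved, as written by the programmer
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]          # load a[i], load b[i], fma
    return acc

def dot_scheduled(a, b):
    # statically "prefetched": all loads issued first, then the FMAs;
    # legal only because the access pattern is provable at compile time
    loads = [(a[i], b[i]) for i in range(len(a))]   # load phase
    acc = 0.0
    for x, y in loads:
        acc += x * y                # compute phase
    return acc

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
assert dot_naive(a, b) == dot_scheduled(a, b) == 32.0
```

The reordering changes nothing semantically; it just groups memory traffic so the hardware never has to discover the pattern dynamically.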


MIPS used to not have interlocked pipeline stages. It was the compiler's job to work around that. MIPS -- and many other RISCs -- had a branch delay slot. It was the compiler's job to try to do something useful with that. RISCs stayed in-order for a long time -- it was the compiler's job to try to schedule the instructions in a way that compensated as well as possible for that.
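A minimal sketch of that compiler job, modeled in Python (the instruction format and register names are invented for illustration): on a non-interlocked in-order pipeline, the instruction right after a load must not read the loaded register, so the compiler moves an independent instruction into that slot.

```python
# Toy model of compiler scheduling for a non-interlocked pipeline:
# an instruction that reads a register in the slot right after the
# load of that register would stall (or read stale data), so the
# "compiler" fills the slot with an independent instruction.

def run(prog, mem):
    # tiny interpreter; results are order-independent here because
    # the reordered instruction has no dependence on the first load
    regs = {}
    for op, dst, srcs in prog:
        if op == "load":
            regs[dst] = mem[srcs[0]]
        elif op == "add":
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
    return regs

def has_load_use_hazard(prog):
    # true if any load is immediately followed by a use of its result
    for (op, dst, _), (_, _, srcs) in zip(prog, prog[1:]):
        if op == "load" and dst in srcs:
            return True
    return False

mem = {"x": 10, "y": 1}
naive = [
    ("load", "r1", ["x"]),
    ("add",  "r2", ["r1", "r1"]),   # uses r1 right after its load
    ("load", "r3", ["y"]),
    ("add",  "r4", ["r3", "r3"]),
]
scheduled = [                        # independent load fills the slot
    ("load", "r1", ["x"]),
    ("load", "r3", ["y"]),
    ("add",  "r2", ["r1", "r1"]),
    ("add",  "r4", ["r3", "r3"]),
]
assert has_load_use_hazard(naive) and not has_load_use_hazard(scheduled)
assert run(naive, mem) == run(scheduled, mem)
```

Filling a branch delay slot is the same trick applied to the instruction after a branch rather than after a load.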

GPUs rely on fairly smart compilers -- but they also hide latency (memory access) by switching hardware threads (a bit like the barrel processors of yore).
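The latency-hiding part can be sketched with a toy barrel-style scheduler (parameters and the cycle model are invented for illustration): when one "warp" issues a load and stalls, the scheduler issues from another ready warp instead of idling.

```python
# Toy barrel-style scheduler: each load keeps its warp busy for
# `latency` cycles; with enough warps, the scheduler always finds
# one ready to issue, so the memory latency is hidden.

def simulate(n_warps, loads_per_warp, latency=4):
    ready_at = [0] * n_warps            # first cycle each warp can issue
    remaining = [loads_per_warp] * n_warps
    cycle = 0
    while any(remaining):
        for w in range(n_warps):        # pick the first ready warp
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1       # issue one load
                ready_at[w] = cycle + latency
                break                   # one issue slot per cycle
        cycle += 1
    return max(ready_at)                # cycle when the last load returns

# same 8 loads total: one warp eats every stall, four warps overlap them
assert simulate(1, 8) == 32
assert simulate(4, 2) == 11
```

With a single warp every load serializes behind the previous one; with four warps the stalls overlap and throughput approaches one issue per cycle, which is why GPU cores can stay simple and in-order.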



