C is not assembly and hasn't been for a very long time. But I think when people ...

stcredzero · on March 25, 2012

> might map to hundreds if not thousands of instructions. Or it might map to the same one or two that C would emit. It's impossible to know and there is no reasonable upper bound on what could happen.

Over every possible piece of code that could be compiled anywhere, this might well be true. But for a properly informed programmer for a given piece of code, not so much.

kevinnk · on March 25, 2012

>But for a properly informed programmer for a given piece of code, not so much.

There are a couple of reasons this still matters, even for "informed" programmers:

1) For most dynamic languages, even simple operations can take a highly variable amount of time to execute. How many instructions does an array access take in JavaScript? The answer depends on everything from the state of the JIT to the types involved, both of which are usually impossible to know beforehand. In C we can answer this pretty easily.

2) The modern trend is towards writing more and more generic code. Even for statically compiled languages like C++ and Haskell, the actual underlying operations are purposely* abstracted away from you. Unless you know every possible context in which your code could be used, it is impossible to know how long any operation will take.

And all this assumes that the programmer knows everything about their compiler, assembler, standard library, imported libraries, etc., which is true only for the most expert programmers.

*Admittedly, the actual length of time it takes is dependent on the state of the processor which can be very difficult to predict, but we will have a lot more information than we would have had otherwise.
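To make the two points concrete, here is a minimal sketch in C. The names `element_at`, `compare_ints`, and `sort_ints` are made up for illustration. The first function is the "easy to answer" case from point 1. The second shows that point 2 applies even in C once code goes generic: `qsort` cannot know what its comparison callback costs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Point 1: a plain array read. There is no bounds check, no type dispatch,
   and no JIT state; the address is just base + i * sizeof(int). Many
   optimizing compilers emit a single load here, though the exact
   instructions depend on the compiler, its flags, and the target CPU. */
int element_at(const int *a, size_t i)
{
    return a[i];
}

/* Point 2: generic code. qsort only sees a function pointer, so the cost
   of each comparison is whatever the caller's callback happens to do. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b); /* negative, zero, or positive; no overflow */
}

/* Sort n ints in ascending order using the generic library routine. */
void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof a[0], compare_ints);
}
```

Reading the source of `element_at` tells you roughly what the machine will do. Reading the source of `qsort` alone does not, because the work is hidden behind the callback.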

stcredzero · on March 25, 2012

You need to take both the "informed" and the "given" together. Not all pieces of code are "cross platform," and even among those that are, there are different levels.

In other words, you're talking about one end of the spectrum. You are right, though, that things are moving in that direction.

derleth · on March 25, 2012

In general, I agree with you. I just feel the need to expand on a few points.

> in C you both control the memory layout of data types very finely

True to an extent.
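A small sketch of the "to an extent" part, assuming a hypothetical `struct sample` and helper `sample_padding` invented for this example. You choose the member order, and the language guarantees that order is kept, but the compiler is free to insert padding to satisfy alignment.

```c
#include <stddef.h>

/* Hypothetical record, declared in an order that tends to attract padding. */
struct sample {
    char tag;    /* 1 byte */
    int  value;  /* commonly 4 bytes with 4-byte alignment */
    char flag;   /* 1 byte */
};

/* Number of bytes in the struct that belong to no member.
   The result is implementation-defined; it is often nonzero. */
size_t sample_padding(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(int) + sizeof(char));
}
```

`offsetof(struct sample, value)` from `<stddef.h>` shows where each member actually landed. Reordering the members so the two `char` fields sit together usually shrinks the struct, which is the kind of control C does give you.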

> code maps very directly to an equivalent assembly construct

True but less and less relevant.

Here's where C disconnects you from the processor in the ways that matter most:

1. malloc()/free() are too high-level: You can't control where the allocation subsystem gets your next chunk from, you can't see whether your malloc arena is getting full, you can't see whether you're about to double-free something, and you have no way to recover from a failure to allocate (if that's even possible on your OS).

2. C has no concept of cache; admittedly, assembly usually tries to hide it from you to an extent as well, but assembly language at least has hooks into the cache hardware in the form of memory barriers. C doesn't even have that much.

3. C completely hides the processor status word from you. A minor concern, usually, except in precisely the kind of tight loops people most advocate C for.

4. C has no concept of out-of-order execution or opcode pairing or pipelining in general. Just hope your compiler does.

So, added up, that means C is farther and farther from the hardware all the time. It was reasonably close on the PDP-7 where it was born, was fortuitously even closer to the PDP-11 where it was later implemented, and remained fairly good for a while after, but once you get to dual-core superscalar designs with cache hierarchies and SIMD hardware, you have to rely on the compiler to turn your C into good assembly. Which is, really, a lot like what you do when you write Haskell.
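As a rough illustration of relying on the compiler, consider this sketch (the function name `add_arrays` is hypothetical). Nothing in the source mentions SIMD, pipelining, or cache lines; whether the loop becomes vector instructions is entirely up to the compiler and the flags you pass it.

```c
#include <stddef.h>

/* Element-wise sum of two int arrays into dst.
   `restrict` promises the compiler that the arrays do not overlap, which
   is about as close as standard C gets to hinting at how the hardware
   might run this. An optimizing compiler may vectorize the loop, unroll
   it, or leave it scalar; the C source looks the same in every case. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Comparing the assembly your compiler emits at different optimization levels (for example with its "emit assembly" option) is the only way to see which of those outcomes you actually got.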