C is not assembly and hasn't been for a very long time. But I think when people use the phrase "portable assembler" they really mean that in C you both control the memory layout of data types very finely and that code maps very directly to an equivalent assembly construct. True, optimizers frequently change the actual executed code from what we expect, but C gives a very intuitive feel for what the "upper bound" assembly output is.
For example, in C "array[0] = (x + y);" will never be more than a couple of assembly instructions long. In many languages, including Haskell (and in the case of operator overloading, C++), the equivalent construct might map to hundreds if not thousands of instructions. Or it might map to the same one or two that C would emit. It's impossible to know and there is no reasonable upper bound on what could happen.
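To make that concrete, here is a minimal sketch of the statement wrapped in a function. The name store_sum is hypothetical, and the instruction count in the comment is what I'd expect from a typical optimizing compiler on a mainstream target, not something the language guarantees.

```c
/* Hypothetical helper: stores the sum of two ints in the first slot. */
void store_sum(int *array, int x, int y)
{
    /* With built-in types, neither + nor [] can be redefined, so no
       hidden call can appear here. Typically this compiles to one add
       and one store, whatever the calling context is. */
    array[0] = (x + y);
}
```

The point is less the exact count than the fact that nothing at the call site or in the types can change what the line does.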
> might map to hundreds if not thousands of instructions. Or it might map to the same one or two that C would emit. It's impossible to know and there is no reasonable upper bound on what could happen.
Over every possible piece of code that could be compiled anywhere, this might well be true. But for a properly informed programmer for a given piece of code, not so much.
>But for a properly informed programmer for a given piece of code, not so much.
There are a couple of reasons this still matters, even for "informed" programmers:
1) For most dynamic languages, even simple operations can take a highly variable amount of time to execute. How many instructions does an array access take in JavaScript? The answer depends on everything from the state of the JIT to the types involved, both of which are usually impossible to know beforehand. In C we can answer this pretty easily.
2) The modern trend is towards writing more and more generic code. Even in statically compiled languages like C++ and Haskell, the actual underlying operations are purposely* abstracted away from you. Unless you know every context in which your code could be used, it is impossible to know how long any operation will take.
And all this assumes that the programmer knows everything about their compiler, assembler, standard library, imported libraries, etc., which is true only of the most expert programmers.
*Admittedly, the actual length of time it takes is dependent on the state of the processor which can be very difficult to predict, but we will have a lot more information than we would have had otherwise.
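Point 2 can be sketched even in C, as long as you write the generic code by hand. Everything named below (generic_max, compare_fn, the two comparators) is made up for illustration. The generic routine cannot know what a comparison costs, since that depends entirely on the callback it is given. The difference from C++ or Haskell is that here the indirect call is spelled out in the source instead of hiding behind an operator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical comparator type: returns >0 if *a is greater than *b. */
typedef int (*compare_fn)(const void *a, const void *b);

/* Returns a pointer to the largest element, or NULL if count is 0.
   The cost of each loop iteration is whatever cmp happens to cost. */
const void *generic_max(const void *base, size_t count, size_t size,
                        compare_fn cmp)
{
    const unsigned char *bytes = base;
    const void *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const void *item = bytes + i * size;
        if (best == NULL || cmp(item, best) > 0)
            best = item;
    }
    return best;
}

/* Cheap case: a couple of integer comparisons. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Expensive case: walks both strings, so time depends on the data. */
int compare_strings(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}
```

Calling generic_max with compare_ints costs a handful of instructions per element. With compare_strings the cost per element depends on how long the strings are and where they first differ.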
In general, I agree with you. I just feel the need to expand on a few points.
> in C you both control the memory layout of data types very finely
True to an extent.
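A small sketch of where the "extent" ends, using a made-up struct. C fixes the order of the fields, but how much padding goes between and after them is left to the implementation, so the only portable way to learn the real layout is to ask the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, only for illustration. */
struct sample {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
};

/* Offset of value: at least 1, commonly 4 because of alignment padding. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}

/* Total size: at least 6 bytes of fields, commonly 12 with padding. */
size_t sample_size(void)
{
    return sizeof(struct sample);
}

/* Bytes the fields themselves need, ignoring any padding. */
size_t sample_payload_bytes(void)
{
    return sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint8_t);
}
```

The "commonly" figures assume a typical ABI with 4-byte alignment for uint32_t. Forcing a packed layout needs compiler-specific extensions, which is exactly the part the language itself does not give you.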
> code maps very directly to an equivalent assembly construct
True but less and less relevant.
Here's where C disconnects you from the processor in the ways that matter most:
1. malloc()/free() are too high-level: You can't control where the allocation subsystem gets your next chunk from, you can't see whether your malloc arena is getting full, you can't see whether you're about to double-free something, and you have no way to recover from a failure to allocate (if that's even possible on your OS).
2. C has no concept of cache; admittedly, assembly usually tries to hide it from you to an extent as well, but assembly language at least has hooks into the cache hardware in the form of memory barriers. C doesn't even have that much.
3. C completely hides the processor status word from you. A minor concern, usually, except in precisely the kind of tight loops people most advocate C for.
4. C has no concept of out-of-order execution or opcode pairing or pipelining in general. Just hope your compiler does.
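Point 3 is easy to illustrate with multi-word addition. In assembly you would add the low words and then add-with-carry the high words, reading the carry flag directly. Portable C can't see that flag, so a sketch like the one below (function names are hypothetical) has to reconstruct the carry with a comparison and hope the compiler turns it back into the flag-based sequence.

```c
#include <stdint.h>

/* Adds two 64-bit words and reports the carry out of the top bit. */
uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned *carry_out)
{
    uint64_t sum = a + b;    /* unsigned arithmetic wraps modulo 2^64 */
    *carry_out = (sum < a);  /* it wrapped iff the sum is below an operand */
    return sum;
}

/* 128-bit addition on two-word values; index 0 is the low word.
   Any carry out of the high word is discarded. */
void add_u128(const uint64_t a[2], const uint64_t b[2], uint64_t result[2])
{
    unsigned carry;
    result[0] = add_with_carry(a[0], b[0], &carry);
    result[1] = a[1] + b[1] + carry;
}
```

Many compilers do recognize this idiom, and some offer builtins for it, but neither is something the C language itself promises.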
So, added up, that means C is farther and farther from the hardware all the time. Its ancestor B was reasonably close to the PDP-7 where it was born, C itself was fortuitously even closer to the PDP-11 where it was first implemented, and it remained fairly good for a while after, but once you get to dual-core superscalar designs with cache hierarchies and SIMD hardware, you have to rely on the compiler to turn your C into good assembly. Which is, really, a lot like what you do when you write Haskell.