Aren't most XML parsers SAX- or StAX-based? The only time I ran into a library that offered only a full DOM, without the underlying event-based parser, was whatever browsers consider the JavaScript standard library.
You're totally right! Many good stock parsers already stream things (more or less).
Still, I'm just making a comment about the overhead... I would hazard a guess that you're going to have a hard time beating a regex with an HTML parser for speed, assuming what you want can be done with both.
This is all irrelevant, because, as the OP mentions, the SO question at hand cannot be solved with standards-compliant parsers: they will not distinguish a self-closing tag from an empty open/close pair.
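To make that last point concrete, here is a rough sketch in Python using the standard library's xml.etree.ElementTree. The helper names (tree_shape, has_self_closing) and the regex are made up for illustration, and the regex is deliberately naive (it would trip over a ">" inside an attribute value, comments, CDATA, and so on). It only shows that the parsed tree comes out the same for both spellings, while a pattern run over the raw text can still tell them apart.

```python
import re
import xml.etree.ElementTree as ET


def tree_shape(xml_text):
    """Return (tag, text, child_count) for the root element of xml_text."""
    root = ET.fromstring(xml_text)
    return (root.tag, root.text, len(root))


self_closed = "<br/>"
open_close = "<br></br>"

# After parsing, both spellings produce the same element:
# same tag, no text, no children. The original syntax is gone.
same_tree = tree_shape(self_closed) == tree_shape(open_close)

# Hypothetical, naive pattern for a self-closing tag in the raw text.
SELF_CLOSING = re.compile(r"<\w+[^>]*/>")


def has_self_closing(text):
    """Return True if the raw text contains something shaped like <tag ... />."""
    return bool(SELF_CLOSING.search(text))


if __name__ == "__main__":
    print("parsed trees identical:", same_tree)
    print("regex on '<br/>':", has_self_closing(self_closed))
    print("regex on '<br></br>':", has_self_closing(open_close))
```

Other parsers may expose more of the original syntax than ElementTree does, so treat this as one data point rather than proof for every library.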