I just commented on another sibling comment (too late to edit the first one), but I forgot to mention my batch size is only 1. I think most people use batch size 4, so basically multiply my time by your batch size for a real comparison.
It was my bad, my script was still running a different fork. Seeing <10 second times with those parameters now. 13.6 seconds for an 3072 × 2048 upscaled image, which I'm particularly happy about.