Ilya could be wrong. I don’t think the question is settled yet in general. We already know that in lots of fields fake data can be as useful as, or even more useful than, the real thing[1], but in my understanding that tends to be in situations where the objective function is unambiguous and known beforehand. Meta has some very impressive work on synthetic data for training, and my (uninformed) read was that it is the state of the art in e.g. voice recognition at the moment.[2]
[1] E.g. Sobol sequences in a Monte Carlo simulation instead of real random numbers. They give better coverage of the simulation space from fewer paths. https://www.sciencedirect.com/science/article/abs/pii/004155...
[2] This seems to be a good overview: https://arxiv.org/html/2404.07503
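
To make the point in [1] concrete, here is a rough sketch of the idea. Note that it uses a Halton sequence rather than Sobol, since Halton is a simpler low-discrepancy sequence that fits in a few lines of plain Python; the function names and the pi-estimation toy problem are my own choices for illustration, not anything from the linked paper. In practice you would probably reach for a library implementation (SciPy has one in scipy.stats.qmc, as far as I know).

```python
import math
import random


def van_der_corput(i, base):
    """Return the i-th term (i >= 1) of the van der Corput sequence.

    The digits of i in the given base are mirrored around the radix point,
    which spreads successive terms evenly over [0, 1).
    """
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def halton_2d(n):
    """First n points of a 2-D Halton sequence (bases 2 and 3)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]


def pseudo_2d(n, seed=0):
    """n pseudorandom points in the unit square, for comparison."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def estimate_pi(points):
    """Monte Carlo estimate of pi: 4 * fraction of points inside the quarter circle."""
    inside = sum(1 for x, y in points if x * x + y * y <= 1.0)
    return 4.0 * inside / len(points)


if __name__ == "__main__":
    n = 4096
    for name, pts in (("pseudorandom", pseudo_2d(n)), ("halton", halton_2d(n))):
        est = estimate_pi(pts)
        print(f"{name:>12}: pi ~ {est:.5f}  (abs error {abs(est - math.pi):.5f})")
```

The "fake" sequence is fully deterministic, and that is the point: the points are constructed to fill the space evenly, so the estimate usually settles faster than it does with pseudorandom draws. This only works because the target (the integral) is known and unambiguous up front.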