24
That stat about AI models needing 10x more data for every 2x improvement blew my mind
I read in a Stanford paper that scaling up these models gives diminishing returns way faster than I thought, like a 10x jump in training data for only a 2x gain in accuracy. Is this really the best path forward, or are we just wasting compute? Curious what other people think.
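Taking the ratio in the post at face value (10x data for a 2x gain), it implies a power-law relationship between data and measured gain. Here's a rough sketch of the implied exponent and what it would mean for further gains; this is purely illustrative arithmetic based on the claimed ratio, not numbers from the paper itself:

```python
import math

# Claimed ratio from the post: 10x more data yields roughly a 2x gain.
# If gain scales as data**alpha, then 2 = 10**alpha, so alpha = log10(2).
alpha = math.log10(2)  # ~0.30

# Illustrative projection under that assumed power law:
# how much extra data further gains would require if the trend held.
for target_gain in (2, 4, 8):
    data_multiplier = target_gain ** (1 / alpha)
    print(f"{target_gain}x gain -> ~{data_multiplier:,.0f}x more data")
# 2x gain -> ~10x data, 4x -> ~100x, 8x -> ~1,000x
```

Under that assumption, stacking gains gets expensive fast, which is presumably what the comments below are reacting to.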
2 comments
the_richard · 10d ago
I used to think just throwing more data at these models was the obvious answer too, but that Stanford paper really flipped a switch for me. Like, I remember reading about GPT-3 and thinking "wow, just add more parameters and data and it keeps getting smarter." But now it feels like we're pushing against a wall where each tiny gain costs a mountain of compute and energy. That 10x data for 2x accuracy ratio is brutal, especially when you think about all the carbon and money burned for that tiny bump. It honestly makes me wonder if we should be putting more effort into smarter architectures or training methods instead of just brute forcing with bigger datasets.
6
matthew_morgan · 10d ago
Isn't that Stanford ratio assuming we're measuring accuracy the right way though?
4