I was training my model wrong for a whole year because of one bad habit
I kept getting okay-but-not-great results on a text classifier I was building. For about a year I just accepted that the model's accuracy was stuck around 78%.

The tip-off came last week while I was cleaning up old code and found a comment from a friend. In a script I'd sent him months ago, he had written 'you're shuffling your validation set, you maniac'. I had been randomly shuffling my validation data before each evaluation run to 'be safe', thinking it would help. In reality it was completely breaking my ability to track progress, because the model was being tested on a different slice of data every single time.

I removed the shuffle, ran it again, and the accuracy was far more stable, which finally let me see where the real problems were. Has anyone else messed up their evaluation setup in a way that took way too long to spot?
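For anyone curious what the bug looks like in practice, here's a minimal sketch (the names and the 1000/200 split are made up, not my actual pipeline). Shuffling before slicing means each evaluation run scores a different subset, so run-to-run accuracy differences reflect the data draw, not the model:

```python
import numpy as np

rng = np.random.default_rng()
val_data = np.arange(1000)  # stand-in for 1000 validation examples

# The bad habit: shuffle the validation set before every eval, then slice.
def eval_subset_shuffled():
    shuffled = rng.permutation(val_data)
    return set(shuffled[:200])  # a different 200 examples almost every run

# The fix: choose the evaluation slice once and reuse it.
fixed_subset = set(val_data[:200])

run_a = eval_subset_shuffled()
run_b = eval_subset_shuffled()
# run_a and run_b are almost certainly different subsets,
# while fixed_subset is identical on every run.
```

The point being: if the evaluation examples change between runs, accuracy deltas between checkpoints are noise, not signal.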
2 comments
kellyj23 · 1d ago
Totally get that. It's like when you keep moving the finish line and wonder why you're not getting closer. I see this with people tracking fitness goals but changing how they measure every week. Makes progress impossible to see. Your validation shuffle was basically the same thing.
robert_bennett29 · 1d ago
Man, that's rough. I did something similar by not fixing the random seed for ages, so my results were all over the place. @kellyj23 is right, it's like changing the ruler every time you measure.