They’ve already identified that the graphs show bad training results, so it seems clear the model is getting confused during training. That’s almost always due to inconsistent labeling. This is a really good thread showing how automatic tracking can completely butcher your labeling if you don’t watch it like a hawk.
Yeah, that’s the key here I think. The hardest thing for those who are just starting is taking the time to label their videos correctly - it’s a time-intensive process even with auto-tracking (no auto-tracking algorithm is perfect for every situation). When I developed the default TensorFlow PowerPlay model with the 3 signal images, I initially spent about 1.5 hours on each of 12 videos, labeling 30 seconds of video at 15 frames per second (450 frames per video, ~5400 frames total).

Then when I ran into issues I had to redo videos, and I decided to completely change how I was shooting them to make labeling simpler. I made the motion a lot slower and smoother, and I recorded 2 minutes per video and reduced the framerate afterwards (from 30fps to 5fps, keeping only every 6th frame and throwing away the rest, since they were mostly duplicates anyway), among other changes. The end result was video that the default tracker could follow much more easily, with great results, and it took me significantly less time to label. If you cut corners in labeling, it will come back to bite you later.

But I can’t really tell what’s going on in their specific case without seeing their labeled videos. And unfortunately a sample isn’t good enough - I’d need to see them all.
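For anyone curious about the framerate reduction: going from 30fps to 5fps is just keeping every 6th frame. Here’s a minimal Python sketch of the idea - the function name and parameters are mine, not from any particular tool (in practice you’d do this with a video editor or ffmpeg):

```python
def decimate_frames(frames, keep_every=6):
    """Keep every Nth frame, e.g. reducing a 30fps clip to an
    effective 5fps by keeping every 6th frame. `frames` is any
    sequence of frames (images read from a video, frame indices,
    etc.); names here are illustrative."""
    return frames[::keep_every]

# A 2-minute clip at 30fps has 120 * 30 = 3600 frames;
# keeping every 6th leaves 600 frames to label instead of 3600.
frames = list(range(120 * 30))
kept = decimate_frames(frames, keep_every=6)
print(len(kept))  # 600
```

Dropping the near-duplicate frames is what makes the labeling time manageable - you label 600 meaningfully different frames instead of 3600 mostly-identical ones.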
-Danny