I looked at your videos and your models, and your model definitely is not training properly. If you look at your training metrics, the Loss/classification_loss graph tells the whole story here. Here’s what yours looks like:
As you can see, for the first 1,000 steps the model seems okay (we’ll get to that in a second), but as training continues it becomes more and more confused about what’s going on - your loss increases catastrophically, which shows the model really going south. No, that doesn’t mean you should only train for 1,000 steps; it means we should try to correct the underlying problem (more on that later).
For comparison, here’s what a “good” Loss/classification_loss graph looks like:
Notice the value approaches zero, which is what we want (we want these values as low as we can possibly get them, but they absolutely must be under 1 in order to even remotely be reasonable). In your case, you might have started training well, but when you get a loss of tens of thousands, the model becomes unable to compensate with more training.
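If it helps to put numbers on that, here is a rough Python sketch of the same reading of the graph. It is not part of the FTC tooling: the function name `diagnose_loss`, the `blowup_factor` of 10, and the sample loss values are all made up for illustration. Only the "must be under 1" threshold comes from the rule of thumb above.

```python
def diagnose_loss(losses, good_threshold=1.0, blowup_factor=10.0):
    """Roughly classify a classification-loss curve.

    losses: loss values in training order (hypothetical samples, e.g.
            read off the Loss/classification_loss graph by hand).
    good_threshold: the "must be under 1" rule of thumb from the post.
    blowup_factor: assumed multiple of the best loss that counts as
                   "catastrophic"; this number is an illustration only.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    best = min(losses)
    final = losses[-1]
    # The loss got much worse than the best value it ever reached.
    if final > good_threshold and final > best * blowup_factor:
        return "diverged"
    # The loss ended under the rule-of-thumb threshold.
    if final < good_threshold:
        return "converged"
    return "still training"


# Made-up curves: one that blows up after a good start, one that settles.
bad_run = [2.0, 0.8, 0.5, 300.0, 25000.0]
good_run = [2.0, 1.1, 0.6, 0.3, 0.2]
print(diagnose_loss(bad_run))   # diverged
print(diagnose_loss(good_run))  # converged
```

A curve like `bad_run` is the pattern in your graph: it dips under 1 early on and then climbs by orders of magnitude, and more steps will not bring it back.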
The issue here is definitely in how you’re labeling your videos. Let me be clear and say I absolutely LOVE your videos, you’re doing a GREAT job changing the pose (distance, size, angle, orientation) in them. However, what you’re essentially doing is confusing the heck out of the model by not providing it with a consistent object. By labeling only a portion of your pattern, you never let the model see the entire pattern, so it cannot consistently determine what the full pattern is. TensorFlow expects to see the full “object” that you’re trying to identify (at least a few times), plus some “background” material surrounding it. Currently you have this:
But if it’s your intent to actually recognize the patch of blue/polka, your label should REALLY be this:
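To make the partial-label point concrete, here is a small Python sketch that measures how much of the full object a label box actually covers. Everything in it is hypothetical: the function name `label_coverage`, the `(x_min, y_min, x_max, y_max)` pixel box format, and the example coordinates are assumptions for illustration, not values from your dataset or anything the labeling tool reports.

```python
def label_coverage(label_box, object_box):
    """Fraction of the full object's area that the label box covers.

    Boxes are (x_min, y_min, x_max, y_max) in pixels (assumed format).
    Returns a value between 0.0 and 1.0.
    """
    lx0, ly0, lx1, ly1 = label_box
    ox0, oy0, ox1, oy1 = object_box
    object_area = (ox1 - ox0) * (oy1 - oy0)
    if object_area <= 0:
        raise ValueError("object box must have positive area")
    # Width and height of the overlap; zero if the boxes do not intersect.
    overlap_w = max(0, min(lx1, ox1) - max(lx0, ox0))
    overlap_h = max(0, min(ly1, oy1) - max(ly0, oy0))
    return (overlap_w * overlap_h) / object_area


# Hypothetical numbers: the full pattern spans 200x200 pixels,
# but the label only covers its left half.
full_pattern = (100, 100, 300, 300)
partial_label = (100, 100, 200, 300)
print(label_coverage(partial_label, full_pattern))  # 0.5
print(label_coverage(full_pattern, full_pattern))   # 1.0
```

A label that covers only half of the pattern, and a different half as the pose changes, is what keeps the model from ever learning one consistent object.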
I also noticed that you’ve got some “negative” images, but negatives for objects and negatives for images are very different from one another. I highly recommend that you read this post; it does a good job of explaining a lot of the nuances of dealing with images versus dealing with objects, and how negatives work in this case. In your case, remember you DO still have a background, but the background is the WHITE PART OF THE SLEEVE plus the “scene” behind it. So when creating negatives, you CANNOT simply remove the object. That’s like a painter removing their canvas. This discussion post really explains it:
Also, please be sure to check out the TensorFlow for PowerPlay document on ftc-docs.
If you have additional questions after reading all this, please don’t hesitate to ask!