Hey there! Since you’re asking for help, I took the liberty of looking at your ftc-ml team workspace - your videos, your datasets, and the models you’ve trained so far.
Have you seen the TensorFlow for PowerPlay documentation written for this season? You are currently treating a Red Cone + Sleeve as a single object, instead of treating the objects/images on the cone as individual objects - if that’s the direction you want to take, I certainly won’t try to stop you. I don’t know how well it will work, however.
In your videos you only have four unique poses for the object and LOTS of duplicated frames (shadows are the only thing that varies in many of the images). Unfortunately you only have 75 frames total - 60 set aside for training and 15 for evaluation. That’s an extremely small dataset, especially given the low number of static poses. Evaluation frames are selected at random from the sample set, and because you have so few images, the “random” evaluation set doesn’t even contain all of the poses for your object.
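To make that concrete, here’s a quick back-of-the-envelope simulation (plain Java; the pose counts are made-up numbers for illustration, not your actual dataset) showing how often a random 15-of-75 evaluation split can leave out an entire pose when the poses aren’t evenly represented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example: 75 frames covering 4 poses, with made-up (uneven) pose counts.
// How often does a random 15-frame evaluation split miss a pose entirely?
public class EvalSplitDemo {
    public static void main(String[] args) {
        int[] poseCounts = {40, 20, 9, 6}; // placeholder counts - NOT your real dataset
        int trials = 10_000;
        int trialsMissingAPose = 0;

        for (int t = 0; t < trials; t++) {
            // Build the 75-frame list, tagging each frame with its pose index.
            List<Integer> frames = new ArrayList<>();
            for (int pose = 0; pose < poseCounts.length; pose++) {
                for (int i = 0; i < poseCounts[pose]; i++) {
                    frames.add(pose);
                }
            }

            // Random split: the first 15 shuffled frames become the evaluation set.
            Collections.shuffle(frames);
            Set<Integer> posesInEval = new HashSet<>(frames.subList(0, 15));
            if (posesInEval.size() < poseCounts.length) {
                trialsMissingAPose++;
            }
        }

        System.out.printf("Evaluation set missed at least one pose in %.1f%% of trials%n",
                100.0 * trialsMissingAPose / trials);
    }
}
```

With more frames per pose (and more poses overall), that number drops toward zero - which is yet another argument for a bigger dataset.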
If you look at the model training images, you can see how the model did at recognizing the object after training. One of the evaluation images looks like this:
The image on the right is how you labeled your object, and the image on the left is how the model is detecting it. This is how models often look early in training, when they haven’t seen enough data yet - except your model has already been through 100 epochs. The training and evaluation data you have simply isn’t enough for the model to learn how to identify the object and draw a tight bounding box. Once the model is detecting correctly, the image on the left should look almost exactly like the image on the right.
You definitely need more variability in your poses. Some suggestions:
- If your camera really might see your object rotated a full 90 degrees, capturing increments of 10-15 degrees would help smooth the transition and help the model understand everything in between - though I somewhat doubt that’s the case. Even in my models, where the angle only varies by about 10 degrees (purely from camera shake), the final model handles about 30 degrees of rotation just fine. That’s probably good enough?
- Your “far away” image of the cone and sleeve is likely too far - when the frame is scaled down to 300x300, the detail in the objects is lost (see the sketch after this list if you want to check what your own frames look like at that size). Instead of several frames at that distance, you should have images where the cone gets incrementally closer to the camera (and therefore larger in the frame).
- You only show one “rotation” of the cone and its images. What happens when the cone is rotated (about the Z axis)? The robot won’t always be looking at the cone perfectly straight…
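As promised above, here’s a rough way to preview the “far away” problem yourself: downscale one of your extracted frames to 300x300 (the model’s input size) and see how much cone/sleeve detail survives. This is plain Java, and the filenames are placeholders:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Downscale a single frame to 300x300 so you can eyeball how much detail remains.
// "frame.png" is a placeholder for one of your own extracted frames.
public class PreviewDownscale {
    public static void main(String[] args) throws Exception {
        BufferedImage original = ImageIO.read(new File("frame.png"));

        BufferedImage scaled = new BufferedImage(300, 300, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(original, 0, 0, 300, 300, null); // squashes the frame to the model's input size
        g.dispose();

        ImageIO.write(scaled, "png", new File("frame_300x300.png"));
    }
}
```

If you can barely make out the sleeve in the downscaled image, the model probably can’t either.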
I really have to quote Kermit the Frog from The Muppets Take Manhattan - “That’s it! That’s what’s been missing from the show! That’s what we need! More frogs and dogs and bears and chickens and… and whatever!” Your issue isn’t that you’re focused on the wrong things - it’s that you need MORE. Consider 100 different images minimum for each label, and vary each image in some way.
I shared one of the videos I took for training the official default model (I used 18 videos in total, 6 for each of the three different images on the cone). You can see how I started off far away, and then zoomed in and panned across - I was focusing on the images themselves, but this trained the model to understand that the images can be of multiple sizes, multiple shapes (since it’s a cone, the images show curvature as I get closer), and multiple rotational offsets. Here’s the video:
If you have more questions, please ask!
Oh, and there’s not much functionality beyond what’s in the sample code. If there’s something specific that you are trying to do but “it doesn’t work”, you should create a new thread and include some code and we’ll help you out. However, we’ve got to get your model detecting first!
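In the meantime, here’s roughly what an OpMode that loads a custom ftc-ml model looks like. This is only a sketch based on the ConceptTensorFlowObjectDetection sample - the webcam name, Vuforia key, model path, and label are placeholders, and you should double-check the method names against the sample in your SDK version:

```java
import com.qualcomm.robotcore.eventloop.opmode.LinearOpMode;
import com.qualcomm.robotcore.eventloop.opmode.TeleOp;
import java.util.List;
import org.firstinspires.ftc.robotcore.external.ClassFactory;
import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
import org.firstinspires.ftc.robotcore.external.navigation.VuforiaLocalizer;
import org.firstinspires.ftc.robotcore.external.tfod.Recognition;
import org.firstinspires.ftc.robotcore.external.tfod.TFObjectDetector;

@TeleOp(name = "CustomModelTest")
public class CustomModelTest extends LinearOpMode {
    // Placeholders - swap in your own model file, label(s), key, and webcam name.
    private static final String MODEL_FILE = "/sdcard/FIRST/tflitemodels/YourModel.tflite";
    private static final String[] LABELS = { "RedConeSleeve" };
    private static final String VUFORIA_KEY = "-- YOUR VUFORIA KEY --";

    private VuforiaLocalizer vuforia;
    private TFObjectDetector tfod;

    @Override
    public void runOpMode() {
        // Vuforia supplies camera frames to TensorFlow.
        VuforiaLocalizer.Parameters vuforiaParams = new VuforiaLocalizer.Parameters();
        vuforiaParams.vuforiaLicenseKey = VUFORIA_KEY;
        vuforiaParams.cameraName = hardwareMap.get(WebcamName.class, "Webcam 1");
        vuforia = ClassFactory.getInstance().createVuforia(vuforiaParams);

        // TensorFlow object detector configured for an ftc-ml (TF2, 300x300 input) model.
        TFObjectDetector.Parameters tfodParams = new TFObjectDetector.Parameters();
        tfodParams.minResultConfidence = 0.6f;
        tfodParams.isModelTensorFlow2 = true;
        tfodParams.inputSize = 300;
        tfod = ClassFactory.getInstance().createTFObjectDetector(tfodParams, vuforia);
        tfod.loadModelFromFile(MODEL_FILE, LABELS);
        tfod.activate();

        waitForStart();

        while (opModeIsActive()) {
            // Only returns a list when the detections have changed since the last call.
            List<Recognition> recognitions = tfod.getUpdatedRecognitions();
            if (recognitions != null) {
                telemetry.addData("# detected", recognitions.size());
                for (Recognition r : recognitions) {
                    telemetry.addData(r.getLabel(), "conf %.2f  box (%.0f, %.0f)-(%.0f, %.0f)",
                            r.getConfidence(), r.getLeft(), r.getTop(), r.getRight(), r.getBottom());
                }
                telemetry.update();
            }
        }
    }
}
```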
-Danny