Hey there! Since you’re asking for help, I took the liberty of looking at your ftc-ml team workspace - your videos, your datasets, and the models you’ve trained so far.
Have you seen the TensorFlow for PowerPlay documentation written for this season? You are currently treating a Red Cone + Sleeve as a single object, instead of treating the objects/images on the cone as individual objects - if that’s the direction you want to take, I certainly won’t try to stop you. I don’t know how well it will work, however.
In your videos you only have four unique poses for the object and LOTS of duplicated frames (shadows are the only thing that varies in many of the images). Unfortunately you only have 75 frames total - 60 set aside for training and 15 for evaluation. That’s an extremely small dataset, especially given the low number of static poses. Evaluation frames are selected at random from the sample set, and because you have so few images, the “random” evaluation set doesn’t even contain all of the poses for your object.
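To make that concrete, here’s a quick back-of-the-envelope simulation (plain Java; the pose counts are made-up numbers for illustration, not your actual dataset) showing how often a random 15-of-75 evaluation split can leave out an entire pose when the poses aren’t evenly represented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example: 75 frames covering 4 poses, with made-up (uneven) pose counts.
// How often does a random 15-frame evaluation split miss a pose entirely?
public class EvalSplitDemo {
    public static void main(String[] args) {
        int[] poseCounts = {40, 20, 9, 6}; // placeholder counts - NOT your real dataset
        int trials = 10_000;
        int trialsMissingAPose = 0;

        for (int t = 0; t < trials; t++) {
            // Build the 75-frame list, tagging each frame with its pose index.
            List<Integer> frames = new ArrayList<>();
            for (int pose = 0; pose < poseCounts.length; pose++) {
                for (int i = 0; i < poseCounts[pose]; i++) {
                    frames.add(pose);
                }
            }

            // Random split: the first 15 shuffled frames become the evaluation set.
            Collections.shuffle(frames);
            Set<Integer> posesInEval = new HashSet<>(frames.subList(0, 15));
            if (posesInEval.size() < poseCounts.length) {
                trialsMissingAPose++;
            }
        }

        System.out.printf("Evaluation set missed at least one pose in %.1f%% of trials%n",
                100.0 * trialsMissingAPose / trials);
    }
}
```

With more frames per pose (and more poses overall), that number drops toward zero - which is yet another argument for a bigger dataset.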
If you look at the model training images, you can see how the model did at recognizing the object after training. One of the evaluation images looks like this:
The image on the right is how you labeled your object, and the image on the left is how the model is detecting it. This is how models often look early in training, when they haven’t seen enough data yet - except your model has already been through 100 epochs. The training and evaluation data you have simply isn’t enough for the model to learn how to identify the object and draw a tight bounding box. Once the model is detecting correctly, the image on the left should look almost exactly like the image on the right.
You definitely need more variability in your poses. Some suggestions:
- If your camera really might see your object rotated a full 90 degrees, capturing increments of 10-15 degrees would help smooth the transition and help the model understand everything in between - though I somewhat doubt that’s the case. Even in my models, where the angle only varies by about 10 degrees (purely from camera shake), the final model handles about 30 degrees of rotation just fine. That’s probably good enough?
- Your “far away” image of the cone and sleeve is likely too far - when the frame is scaled down to 300x300, the detail in the objects is lost (see the sketch after this list if you want to check what your own frames look like at that size). Instead of several frames at that distance, you should have images where the cone gets incrementally closer to the camera (and therefore larger in the frame).
- You only show one “rotation” of the cone and its images. What happens when the cone is rotated (about the Z axis)? The robot won’t always be looking at the cone perfectly straight…
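As promised above, here’s a rough way to preview the “far away” problem yourself: downscale one of your extracted frames to 300x300 (the model’s input size) and see how much cone/sleeve detail survives. This is plain Java, and the filenames are placeholders:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Downscale a single frame to 300x300 so you can eyeball how much detail remains.
// "frame.png" is a placeholder for one of your own extracted frames.
public class PreviewDownscale {
    public static void main(String[] args) throws Exception {
        BufferedImage original = ImageIO.read(new File("frame.png"));

        BufferedImage scaled = new BufferedImage(300, 300, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(original, 0, 0, 300, 300, null); // squashes the frame to the model's input size
        g.dispose();

        ImageIO.write(scaled, "png", new File("frame_300x300.png"));
    }
}
```

If you can barely make out the sleeve in the downscaled image, the model probably can’t either.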
I really have to quote Kermit the Frog from The Muppets Take Manhattan - “That’s it! That’s what’s been missing from the show! That’s what we need! More frogs and dogs and bears and chickens and… and whatever!” Your issue isn’t that you’re focused on the wrong things - it’s that you need MORE. Consider 100 different images minimum for each label, and vary each image in some way.
I shared one of the videos I took for training the official default model (I used 18 videos in total, 6 for each of the three different images on the cone). You can see how I started off far away, and then zoomed in and panned across - I was focusing on the images themselves, but this trained the model to understand that the images can be of multiple sizes, multiple shapes (since it’s a cone, the images show curvature as I get closer), and multiple rotational offsets. Here’s the video:
If you have more questions, please ask!
Oh, and there’s not much functionality beyond what’s in the sample code. If there’s something specific that you are trying to do but “it doesn’t work”, you should create a new thread and include some code and we’ll help you out. However, we’ve got to get your model detecting first!
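In the meantime, here’s roughly what an OpMode that loads a custom ftc-ml model looks like. This is only a sketch based on the ConceptTensorFlowObjectDetection sample - the webcam name, Vuforia key, model path, and label are placeholders, and you should double-check the method names against the sample in your SDK version:

```java
import com.qualcomm.robotcore.eventloop.opmode.LinearOpMode;
import com.qualcomm.robotcore.eventloop.opmode.TeleOp;
import java.util.List;
import org.firstinspires.ftc.robotcore.external.ClassFactory;
import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
import org.firstinspires.ftc.robotcore.external.navigation.VuforiaLocalizer;
import org.firstinspires.ftc.robotcore.external.tfod.Recognition;
import org.firstinspires.ftc.robotcore.external.tfod.TFObjectDetector;

@TeleOp(name = "CustomModelTest")
public class CustomModelTest extends LinearOpMode {
    // Placeholders - swap in your own model file, label(s), key, and webcam name.
    private static final String MODEL_FILE = "/sdcard/FIRST/tflitemodels/YourModel.tflite";
    private static final String[] LABELS = { "RedConeSleeve" };
    private static final String VUFORIA_KEY = "-- YOUR VUFORIA KEY --";

    private VuforiaLocalizer vuforia;
    private TFObjectDetector tfod;

    @Override
    public void runOpMode() {
        // Vuforia supplies camera frames to TensorFlow.
        VuforiaLocalizer.Parameters vuforiaParams = new VuforiaLocalizer.Parameters();
        vuforiaParams.vuforiaLicenseKey = VUFORIA_KEY;
        vuforiaParams.cameraName = hardwareMap.get(WebcamName.class, "Webcam 1");
        vuforia = ClassFactory.getInstance().createVuforia(vuforiaParams);

        // TensorFlow object detector configured for an ftc-ml (TF2, 300x300 input) model.
        TFObjectDetector.Parameters tfodParams = new TFObjectDetector.Parameters();
        tfodParams.minResultConfidence = 0.6f;
        tfodParams.isModelTensorFlow2 = true;
        tfodParams.inputSize = 300;
        tfod = ClassFactory.getInstance().createTFObjectDetector(tfodParams, vuforia);
        tfod.loadModelFromFile(MODEL_FILE, LABELS);
        tfod.activate();

        waitForStart();

        while (opModeIsActive()) {
            // Only returns a list when the detections have changed since the last call.
            List<Recognition> recognitions = tfod.getUpdatedRecognitions();
            if (recognitions != null) {
                telemetry.addData("# detected", recognitions.size());
                for (Recognition r : recognitions) {
                    telemetry.addData(r.getLabel(), "conf %.2f  box (%.0f, %.0f)-(%.0f, %.0f)",
                            r.getConfidence(), r.getLeft(), r.getTop(), r.getRight(), r.getBottom());
                }
                telemetry.update();
            }
        }
    }
}
```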
-Danny