Training and Detection of Textures

We used a 2"x4" sheet printed with a zebra-crossing pattern to test the training and detection of visual textures. Training was done with a video dataset of 218 frames. The model works: when the camera is pointed at the pattern, it recognizes it with the correct label ("zebra" in our case). The confidence numbers next to the bounding box (as seen via the camera-stream option on the driver station phone) are consistently above 0.9, which seems to indicate that the model has been trained sufficiently.

We had expected that the model would only detect the exact pattern it was trained on (the distinct black-and-white zebra-crossing stripes). Instead, we are observing that the model also labels other objects that have some variation of dark and light colors, or shadows, as "zebra". Is this expected behavior for models trained using TensorFlow?

I want to dispel the myth that getting a high detection rate means a model has been trained sufficiently. A high detection rate can be a sign of a number of things:

  1. Your model may not be very generic, meaning your training data and your evaluation data may have been exact matches (or close to them). If you only ever show the Mona Lisa to a model, and then evaluate by showing the Mona Lisa to your model, you'll get a very high detection rate; but give the Mona Lisa eyebrows, and the model will fall flat on its face.
  2. Your model may have hit a "local minimum" in the loss curve. If you plot the loss against the model's weight values (not the learning rate), you would see that the loss is initially very high, and as training brings the model's predictions closer to the true values, the loss approaches zero; but if you keep pushing the weights past that point, the model gets worse as its predictions deviate from the real values again. The curve looks something like y=x^2, where x=0 is the "ideal weight". However, real data doesn't have such a predictable curve; there may be multiple local minima. If most of your data looks similar, a "seemingly good" local minimum may be found that fits some of your data, but not all of it. More training can help find a lower minimum.
  3. Or, it may mean nothing at all, and you’ve just got data that the model can easily recognize. I did say that TensorFlow is really good at training on and picking up textures, like zebra patterns.
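The local-minimum idea in point 2 can be sketched numerically. This is a toy illustration, not a real training loss: the function below is made up to have one shallow minimum and one deeper (global) minimum, and plain gradient descent lands in whichever basin it starts in.

```python
def loss(w):
    # Toy loss with two minima: a shallow local one near w = +1
    # and a deeper (global) one near w = -1.
    return (w**2 - 1)**2 + 0.3 * w

def grad(w):
    # Derivative of the toy loss above.
    return 4 * w * (w**2 - 1) + 0.3

def descend(w, lr=0.01, steps=2000):
    # Plain gradient descent: repeatedly step downhill from w.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Starting on the right, descent settles in the shallow local minimum...
w_local = descend(2.0)     # ends near w ≈ +0.96
# ...while starting on the left finds the deeper global minimum.
w_global = descend(-2.0)   # ends near w ≈ -1.04

print(w_local, loss(w_local))
print(w_global, loss(w_global))
```

Both runs "converge" (the gradient goes to zero), but only one of them found the best weights; which basin you end up in depends on where you started, which is the sense in which a "seemingly good" model can still be stuck.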

We had expected that the model would only detect the exact pattern it was trained on (the distinct black-and-white zebra-crossing stripes). Instead, we are observing that the model also labels other objects that have some variation of dark and light colors, or shadows, as "zebra". Is this expected behavior for models trained using TensorFlow?

When a human being looks up at a cloud and sees "Superman", or a "Dragon", or a "puppy", is that expected behavior? Or are we trained to find specific patterns in things, and to apply those patterns even to things that seemingly have no patterns themselves? In this case, it sounds like your model is really generic, and has keyed on the fact that there are light and dark "stripes" next to each other. More training might force the model to recognize more of a difference between the "white" and the "black", or the specific pattern of the stripes. Though I doubt even a sufficiently trained model could look at a zebra's rump and identify, "Oh, that's Carl the Zebra at the Australian Zoo", but who knows?
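Separate from more training, one practical mitigation is to raise the minimum confidence you accept before acting on a detection, since spurious "zebra" matches on shadows or shirts often score lower than the real pattern. A minimal sketch in plain Python, where the detection list and its confidence values are entirely made up for illustration (a real detector would supply them):

```python
# Hypothetical detections: (label, confidence) pairs as a detector might return them.
detections = [
    ("zebra", 0.97),   # the real printed pattern
    ("zebra", 0.62),   # a shadow the model mistook for stripes
    ("zebra", 0.55),   # a striped shirt
]

def filter_detections(detections, min_confidence=0.9):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d[1] >= min_confidence]

# With a 0.9 threshold, only the real pattern survives.
print(filter_detections(detections))
```

This doesn't fix the underlying generality of the model, but it keeps low-confidence look-alikes from triggering your robot logic while you gather more varied training data.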

-Danny