Machine Learning Ultra-Brittleness and Object Orientation Poses: The Case of AI Self-Driving Cars

Figure 1: The Google Inception-v3 classifier correctly labels the canonical poses of objects (a), but fails to recognize out-of-distribution images of objects in unusual poses (b–d), including real photographs retrieved from the Internet (d). The left 3 × 3 images (a–c) were found by the researchers’ framework and rendered via a 3D renderer. Below each image are its top-1 predicted label and confidence score. (Figure from the Auburn University and Adobe study discussed below.)

By Lance Eliot, the AI Trends Insider

Take an object nearby you and turn it upside down. Please don’t do this to something or someone that would get upset at your suddenly turning them upside down. Assuming that you’ve turned an object upside down, look at it. Do you still know what the object is? I’d bet that you do.

But why would you? If you were used to seeing it right-side up, presumably you should be baffled at what the object is, now that you’ve turned it upside down. It doesn’t look like it did a moment ago. A moment ago, the bottom was, well, on the bottom. The top was on the top. Now, the bottom is on the top, and the top is on the bottom. I dare say that you should be completely puzzled about the object. It is unrecognizable now that it has been flipped over.

I’m guessing that you are puzzled that I would even suggest that you should be puzzled. Of course, you recognize what the object is. No big deal. It seems silly perhaps to assert that the mere act of turning the object upside down should impact your ability to recognize the object. You might insist that the object is still the same object that it was a moment ago. No change has occurred. It is simply reoriented.

Not so fast. Your ability as a grown adult is helping you quite a bit on this seemingly innocuous task. It has taken years upon years of cognitive maturation to make it so easy for you to perceive an object when it is reoriented.

I could get you to falter somewhat by hiding an object behind my back and then suddenly showing it to you only while it is held upside down. Without my first showing it to you in a right-side up posture, the odds are that it would take you a few moments to figure out what the upside-down object was.

A Father’s Story About Reorienting Objects

This discussion causes me to hark back to when my daughter was quite young.

She had a favorite doll with a big grin permanently in place. A series of small buttons shaped the mouth, curved in a manner that made it look like a smiley-face grin. When you looked at the doll, you almost instinctively reacted to the wide smile, and it would spark you to smile too. She really liked the doll; it was her favorite toy and not to be trifled with.

One day, we were sitting at the dinner table and I opted to turn the doll upside down. I asked my daughter whether the doll was smiling or whether the doll was frowning. Though you cannot at this moment see the doll that I am referring to, I’m sure you realize that the doll was still smiling, but with the doll turned upside down, the smile was upside down and resembled a frown.

My daughter said that the doll was sad; it was frowning.

I turned the doll right-side up.

And now what is the expression, I asked my daughter.

The doll is smiling again, she said.

I explained that the doll had always been smiling, even when turned upside down. This got a look of puzzlement on my daughter’s face. By the way, I was potentially on the verge of trifling with her favored toy, so I assure you that I carried out this activity with great respect and care.

She challenged me to turn the doll upside down again. I did so.

My daughter stood up and tried to do a handstand, flipping herself upside down. Having more or less done so, she gazed at the doll and could see that it was still smiling. She agreed that the doll was still smiling and retracted her earlier claim that it had been frowning.

I waited about a week and tried to pull this stunt again. This time, she responded instantly that the doll was still smiling, even after I had turned it upside down. She obviously had caught on.

When I tried this once again about two weeks later, she said the doll was sad. This surprised me and I wondered if she had perchance forgotten our earlier sessions. When I asked her why the doll was sad, she told me it was because I keep turning her upside down and she’s not fond of my doing so. Ha! I was put in my place.

The overall point herein is that when we are very young, discerning objects that are upside down can be quite difficult. A young child has not yet modeled in the mind the notion of reorienting objects and mentally rotating them so as to readily recognize them. Sure, my daughter knew that the doll was still the doll, but the smile that became a frown suggested an as-yet undeveloped sense of reorienting objects.

Human Mental Capabilities in Reorienting Objects

What makes human learning so impressive is that we don’t just fixate on a particular object; we ultimately generalize across objects all told. My daughter was able not only to figure out the doll once it was upside down, she generalized this mental model to figure out other objects that were turned upside down. If I showed her an object in the right-side up position first, it was usually relatively easy for her to comprehend the object once I had turned it upside down.

Turning an object upside down prior to presenting it can make identifying the object a bit of a challenge, even for adults. We are momentarily caught off-guard by the unusual orientation (assuming we don’t normally see the object upside down).

Your mind tries to examine the upside-down object and perhaps reorients it mentally, creating a picture and flipping that picture to a right-side up orientation to make sense of it. You then match the reoriented mental image to your stored right-side up images, and voila, you identify what the real-world object is.

Or it could be that the mind takes a known right-side up image already in stored memory, one that you suspect the object might be, flips that stored image over, and then matches it against the upside-down object you are seeing, trying to decide whether it is indeed that object. That’s another plausible way to do this.

There have been lots of cognitive and psychological experiments trying to figure out the mental mechanisms in the brain that aid us in dealing with the reorientation of objects. Theories abound about how our brain actually figures these things out. I’ve so far suggested or implied that we keep an image of objects in our mind, like a picture. But that’s hard to prove.

Maybe it is some kind of calculus in our minds and there isn’t an object image per se being used. It could be a bunch of formulas. Maybe our minds are a vast collection of geometric formulas. Or, it could be a bunch of numbers. Perhaps our minds turn everything into a kind of binary code and there aren’t any images per se in our minds (well, I suppose it could be an image represented in a binary code).

The actual functioning of the brain is still a mystery, and despite considerable and at times clever experiments, we cannot say with absolute certainty how the brain does this for us. Efforts in neuroscience continue to push forward, trying to nail down the mechanical, biological, and chemical plumbing of the brain.

Range of Reorienting Objects and Their Poses

I’ve focused on the idea of completely turning an object upside down. That’s not the only way to confuse our minds about an object.

You can turn an object on its side, which might also make it hard for you to recognize the object. Usually, we can quickly size up an object when it is only partially reoriented and make a pretty good guess at what it is. Turning the object upside down is a more extreme variant, but even a milder reorientation can cause us to pause or perhaps even misclassify the object.

If I were to take an object and slowly rotate it, the odds are that you would be able to accurately say what the object is, assuming you watched it during the rotations. When I suddenly show you an object that has already been somewhat rotated, you have no initial basis to use as an anchor, and therefore it is more challenging to figure out what the object might be.

Familiarity plays a big part in this too. If I did a series of rotations of the object, and you were staring at it, your mind seems to be able to get used to those orientations. Thus, if later on I suddenly spring upon you that same object in a rotated posture, you are more apt to quickly know what it is, due to having seen it earlier in the rotated position.

In that sense, I can essentially train your mind about what an object looks like in a variety of orientations, making it much easier for you to later recognize it when it is in one of those orientations. Maybe your mind has taken snapshots of each orientation. Or maybe your mind is able to apply some kind of mental algorithm to the orientations of that object. We don’t know.

People who deal with a multitude of orientations of objects tend to get better and better at the object reorientation task. I used to work for a CEO who had a stunt plane. He would take me up in it, usually on our lunch break at work (our office was near an airport). He would do barrel rolls and all kinds of tricky flight maneuvers. I learned right away not to eat lunch before we went on these flights (think about the vaunted “vomit comet”).

In any case, he was able to “see” the world around us quite well, in spite of the times when we were flying upside down. For me, the world looked quite confusing when we were upside down. I had a difficult time with it. Then again, I’ve never been the type to enjoy those roller coaster rides that turn you upside down and try to scare the heck out of you.

AI Self-Driving Cars and Object Orientations in Street Scenes

What does this have to do with AI self-driving cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One of the major concerns that we have, and that the auto makers and tech firms have, pertains to the Machine Learning and Deep Learning we are all using today, which tends to be ultra-brittle when it comes to objects that are reoriented.

This is bad because it means that the AI system might either not recognize an object due to its orientation, or might misclassify the object, and end up tragically getting the self-driving car into a precarious situation because of it.

Allow me to elaborate.

I’d like to first clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved. For the design of Level 5 self-driving cars, the auto makers are even removing the gas pedal, brake pedal, and steering wheel, since those are contraptions used by human drivers. The Level 5 self-driving car is not being driven by a human, nor is there an expectation that a human driver will be present in the self-driving car. It’s all on the shoulders of the AI to drive the car.

For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car. The AI and the human driver are co-sharing the driving task. In spite of this co-sharing, the human is supposed to remain fully immersed into the driving task and be ready at all times to perform the driving task. I’ve repeatedly warned about the dangers of this co-sharing arrangement and predicted it will produce many untoward results.

For my overall framework about AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/framework-ai-self-driving-driverless-cars-big-picture/

For the levels of self-driving cars, see my article: https://aitrends.com/selfdrivingcars/richter-scale-levels-self-driving-cars/

For why AI Level 5 self-driving cars are like a moonshot, see my article: https://aitrends.com/selfdrivingcars/self-driving-car-mother-ai-projects-moonshot/

For the dangers of co-sharing the driving task, see my article: https://aitrends.com/selfdrivingcars/human-back-up-drivers-for-ai-self-driving-cars/

Let’s focus herein on the true Level 5 self-driving car. Many of these comments apply to the less-than-Level-5 self-driving cars too, but the fully autonomous AI self-driving car will receive the most attention in this discussion.

Here are the usual steps involved in the AI driving task (a simple code sketch follows the list):

  •         Sensor data collection and interpretation
  •         Sensor fusion
  •         Virtual world model updating
  •         AI action planning
  •         Car controls command issuance
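
To make the flow concrete, here is a minimal, hypothetical Python skeleton of one pass through such a processing cycle. Every function name and data structure here is an illustrative assumption for the sketch, not the architecture of any actual self-driving system.

```python
# Minimal, hypothetical skeleton of one processing cycle for an AI driving system.
# Every name and structure here is an illustrative assumption.

def collect_and_interpret(sensors):
    """Step 1: read each sensor and interpret its raw data."""
    return {name: sensor() for name, sensor in sensors.items()}

def fuse(interpretations):
    """Step 2: reconcile possibly conflicting per-sensor interpretations."""
    return {"objects": sum((v.get("objects", []) for v in interpretations.values()), [])}

def update_world_model(world_model, fused_scene):
    """Step 3: fold the fused scene into the virtual world model."""
    world_model["objects"] = fused_scene["objects"]
    return world_model

def plan_actions(world_model):
    """Step 4: decide what the car should do next (grossly simplified)."""
    return {"brake": any(o.get("threat") for o in world_model["objects"]), "steer": 0.0}

def issue_controls(plan):
    """Step 5: issue the commands to the car controls."""
    print("Issuing controls:", plan)

# One pass through the cycle with stubbed camera data.
sensors = {"camera": lambda: {"objects": [{"label": "school bus", "threat": False}]}}
world_model = {"objects": []}
scene = fuse(collect_and_interpret(sensors))
issue_controls(plan_actions(update_world_model(world_model, scene)))
```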

Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too. There are some pundits of AI self-driving cars that continually refer to a utopian world in which there are only AI self-driving cars on the public roads. Currently there are about 250+ million conventional cars in the United States alone, and those cars are not going to magically disappear or become true Level 5 AI self-driving cars overnight.

Indeed, the use of human driven cars will last for many years, likely many decades, and the advent of AI self-driving cars will occur while there are still human driven cars on the roads. This is a crucial point since this means that the AI of self-driving cars needs to be able to contend with not just other AI self-driving cars, but also contend with human driven cars. It is easy to envision a simplistic and rather unrealistic world in which all AI self-driving cars are politely interacting with each other and being civil about roadway interactions. That’s not what is going to be happening for the foreseeable future. AI self-driving cars and human driven cars will need to be able to cope with each other.

For my article about the grand convergence that has led us to this moment in time, see: https://aitrends.com/selfdrivingcars/grand-convergence-explains-rise-self-driving-cars/

See my article about the ethical dilemmas facing AI self-driving cars: https://aitrends.com/selfdrivingcars/ethically-ambiguous-self-driving-cars/

For potential regulations about AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/assessing-federal-regulations-self-driving-cars-house-bill-passed/

For my predictions about AI self-driving cars for the 2020s, 2030s, and 2040s, see my article: https://aitrends.com/selfdrivingcars/gen-z-and-the-fate-of-ai-self-driving-cars/

Artificial Neural Networks (ANN) and Deep Neural Networks (DNN)

Returning to the topic of object orientation, let’s consider how today’s Machine Learning and Deep Learning works, along with why it is considered at times to be ultra-brittle. We’ll also mull over how this ultra-brittleness can spell sour outcomes for the emerging AI self-driving cars.

Take a look at Figure 1.

Suppose I decide to craft an Artificial Neural Network (ANN) that will aid in finding street signs, cars, and pedestrians within the images or streaming video of a camera on a self-driving car. Typically, I would start by finding a large dataset of traffic-setting images that I could use to train my ANN. We want this ANN to be as full-bodied as we can make it, so we’ll have a multitude of layers and compose it of a large number of artificial neurons; thus we might refer to this kind of more robust ANN as a Deep Neural Network (DNN).
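
As a rough illustration of what crafting such a network might look like in code, here is a small convolutional network in PyTorch. The three example classes and the layer sizes are arbitrary assumptions for the sketch, not a recommendation for a production self-driving system.

```python
import torch
import torch.nn as nn

# A small convolutional network sketch for classifying traffic-scene objects.
# The three classes and the layer sizes are arbitrary assumptions for illustration.
class TrafficObjectNet(nn.Module):
    def __init__(self, num_classes=3):  # e.g., street sign, car, pedestrian
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 28 * 28, num_classes)  # assumes 224x224 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TrafficObjectNet()
dummy_batch = torch.randn(4, 3, 224, 224)   # four fake 224x224 RGB frames
print(model(dummy_batch).shape)             # torch.Size([4, 3])
```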

Datasets Essential to Deep Learning

You might wonder how I will come upon the thousands upon thousands of images of traffic scenes. I need a rather large set of images to be able to appropriately train the DNN. I don’t want to have to go outside and start taking pictures, since it would take me a long time to do so and be costly to upload them all and store them. My best bet would be to go ahead and use datasets that already exist.

Indeed, some would say that the reason we’ve seen such great progress in the application of Deep Learning and Machine Learning is because of the efforts by others to create large-scale datasets that we can all use to do our training of the ANN or DNN. We stand on the shoulders of those that went to the trouble to put together those datasets, thanks.

This also means, though, that a kind of potential vulnerability arises, one that is not so obvious. If we all use the same datasets, and if those datasets have particular nuances in them, it means that we are all going to have a similar impact on our ANN and DNN trainings. I’ll in a moment provide you with an example involving military equipment images that highlights this vulnerability.

For some well-known ANN/DNN training datasets, see my article: https://www.aitrends.com/selfdrivingcars/machine-learning-benchmarks-and-ai-self-driving-cars/

For my article about Deep Learning and plasticity, see: https://www.aitrends.com/selfdrivingcars/plasticity-in-deep-learning-dynamic-adaptations-for-ai-self-driving-cars/

For the vaunted one-shot Deep Learning goal, see my article: https://www.aitrends.com/selfdrivingcars/seeking-one-shot-machine-learning-the-case-of-ai-self-driving-cars/

For my article about the use of compressive sensing, see: https://www.aitrends.com/selfdrivingcars/compressive-sensing-ai-self-driving-cars/

Once I’ve got my dataset or datasets and readied my DNN to be trained, I would run the DNN over and over, trying to get it to find patterns in the images.

I might do so in a supervised way, wherein I provide an indication of what I want it to find, such as giving the DNN guidance toward discovering images of school buses or maybe fire trucks or perhaps scooters. Or I might opt to do this in an unsupervised fashion, allowing the DNN to find whatever it finds, and only afterward provide an indication of what the objects it clusters or classifies are. For example, those yellow lengthy blobs that have big tires and lots of windows are school buses.
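
To make the supervised versus unsupervised distinction concrete, here is a tiny, self-contained sketch. The stand-in linear classifier, the fake feature vectors, and the use of k-means clustering are illustrative assumptions, not how any particular self-driving system is actually trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

# Supervised: each training image comes with a label we supplied
# (e.g., 0 = school bus, 1 = fire truck, 2 = scooter), and the loss
# measures how far the network's guesses are from those labels.
classifier = nn.Linear(128, 3)                 # stand-in for a full DNN head
fake_features = torch.randn(8, 128)            # eight fake image feature vectors
labels = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
supervised_loss = F.cross_entropy(classifier(fake_features), labels)

# Unsupervised: no labels at all; a clustering step groups whatever it groups,
# and only afterward do we look at a cluster and name it ("those yellow lengthy
# blobs with big tires and lots of windows are school buses").
cluster_ids = KMeans(n_clusters=3, n_init=10).fit_predict(fake_features.numpy())
print(supervised_loss.item(), cluster_ids)
```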

I am hoping that the DNN is generalizing sufficiently about the objects, in the sense that if a yellow school bus is bright yellow, it is still a school bus, while if it is maybe a dull yellow due to faded paint and dirt and grime, the DNN should still be classifying it into the school bus category. I mention this because you typically do not have an easy way to make the DNN explain what it is using to find and classify the objects within the image. Instead, you merely hope and assume that if it seems to be able to find those yellow buses, it presumably is using useful criteria to do so.

There is a famous story that highlights the dangers of making this kind of assumption about the manner in which the pattern matching is taking place. The story goes that there were pictures of Russian military equipment, like tanks and cannons, and there were pictures of United States military equipment. Thousands of images containing those kinds of equipment were fed into an ANN. The ANN seemed to be able to discern between the Russian military equipment and the United States military equipment, which we would presume was due to the differences in the shape and design of their respective tanks and cannons.

Turns out that upon further inspection, the pictures of the Russian military equipment were all grainy and slightly out of focus, while the United States military equipment pictures were crisp and bright. The ANN pattern matched on the background and lighting aspects, rather than the shape of the military equipment itself. This was not readily discerned at first because the same set of images was used to train the ANN and then test it. Thus, the test images were also grainy for the Russian equipment and crisp for the U.S. equipment, misleading one into believing that the ANN was doing a generalized job of gauging the object differences, when it was not doing so.

This highlights an important aspect for those using Machine Learning and Deep Learning, namely trying to ferret out how your ANN or DNN is achieving its pattern matching. If you treat it utterly like a black box, the pattern matching might have landed in ways that won’t be satisfactory when the ANN or DNN is used in real-world settings. You might have thought that you did a great job, but once the ANN or DNN is exposed to other images, beyond your datasets, it could be that the characteristics used to classify objects are revealed as brittle and not what you had hoped for.

Considering Deep Learning as Brittle and Ultra-Brittle

By the word “brittle” I am referring to the notion that the ANN or DNN is not doing a full-bodied kind of pattern matching and will therefore falter or fall down on doing what you presumably want it to do. In the case of the tanks and cannons, you likely wanted the patterns to be about the shape of the tank, its turret, its muzzle, its treads, and so on. Instead, the pattern matching was about the graininess of the images. That’s not going to do much good when you try to use the ANN or DNN in a real-world environment to detect whether there is a Russian tank or a United States tank ahead of you.

Let’s liken this to my point about the yellow school bus. If the ANN or DNN is pattern matching on the color yellow, and if perchance all or most of the images in my dataset were of bright yellow school buses, it could be that the matching is being done based on that bright yellow color. This means that if I think my ANN or DNN is good to go, and it encounters a school bus that is old, faded in its yellow color, and perhaps covered with grime, the ANN or DNN might declare that the object is not a school bus. A human would tell you it was a school bus, since the human presumably is looking at a variety of characteristics, including the wheels, the shape of the bus, the windows, and the color of the bus.

One of the ways in which the brittleness of the ANN or DNN can be exploited involves making use of adversarial images. The notion is to confuse or mislead the trained ANN or DNN into misclassifying an object. This might be done by a bad actor, someone hoping to cause the ANN or DNN to falter. They can take an image, make some changes to it, feed it into the ANN or DNN that you’ve crafted, and potentially get the ANN or DNN to say that the object is something other than what it is.

Perhaps one of the more famous examples of this kind of adversarial trickery involves the turtle image that an ANN or DNN was fooled into believing was actually an image of a gun. This can be done by making changes to the image of the turtle. Those changes are enough to have the ANN or DNN no longer pattern match it as a turtle and instead pattern match it as a gun. What makes these adversarial attacks so alarming is that the turtle still looks like a turtle to the human eye; the changes made to fool the ANN or DNN are at the pixel level, so small that the human eye doesn’t readily see the difference.

One of the more startling examples of this adversarial trickery involved a one-pixel change that caused an apparent image of a dog to be classified by a DNN as a cat, which goes to show how potentially brittle these systems can be. Those who study these kinds of attacks will often use “differential evolution” or DE to try to find the smallest change that is least apparent to humans, aiming to fool the ANN or DNN while making it very hard for a human eye to realize what has been done.

These changes to images are also often referred to as adversarial perturbations.
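
The pixel-level trickery can be sketched in code. The example below uses a simple gradient-based perturbation (in the spirit of the well-known fast gradient sign method) rather than the differential-evolution approach mentioned above; the tiny untrained network and the random image are stand-ins purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a gradient-based adversarial perturbation (FGSM-style).
# The tiny network and the image here are stand-ins; a real attack would
# target an actual trained classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # fake 32x32 RGB image
true_label = torch.tensor([3])                          # its (pretend) correct class

loss = F.cross_entropy(model(image), true_label)
loss.backward()

epsilon = 0.01                                          # tiny perturbation budget
adversarial_image = (image + epsilon * image.grad.sign()).clamp(0, 1)

# The perturbed image looks essentially identical to a human,
# yet the classifier's prediction may now change.
print(model(image).argmax(1), model(adversarial_image).argmax(1))
```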

Remember that I earlier said that by using the same datasets we are somewhat vulnerable. Well, a bad actor can study those datasets too, and try to find ways to undermine or undercut an ANN or DNN that has been trained via the use of those datasets. The dataset giveth and it taketh away, one might say. By having large-scale datasets readily available, the good actors can more readily develop their ANN or DNN, but the bad actors can also try to figure out ways to subvert those good-guy ANNs and DNNs, doing so by discovering devious adversarial perturbations.

Not all adversarial perturbations need to be conniving, and we might use the adversarial gambit for good purposes too. When you are testing the outcome of your ANN or DNN training, it would be wise to try some adversarial perturbations to see what you can find, meaning that you are using this technique to detect your own brittleness. By doing so, hopefully you will then be able to shore up that brittleness. Might as well use the attack for the purposes of discovery and mitigation.

For my article about security aspects of AI self-driving cars, see: https://www.aitrends.com/ai-insider/ai-deep-learning-backdoor-security-holes-self-driving-cars-detection-prevention/

For AI brittleness aspects, see my article: https://www.aitrends.com/ai-insider/goto-fail-and-ai-brittleness-the-case-of-ai-self-driving-cars/

For ensemble Machine Learning, see my article: https://www.aitrends.com/selfdrivingcars/ensemble-machine-learning-for-ai-self-driving-cars/

For my article about Federated Machine Learning, see: https://www.aitrends.com/selfdrivingcars/federated-machine-learning-for-ai-self-driving-cars/

For explanation-AI, see my article: https://www.aitrends.com/selfdrivingcars/explanation-ai-machine-learning-for-ai-self-driving-cars/

I’ve so far offered the notion that the images might differ by the color of the object, such as the variants of yellow for a school bus. A school bus also has a number of wheels and tires, which are somewhat large relative to those of smaller vehicles. A scooter only has two wheels, and its tires are quite a bit smaller than a bus’s tires.

Imagine looking at image after image of school buses and trying to figure out what features allow you to formulate in your mind that they are school buses. You want to discover a wide enough set of criteria that the result is not brittle, yet you also don’t want to be so overly broad that you start classifying, say, trucks as school buses simply because both of those types of transport have larger tires.

Let’s add a twist to this. I told you about my daughter and her doll, involving my flipping the doll upside down and asking my daughter whether she could discern if the doll was smiling or frowning. I was not changing any of the actual features of the doll. It was still the same doll, but it was reoriented. That’s the only change I made.

Suppose we trained an ANN or DNN with thousands upon thousands of images of yellow buses. The odds are that the pictures of these yellow buses are primarily all of the same overall orientation, namely driving along on a flat road or maybe in a parked spot, sitting perfectly upright. The bus is right-side up.

You probably would assume that the ANN or DNN is pattern matching in such a manner that the orientation of the bus doesn’t matter. You would take it for granted that the ANN or DNN “must” realize that the orientation doesn’t matter; a bus is still a bus, regardless of the angle, and even when upside down.

If we were to tilt the bus, a human would likely still be able to tell you that it is a school bus. I could probably turn the bus completely upside down, if I could do so, and you’d still be able to discern that it is a school bus. I remember one day I was driving along and drove past an accident scene involving a car that had completely flipped upside down and was sitting at the side of the road. I marveled at seeing a car that was upside down. Notice that I could instantly detect it was a car, there was no confusion in my mind that it was anything other than a car, in spite of the fact that it was entirely upside down.

Fascinating Study of Poses Problems in Machine Learning

A fascinating new study by researchers at Auburn University and Adobe provides a handy warning that orientation should not be taken for granted when training your Deep Learning or Machine Learning system. Researchers Michael Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen investigated the vulnerability of DNNs, doing so by using adversarial techniques, primarily involving rotating or reorienting objects in images. These mainly were DNNs that had been trained on rather popular datasets, such as ImageNet and MS COCO. Their study can be found here: https://arxiv.org/pdf/1811.11553.pdf

One aspect of rotating or reorienting an object that you might have noticed herein is that I’ve been treating the objects as 2D, merely tilting them or putting them upside down. Since most real-world objects, like school buses and cars, are 3D objects, you can do the rotations or reorienting in three dimensions, altering the yaw, pitch, and roll of the object.
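
For readers who want to see what altering yaw, pitch, and roll means mathematically, here is a small NumPy sketch that composes the three rotations and applies them to a 3D point. The axis conventions and the example angles are illustrative assumptions.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose yaw (about z), pitch (about y), and roll (about x) rotations.
    Angles are in degrees; the axis conventions are one common choice."""
    y, p, r = np.radians([yaw, pitch, roll])
    Rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0,          0,         1]])
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [ 0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0,          0         ],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    return Rz @ Ry @ Rx

# Rotate a single 3D point (say, a corner of a school-bus model) by modest angles.
point = np.array([2.0, 0.5, 1.0])
print(rotation_matrix(10, 8, 9) @ point)
```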

In the research study done at Auburn University and with Adobe, the researchers opted to try coming up with rather convincing-looking Adversarial Examples (AX), contriving them to be Out-of-Distribution (OoD) relative to the training datasets.

For example, using a Photoshop-like technique, they took an image of a yellow school bus and tilted it a few degrees, and went further to conjure up an image of the bus turned on its side. These were made to look like the images in the datasets, including having a background of a road that other school buses in the dataset were also shown upon. This helps focus the adversarial perturbations on the object of interest, the school bus in this case, rather than letting the ANN or DNN be distracted by the background of the image as a giveaway.

To the human eye, these adversarial changes are blatantly obvious.

There wasn’t an effort to hide the perturbations by infusing them at the pixel level. You can look at a picture and immediately discern that there is a school bus in it, though you might certainly wonder why the school bus is at a tilt. Nor were the images bizarre per se; some of the reoriented images were plausible. A yellow school bus lying on its side, on a road, well, it could have gotten into an accident and ended up in that position.

Some of the images might be questioned, like a fire truck that seems to be flying in the air, but I would also bet that if you had a fire truck that went off a bridge or a ramp, you’d be able to get the same kind of reorientation.

For the school bus, some of the reorientations caused the ANN or DNN to report that it was a garbage truck, or a punching bag, or a snowplow. The punching bag classification seems to make sense in that the yellow bus was dangling as though being held by its tailpipe, and since it is yellow, it might resemble a yellow punching bag hanging from a ceiling and ready to be punched. I don’t know for sure that these are the criteria used by the ANN or DNN, but it seems like a reasonable guess based on the misclassification.

For the objects that they converted from their normal or canonical poses in the images and reoriented to a different pose stance, they were able to get the selected DNNs to misclassify 97% of the time. You might assume that this only happens when the pose is radically altered. You’d be wrong. They tried various pose changes and found that just an approximate 10-degree yaw change, an 8-degree pitch change, or a 9-degree roll change was enough to fool the DNN.

You might also be thinking that this reorientation only causes a misclassification about school buses, and maybe it doesn’t apply to other kinds of objects. Objects they studied included a school bus, park bench, bald eagle, beach wagon, tiger cat, German shepherd, motor scooter, jean, street sign, moving van, umbrella, police van, and a trailer truck.

That’s enough of a variety that I think we can reasonably suggest that it showcases a diversity of objects and therefore is generalizable as a potential concern.

Variant Poses Suggest Ultra-Brittleness

Many people refer to today’s Machine Learning and Deep Learning as brittle. I’ll go even further and claim that it is ultra-brittle. I do so to emphasize the dangers we face from today’s ANN and DNN applications. Not only are they brittle with respect to the feature detection of objects, such as a bright yellow versus a faded yellow, they are brittle when you simply rotate or reorient an object. That’s why I call this ultra-brittle.

If today’s ANN and DNN could deal with most rotations and reorientations, still doing a decent job of classifying objects, and were only confounded by extraordinary poses, I would likely back down from saying they are ultra-brittle and settle on brittle. The part that grabs your attention, and catches in your throat, is that it doesn’t take much of a perturbation to get the usual ANN or DNN to misclassify.

In the real-world, when an AI self-driving car is zooming along at 80 miles per hour, you certainly don’t want the on-board AI and ANN or DNN to misclassify objects due to their orientation.

I remember one harrowing time that I was driving my car and another car, going in the opposing direction, came across a tilted median that was intended to protect the separated directions of traffic. The car was on an upper street and I was on a lower street.

I don’t know whether the driver was drunk or maybe had fallen asleep, but in any case, he dove down toward the lower street. His car was at quite an obtuse angle.

What would an AI self-driving car have determined? Suppose the sensors detected the object but somehow gave it a more harmless classification, such as naming it a wild animal or a tumbleweed. If so, the AI action planner might decide that there is no overt threat and not try to guide the self-driving car away, instead assuming that it would be safer to proceed ahead and ram the object, like ramming a deer that has suddenly appeared in the roadway.

I realize that some might shrug off the orientation aspects by suggesting that you are rarely going to see a school bus at an odd angle, or a fire truck, or anything else. I’m not so convinced. If we tried to come up with examples of reoriented objects in real-world settings, I’m betting we could readily identify numerous realistic situations. And, if we are going to ultimately have millions of AI self-driving cars on the roadways, the odds of that many self-driving cars eventually encountering “odd” poses are going to be relatively high.

For my article about safety and AI, see: https://www.aitrends.com/selfdrivingcars/safety-and-ai-self-driving-cars-world-safety-summit-on-autonomous-tech/

For the Boeing 737 situation, see my article: https://www.aitrends.com/ai-insider/boeing-737-max-8-and-lessons-for-ai-the-case-of-ai-self-driving-cars/

For the linear non-threshold aspects, see: https://www.aitrends.com/ai-insider/linear-no-threshold-lnt-and-the-lives-saved-lost-debate-of-ai-self-driving-cars/

For my article containing my Top 10 predications about AI self-driving cars, see: https://www.aitrends.com/selfdrivingcars/top-10-ai-trends-insider-predictions-about-ai-and-ai-self-driving-cars-for-2019/

What To Do About the Poses Problem

There are several ways we can gradually deal with the poses problem.

They include:

  •         Improve the ANN or DNN algorithms being used
  •         Increase the scale of the ANN or DNN used
  •         Ensure that the datasets include variant poses
  •         Use adversarial techniques to ferret out and then mitigate poses issues
  •         Improve ANN or DNN explanatory capabilities
  •         Other

Researchers should be trying to devise Deep Learning and Machine Learning algorithms that can semi-automatically try to cope with the poses problem. This might involve the ANN or DNN itself opting to rotate or reorient objects, even though the reorientation wasn’t fed into the ANN or DNN via the training dataset. You might liken this to how humans in our minds seem to be able to do rotations of objects, even though we might not have the object in front of us in a rotated position.
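
As a crude illustration of what coping with poses at the network itself could look like, here is a hedged sketch of test-time augmentation, in which the classifier votes over several reoriented copies of the same image. The stand-in model and the choice of angles are assumptions for the sketch, not the approach of any specific research effort.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

# Sketch: instead of relying on a single view, classify several reoriented
# copies of the same image and average the predictions. This is a crude,
# illustrative approximation of "the network coping with poses itself."
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 5))  # stand-in classifier
model.eval()

image = torch.rand(1, 3, 64, 64)                 # fake RGB image
angles = [0.0, 90.0, 180.0, 270.0]               # upright, sideways, upside down, sideways

with torch.no_grad():
    probs = torch.stack([
        model(TF.rotate(image, angle)).softmax(dim=1) for angle in angles
    ]).mean(dim=0)

print(probs.argmax(dim=1))                       # prediction pooled over orientations
```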

If you think that the solution should focus more so on the dataset rather than the ANN or DNN itself, presumably we can try to include more variants of poses of objects into a dataset.

This is not straightforward, unfortunately.

It seems fair to assume that you are not likely to get actual pictures of those objects in a variety of orientations naturally, and so you’d have to synthesize them. The synthesis itself will need to be convincing, or else the images will be tagged by the ANN or DNN simply due to some other factor, akin to my earlier example about the grainy nature of the military equipment images.

Also, keep in mind that you need enough of the reoriented object images to make a difference when the ANN or DNN is doing the training on the dataset. If you have a million pictures of a school bus in a right-side up pose and have a handful of the bus in a tilted posture, the odds are that the pattern matching is going to overlook or ignore or cast aside as noise the tilted postures. This takes us back to the one-shot learning problem too.
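
If you go the dataset route, one common way to synthesize reoriented examples is to apply random rotations and flips as augmentation during training. Here is a minimal sketch using torchvision; the rotation range and flip probabilities are arbitrary assumptions you would need to tune so that reoriented examples are frequent enough to matter without crowding out the canonical poses.

```python
from torchvision import transforms
from PIL import Image

# Sketch of pose-oriented augmentation for a training pipeline. The rotation
# range and flip probabilities are arbitrary assumptions for illustration.
pose_augmentation = transforms.Compose([
    transforms.RandomRotation(degrees=180),          # any tilt, including upside down
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.2),
    transforms.ToTensor(),
])

bus = Image.new("RGB", (224, 224), color="yellow")   # placeholder for a real photo
augmented = pose_augmentation(bus)
print(augmented.shape)                               # torch.Size([3, 224, 224])
```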

You could be tempted to suggest that the dataset maybe should have many of the tilted poses, perhaps more so than the number of poses in a right-side up position. Well, this could be undesirable too. The pattern matching might become reliant on the tilted postures and not be able to recognize sufficiently when the object is in its normal or canonical position.

Darned if you do, darned if you don’t.

The Key 4 A’s of Datasets for Deep Learning

When we put together our datasets, we tend to think of the mixture in the following way:

  •         Anticipated poses
  •         Adaptation poses
  •         Aberration poses
  •         Adversarial poses

It’s the 4 A’s of poses or orientations.

We want to have some portion of the dataset with the anticipated poses, which are usually the right-side up or canonical orientations.

We want to have some portion of the dataset with the adaptation poses, namely postures that you could reasonably expect to occur from time-to-time in the real-world. It’s not the norm, but nor is it something that is extraordinary or unheard of in terms of orientation.

We want to ensure that there are a sufficient number of aberration poses, entailing orientations that are quite rare and seemingly unlikely.

And we want to have some inclusion of adversarial poses, which are, let’s say, concocted and would not seem to ever happen naturally, but which we want to include so that if someone is determined to attack the ANN or DNN, it has already encountered those orientations. Note this is not preparation for the pixel-level kind of attacks, which is handled in other ways.

You need to be reviewing your datasets to ascertain what mix of the 4 A’s you have. Is it appropriate for what you are trying to achieve with your ANN or DNN? Does the ANN or DNN have enough sensitivity to pick up on the variants? And so on.
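
As a sketch of what auditing that mix could look like, suppose each training image were tagged with one of the four pose categories. The tags, counts, and target proportions below are purely illustrative assumptions, not recommended values.

```python
from collections import Counter

# Hypothetical audit of a dataset's pose mix against the 4 A's.
# The tags and the target proportions below are illustrative assumptions only.
pose_tags = (["anticipated"] * 9000 + ["adaptation"] * 700 +
             ["aberration"] * 250 + ["adversarial"] * 50)

targets = {"anticipated": 0.80, "adaptation": 0.12,
           "aberration": 0.05, "adversarial": 0.03}

counts = Counter(pose_tags)
total = sum(counts.values())
for category, target in targets.items():
    actual = counts[category] / total
    flag = "" if abs(actual - target) < 0.02 else "  <-- outside tolerance"
    print(f"{category:12s} actual {actual:.1%}  target {target:.0%}{flag}")
```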

Conclusion

When I was a child, I went to an amusement park that had one of those mirrored mazes, and it included some mirrors that were able to cause you to see things upside down. I remember how I stumbled through the maze, quite disoriented.

A buddy of mine went into it over and over, spending all of his allowance to go repeatedly into the mirror maze. He eventually could not only walk through the maze without any difficulty, he could run throughout the maze and not collide or trip at all. His repeated “training” allowed him to eventually master the reorientation dissonance.

It seems that we need to make sure that today’s Machine Learning and Deep Learning gets beyond the existing ultra-brittleness, especially regarding the poses or orientations of objects. Most people would be dumbfounded to find out that an AI system can be readily fooled or confused by merely reorienting or tilting an object.

Those of us in AI know that the so-called “object recognition” that today’s ANN and DNN are doing is not anything close to what humans are able to do in terms of object recognition.

Contemporary automated systems are still rudimentary. This could be an impediment to the advent of AI self-driving cars. Would we want AI self-driving cars on our roadways when their AI can become intentionally or unintentionally muddled about a driving situation due to the orientation of nearby objects? I think that’s not going to fly. The object orientation poses problem is real and needs to be dealt with for real-world applications.

Copyright 2019 Dr. Lance Eliot

This content is originally posted on AI Trends.