Street Scene Free-Space Detection for Self-Driving Cars: The Road Ahead


By Dr. Lance B. Eliot, the AI Trends Insider.

Where is the road?

I was driving up in the mountains, gradually making my way to a hidden valley that I was told would be a remarkable sight to see. The mountains had the usual curving roads, going back and forth in a seesaw way. I knew that some locals liked to really gun their engines on these mountainous roads, since they traveled them all the time and had become accustomed to knowing when the curve would switch one way or another. For me, it was hard to tell, as I came out of a sharp left curve, whether I might suddenly need to enter a sharp right curve. Coming around each of the tight bends, you couldn’t see anything until you had actually made the bend itself.

Admittedly, I was driving pretty slowly and carefully. I did not want to lose my life on some mountainous road, even if the road was leading me to a wondrous hidden valley. The valley was actually home to a Boy Scout campsite that few knew about other than the local Boy Scout troops. The location was ideal for scouting purposes. We could have the Boy Scouts seemingly be in the middle of nowhere and use their wilderness survival skills, and yet the camp was only about an hour outside of the highly populated city of Los Angeles.

For a moment, go ahead and close your eyes and imagine driving on a road where you cannot see more than, say, 50 feet ahead. You are approaching a tight curve. You can only see the road directly ahead of you. Once it curves around the bend, you cannot see at all what happens to the road, at least not until you are fully committed to the bend and have already made most of the curve.

You are likely going to assume that the roadway past where you can see is just fine. In other words, you have to assume that the road continues naturally after the bend. It is a drivable surface. It is an intact road. The road is marked and properly maintained. Your car will be able to flow right onto that portion of the road. These are the aspects that you take for granted.

Suppose, though, that as you make the bend, you suddenly encounter a huge boulder that has fallen onto the roadway. It fell into the worst possible location, sitting just past the bend. While you are still working your way through the curve, you cannot see that a boulder is there. You will only see it in the moments just as you round the bend. At that point, depending upon your speed, you might have maybe a split-second to decide what to do. If the boulder occupies only your lane, maybe you can swing wide to avoid it. But if you swing wide, a car coming in the opposite direction might be in that lane and the two of you will collide head-on.

Suppose you figure that you would simply hit your brakes and stop before striking the boulder. That’s handy, but let’s walk through what needs to happen and the time it takes. Think about the time needed to see the boulder, recognize it as a boulder, mentally calculate that it is obstructing your path, mentally calculate that there is no recourse other than hitting the brakes, mentally command your foot to come off the accelerator pedal and move over to the brake pedal, and then have your foot slam down on the brakes. How long does that take? If it takes more than a split-second, which it most likely does (see my column on human factors for self-driving cars), you will have already rammed into the boulder at full speed, since a split-second was all the time you had.
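To put rough numbers on this, here is a minimal sketch that adds the distance covered while reacting (the car is still at full speed) to the distance covered while braking, using v²/(2a) for constant deceleration. The 1.5-second perception-reaction time and the 0.7 g braking deceleration are assumptions chosen for illustration, not figures from this column, and the function names are hypothetical.

```python
# Rough stopping-distance arithmetic under assumed values (not from the column):
# 1.5 s perception-reaction time and about 0.7 g of braking deceleration.
FT_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
G_FT_PER_S2 = 32.174  # standard gravity in feet per second squared


def mph_to_fps(mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return mph * FT_PER_MILE / SECONDS_PER_HOUR


def stopping_distance_ft(mph: float, reaction_s: float = 1.5, decel_g: float = 0.7) -> float:
    """Reaction distance (v * t) plus braking distance (v^2 / (2 * a))."""
    v = mph_to_fps(mph)
    reaction_distance = v * reaction_s  # still moving at full speed while reacting
    braking_distance = v ** 2 / (2 * decel_g * G_FT_PER_S2)
    return reaction_distance + braking_distance


for speed in (15, 25, 40):
    print(f"{speed} mph -> about {stopping_distance_ft(speed):.0f} ft to stop")
```

Under these assumptions the loop reports roughly 44 feet at 15 mph, 85 feet at 25 mph, and 164 feet at 40 mph. Even at 25 mph, the total already exceeds the 50 feet of visible road imagined earlier, which is why creeping around blind bends makes sense.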

You can likely understand why I was driving slowly and carefully. I was trying to avoid encountering a problem up ahead that I could not foresee, knowing that the faster I went, the more my options for any needed evasive action would shrink, inch by inch and second by second. Of course, it could be something other than a large boulder. A large boulder is, if anything, a convenient kind of hazard, simply because it would be easily identified and instantly recognizable. Also, it would be stationary, so I would at least have a chance of dodging it.

Suppose instead it was a moving object. There were lots of posted signs on this mountainous road showing the classic deer crossing symbol. If I came around a bend and, by bad and dumb luck, a deer was standing in the middle of the lane, what would I do then? The deer has a chance of moving. If I see the deer and it sees me, all in the same instant, the deer might choose to move, say, to the right. If I had already started to steer to the right of the deer, I would now be on a collision path with it. So, if I had stayed straight I might have avoided it, but because it moved to the right, and because I also opted to steer to the right, we are once again going to hit each other. The boulder at least would not be playing games with me; it would just be staking out its territory and sticking with it.

Alright, I think you get the idea that when you cannot see the road ahead, you are in a perilous situation. I’ve indicated that once you can see the road, there might be something amiss. Each of the examples so far involved an object occupying the road. This meant you had to recognize the object and ascertain what to do about it blocking your path. The object might be very large, such as the boulder, or relatively smaller, such as the deer. The object might be stationary and fixed in place, or it might readily move. A moving object might have a clear-cut trajectory (say, a boulder rolling down the hill that is unlikely to change course), or it might have an unknown trajectory and be able to shift which way it goes, such as a deer that goes to the right and then suddenly changes direction and heads back toward the left.

Let’s make one more variation of this roadway dilemma. Suppose the roadway doesn’t continue at all. Is this a Twilight Zone episode, you ask? No, really, I am serious: suppose the roadway is just not there. This was during the winter months, so the mountain itself was very wet and muddy, and I could have come around a bend and suddenly had no road left to drive on. As background for you, there had been instances of the mountain slopes sliding and taking out entire stretches of roadway. Essentially, I could be driving directly onto and over a cliff if the road ahead had been obliterated by a mountain slide.

You might say that certainly there would be signs warning of this, and some kind of roadway cones or other indications that I was about to reach a gap in the roadway. Not if the slide had just happened and no one had yet come upon that stretch of road. I could be the “lucky” first driver to discover that there was no more roadway ahead. There needn’t be any warning. The roadway leading up to the curve could look perfectly fine. The curve itself could be in pristine condition. Yet just as you round the bend, there could be a newly formed cliff. Would you be able to react in time? Would you be so shocked that you could not react at all and would simply plummet off the road?

Fortunately, I did not encounter any such breaks in the roadway. I am still alive to be able to write this sentence, thankfully.

What does this have to do with self-driving cars?

At the Cybernetic Self-Driving Car Institute, we are developing advanced AI for street scene analyses and doing what is referred to as free-space detection. This is a crucial aspect of any self-driving car, and we all need to find ways to improve how it is done. The better the free-space detection, the better things will be for the safe and sound operation by the AI of the self-driving car.

Here’s how free-space detection works.

The front-facing sensor cameras on the self-driving car take images of whatever is ahead of the vehicle. The vision system analyzes those images to find aspects such as where the roadway is, where objects are, and the like. This analysis is done as part of sensor fusion (see my column on sensor fusion for self-driving cars), which feeds into the AI system. The AI system receives the various roadway recognition and object recognition indications from the vision system and updates a virtual model of the surrounding world in which the self-driving car is immersed. The AI system then needs to interpret the indicated roadway and objects to decide which direction the car should go, what speed it should go, whether it should apply the brakes, and so on.
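The flow from vision output to world model to driving decision can be sketched in a few lines. Everything here is hypothetical and greatly simplified: `VisionReport`, `WorldModel`, and `max_safe_speed_fps` are illustrative names, and the 1.5-second reaction time and 22.5 ft/s² deceleration (about 0.7 g) are assumptions. The speed rule solves v·t + v²/(2a) = D for v, giving the fastest speed at which the car could still stop within the free-space D it currently sees.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VisionReport:
    """Hypothetical vision-system output for one camera frame."""
    timestamp_s: float
    free_ahead_ft: float                         # open roadway straight ahead
    objects: dict = field(default_factory=dict)  # object id -> distance in ft


@dataclass
class WorldModel:
    """Hypothetical virtual model of the surroundings kept by the AI system."""
    free_ahead_ft: float = 0.0
    objects: dict = field(default_factory=dict)
    last_update_s: float = float("-inf")

    def update(self, report: VisionReport) -> None:
        if report.timestamp_s < self.last_update_s:
            return  # ignore frames that arrive out of order
        self.free_ahead_ft = report.free_ahead_ft
        self.objects = dict(report.objects)
        self.last_update_s = report.timestamp_s


def max_safe_speed_fps(free_ft: float, reaction_s: float = 1.5, decel_fps2: float = 22.5) -> float:
    """Largest v (ft/s) with v*t + v**2/(2a) <= free_ft, from the quadratic formula."""
    free_ft = max(free_ft, 0.0)
    a, t = decel_fps2, reaction_s
    return -a * t + math.sqrt((a * t) ** 2 + 2 * a * free_ft)


world = WorldModel()
world.update(VisionReport(timestamp_s=1.0, free_ahead_ft=85.0, objects={"car_1": 85.0}))
print(f"target speed: {max_safe_speed_fps(world.free_ahead_ft):.1f} ft/s")
```

With 85 feet of free-space ahead, this gives about 36.7 ft/s, roughly 25 mph. A real planner would weigh far more than one distance, but the sketch shows why the amount of detected free-space directly bounds how fast the car can prudently go.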

If you are driving on an open highway, this image analysis should reveal, for example, that right now you have 25 feet of open roadway between you and the car directly ahead of you. It might also find free-space to your right, perhaps the shoulder of the highway. It might also indicate free-space to your left, assuming you are on a two-lane highway, you are in the rightmost slow lane, and there isn’t a car immediately to your left in the fast lane.

All of this free-space detection is vital to know about. If the self-driving car needs to suddenly swerve to avoid something, it needs to know if there is free-space to its left, or to its right. It needs to know how much free-space is directly ahead of it. You, as a human driver, are continually doing the same thing. Your eyes are scanning the roadway, and feeding the images to your brain. Your brain is calculating where the free-space exists. Do you have room to move up closer to the car ahead of you? Can you swing into the lane to your left? Can you go onto the shoulder of the highway if an emergency dictates doing so?
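A toy version of that split-second choice might look like the following. It is a sketch under stated assumptions, not a real planner: the `FreeSpace` summary, the 6.5-foot car width, the 1-foot margin, and the rule "brake if the stopping distance fits, otherwise swerve toward the side with more room" are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class FreeSpace:
    """Hypothetical summary of detected free-space around the car, in feet."""
    ahead_ft: float  # open roadway directly ahead
    left_ft: float   # lateral room to the left (0 if a car is there)
    right_ft: float  # lateral room to the right, e.g. a clear shoulder


def choose_maneuver(space: FreeSpace, stopping_ft: float,
                    car_width_ft: float = 6.5, margin_ft: float = 1.0) -> str:
    """Brake in lane if the car can stop in time; otherwise swerve where there is room."""
    if space.ahead_ft >= stopping_ft:
        return "brake"
    needed = car_width_ft + margin_ft
    sides = [(space.left_ft, "swerve_left"), (space.right_ft, "swerve_right")]
    viable = [side for side in sides if side[0] >= needed]
    if viable:
        return max(viable)[1]  # the side with the most free-space
    return "brake"  # no room either way: shed as much speed as possible


print(choose_maneuver(FreeSpace(ahead_ft=25, left_ft=0, right_ft=10), stopping_ft=85))
```

Here the car needs 85 feet to stop but has only 25, the left lane is occupied, and the 10-foot shoulder is wide enough, so the sketch picks `swerve_right`. The point is that none of these options can be chosen sensibly unless free-space to the left, right, and ahead is already known.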

We take all of this for granted most of the time. As humans, we are well accustomed to using our vision system, our eyes, to look around for free-space and for objects. During driver training, most novice teenage drivers keep their heads pointed directly ahead and are petrified about looking anywhere but immediately in front of the car. They are taught to look left and right, scan the horizon, and be able to readily know what is around them. I’ve seen some student drivers hold their heads rigidly straight ahead while trying to shift their eyeballs back and forth, rather than turning their heads back and forth.

Part of the reason you need to scan continually is that, as you move, the scene around you is changing. You cannot rely upon what you saw three minutes ago; that’s old news. You cannot even rely on what you saw ten seconds ago. If you are driving at 65 miles per hour, you are moving forward at about 95 feet per second. This means you need to keep looking for and identifying free-space even while you are zooming through the free-space that you detected moments earlier and decided was safe to drive through.
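The arithmetic behind "old news" is easy to check. This small sketch converts 65 mph into distance covered over a few intervals; the 30 frames-per-second camera rate in the last line is an assumed figure for illustration, not one given in the column.

```python
def feet_traveled(mph: float, seconds: float) -> float:
    """Feet covered at a constant speed over a time interval."""
    return mph * 5280 / 3600 * seconds


speed_mph = 65
print(f"per second: {feet_traveled(speed_mph, 1):.1f} ft")
print(f"in 10 seconds: {feet_traveled(speed_mph, 10):.0f} ft")
print(f"in 3 minutes: {feet_traveled(speed_mph, 180) / 5280:.2f} miles")
# Assumed camera rate of 30 frames per second (illustrative only):
print(f"between frames at 30 fps: {feet_traveled(speed_mph, 1 / 30):.1f} ft")
```

At 65 mph the car covers about 953 feet in ten seconds and about 3.25 miles in three minutes, so a free-space picture that old describes a stretch of road the car has long since left behind. Even between consecutive frames at the assumed rate, the car moves a bit over 3 feet.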

Imagine that you are driving on a road and you see up ahead that it is wide open. Not a car in sight. Not a boulder or deer in sight. Your eyes are seeing a lot of free-space. This lets your brain relax somewhat, because even though you are driving along at 95 feet per second, you know that there is plenty of free-space all around you. Nothing to worry about. But suddenly, seemingly out of nowhere, a car traveling perpendicular to your road opts to drive across it. All of a sudden, your free-space is now being occupied.

You now need to mentally calculate the path of the car that is going to cross your path. Will it continue on its path so that you end up going behind it, or will you pass in front of it and safely go past? Your mind also realizes that there is free-space on the far side of that car. In other words, the car is now blocking part of your view of the road further ahead, but you know that there is road there, because you saw it moments earlier, before the car obstructed part of your view.
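One simple way to frame "in front of it or behind it" is to compute the time window during which each vehicle would occupy the spot where the two paths cross, assuming both keep a constant speed in a straight line. This is a hedged sketch: the constant-velocity model, the 12-foot conflict zone, the 15-foot vehicle length, and the function names are illustrative assumptions.

```python
def occupancy_window(dist_to_zone_ft: float, speed_fps: float,
                     zone_len_ft: float, vehicle_len_ft: float) -> tuple:
    """(enter, exit) times in seconds for a vehicle crossing the conflict zone,
    assuming constant speed along a straight path."""
    if speed_fps <= 0:
        raise ValueError("this sketch assumes a moving vehicle")
    enter = dist_to_zone_ft / speed_fps
    exit_ = (dist_to_zone_ft + zone_len_ft + vehicle_len_ft) / speed_fps
    return enter, exit_


def crossing_outcome(ego: tuple, other: tuple) -> str:
    """Compare the two occupancy windows."""
    ego_in, ego_out = ego
    other_in, other_out = other
    if ego_out <= other_in:
        return "ego passes in front"
    if other_out <= ego_in:
        return "ego passes behind"
    return "conflict"  # windows overlap: slow down or brake


ego = occupancy_window(300, 95, 12, 15)      # 300 ft away at ~65 mph
crosser = occupancy_window(50, 44, 12, 15)   # 50 ft away at 30 mph
print(crossing_outcome(ego, crosser))
```

In this example the crossing car clears the intersection after about 1.75 seconds while our car arrives after about 3.16 seconds, so we pass behind it. If the crossing car were instead 140 feet from the crossing point, the windows would overlap and the sketch would flag a conflict.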

I mention this because when you are driving in a city environment, visually identifying free-space gets hard. There are numerous objects that can block your view. You can detect the free-space that leads up to those objects, and you can sometimes see over an object to the free-space on the other side, but part of your view of the free-space is blocked. There might be a big truck parked on the street doing a delivery, and you cannot see well past the truck. There might be pedestrians walking across the street, and you cannot see exactly what free-space lies beyond them. And so on.

For the self-driving car, this is equally an issue. Visual images stream in from its cameras, and it needs to make guesses about where free-space exists. Even where free-space is not shown in the image, it may well exist beyond an object that is blocking part of the roadway in view. As objects come and go in front of the car, the system needs to keep track of where free-space is and dynamically keep up with the changing scenery.
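A common way to "remember" free-space that is temporarily hidden is a grid of cells, each holding a probability that the cell is free. The sketch below uses a one-dimensional strip of cells ahead of the car. Cells the camera can see take the vision system's latest estimate; occluded cells keep their earlier belief but drift back toward "unknown" (0.5) each frame. The decay factor and the function name are hypothetical choices for illustration.

```python
UNKNOWN = 0.5  # probability of free-space when we have no information


def update_grid(prob_free, observations, decay=0.9):
    """Update a 1D strip of roadway cells ahead of the car.

    prob_free: current P(free) for each cell.
    observations: same length; the vision system's P(free) for cells visible
        this frame, or None for cells hidden behind an object.
    decay: hypothetical factor pulling unseen cells back toward UNKNOWN.
    """
    updated = []
    for p, obs in zip(prob_free, observations):
        if obs is None:
            # Occluded: keep the earlier belief, but let confidence fade.
            updated.append(UNKNOWN + decay * (p - UNKNOWN))
        else:
            # Visible: take the latest estimate from the vision system.
            updated.append(obs)
    return updated


# Cell 0 was seen as free but a truck now hides it; cell 1 is now seen as occupied.
print(update_grid([0.9, 0.9, 0.5], [None, 0.1, 0.95]))
```

The occluded cell drops only slightly, from 0.9 to about 0.86, reflecting that the road seen a moment ago is probably still there. If it stays hidden for many frames, its value slides toward 0.5, and the system stops counting on it.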

This is an imperfect world, and so the vision system of the self-driving car has to attach probabilities to what it finds. Is this patch of roadway free-space or not? Rather than saying yes or no, the system might estimate that it is free-space with high probability. Once the car moves a few more feet, the next incoming image might confirm the existence of the free-space, or its analysis might lower the probability, indicating that there isn’t free-space where the system once thought there was.
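The confirm-or-reduce behavior described above is often implemented with a Bayesian update in log-odds form, as in occupancy-grid mapping: add the log-odds of each new observation to the running log-odds, then convert back to a probability. This minimal sketch assumes observations are conditionally independent and that the sensor model's prior is 0.5 (so the prior term drops out); the function names are illustrative.

```python
import math

EPS = 1e-6  # keep probabilities away from exactly 0 or 1


def logit(p: float) -> float:
    p = min(max(p, EPS), 1 - EPS)
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def bayes_update(prior_free: float, p_free_given_obs: float) -> float:
    """Fuse one new image's estimate into P(free) using a log-odds update."""
    return sigmoid(logit(prior_free) + logit(p_free_given_obs))


belief = 0.8                        # earlier frames: probably free-space
belief = bayes_update(belief, 0.8)  # next image agrees -> confidence rises
print(f"{belief:.3f}")
belief = bayes_update(belief, 0.2)  # a later image disagrees -> confidence falls
print(f"{belief:.3f}")
```

Two agreeing 0.8 estimates combine to about 0.941, and a disagreeing 0.2 estimate brings the belief back to 0.8. Working in log-odds turns repeated multiplication of odds into simple addition, which is cheap to do for every cell on every frame.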

These image recognition tasks can be computationally expensive, meaning that they take a lot of computer processing. As the processors on self-driving cars get faster and less expensive, image recognition improves, in the sense that it can be done faster and in a more exhaustive and detailed manner. Likewise, the cameras are getting better, with higher resolution that captures sharper images at longer distances. All of this will generally improve the vision system capabilities of self-driving cars.

What has really helped too is the advent of stereo-cameras. By using stereo-cameras, the vision system on the self-driving car can perceive depth stereoscopically. Here’s what I mean. If you look at a normal picture taken with your smartphone, can you tell the depth of the objects and the scene in the picture? Not really. You have a 2D or flat picture, and when you look at it, you need to mentally tell yourself that the people in the picture are maybe ten feet away from the camera, but how do you know? You know because of their relative size in the picture and their size and position relative to other objects in the picture. I could easily fool you by staging a scene that appears to have a certain depth when it actually does not.

A self-driving car is driving in a 3D world. The images from a normal camera are 2D. Yet, somehow, the self-driving car has to identify and interpret aspects of a 3D world in order to be driving safely in a 3D setting. The use of stereo-cameras allows the vision system to determine the geometry and the 3D aspects by doing various vision processing tricks, including using the stereo disparity signal. Indeed, one of the popular algorithms for doing this is the Stixel Disparity model.
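The core geometric relationship behind stereo depth is compact. For a rectified stereo pair with focal length f (in pixels) and baseline B (the distance between the two cameras), a point that appears shifted by a disparity of d pixels between the left and right images lies at depth Z = f·B / d. The sketch below applies this textbook formula; the 1000-pixel focal length and 1-foot baseline in the example are assumed values, not the specifications of any particular car.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_ft: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair (same units as the baseline)."""
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: too far away to range
    return focal_px * baseline_ft / disparity_px


# Assumed camera setup: 1000-pixel focal length, cameras 1 foot apart.
for d in (50, 10, 9):
    print(f"disparity {d:>2} px -> depth {depth_from_disparity(d, 1000, 1.0):.1f} ft")
```

Nearby objects produce large disparities and distant ones produce small disparities. The example also hints at why long-range stereo is hard: at 10 pixels of disparity the depth is 100 feet, but a one-pixel error (9 pixels) moves the estimate to about 111 feet.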

Recent advances are improving how we do the vision processing. For example, recent efforts show that by using a Fully Convolutional Network (FCN), it is feasible to train on vision data in a self-supervised manner, rather than relying upon manually annotated training sets. This allows the vision system to be more adaptive to dynamic street scenes that it has not encountered previously. The approach is a probabilistic framework and uses the Fourier transform to help calculate the binocular disparity of locations.
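To give a flavor of how a Fourier transform can help with disparity, here is textbook phase correlation applied to a single pair of image scanlines. It is not the method of the work described above, only a generic illustration of the idea: the shift between two signals shows up as a linear phase difference in the frequency domain, and the inverse transform of the normalized cross-power spectrum peaks at that shift. To keep the example short and self-contained, it uses a naive discrete Fourier transform and a circular shift; real stereo matching would work on image windows, handle non-circular shifts, and refine to sub-pixel accuracy.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a short scanline."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(total / n if inverse else total)
    return out


def phase_correlation_shift(left, right, eps=1e-12):
    """Estimate the circular shift d with right[i] == left[(i - d) % n].

    Returns a signed shift in the range (-n/2, n/2].
    """
    n = len(left)
    f_left, f_right = dft(left), dft(right)
    cross_power = []
    for a, b in zip(f_right, f_left):
        c = a * b.conjugate()
        # Keep only the phase difference; drop frequencies with no energy.
        cross_power.append(c / abs(c) if abs(c) > eps else 0)
    correlation = dft(cross_power, inverse=True)
    peak = max(range(n), key=lambda m: correlation[m].real)
    return peak - n if peak > n // 2 else peak


left = [0, 1, 3, 7, 2, 0, 0, 5, 4, 1, 0, 2]       # made-up pixel intensities
right = [left[(i - 3) % len(left)] for i in range(len(left))]
print(phase_correlation_shift(left, right))
```

The sign of the recovered shift depends on which camera is treated as the reference; in a stereo rig, that signed shift would then be converted to depth using the camera geometry.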

I’ll return to my earlier question though, namely, where is the road?

The self-driving car relies upon the vision system sensors and sensor fusion to inform it about where the road is. The road, though, is understood only in a probabilistic fashion: it is there with a certain amount of confidence, or not there with a certain amount of confidence. To augment what the vision system is reporting, a good sensor fusion and AI system tries to compare it with other sensory data.

If the vision system says that the roadway is blocked up ahead, with 100 feet of free-space in front of the object and 200 feet of free-space beyond it, the AI should be asking the radar what it has found. Does the radar also detect the object supposedly sitting at the end of the 100 feet of free-space directly in front of the self-driving car?

This can get tricky. Suppose the vision system says there is an object 100 feet ahead, but the radar detects no object up ahead. Is that because the object is beyond the range of the radar? Is it because the vision system has gotten optically confused by something else in the scene up ahead?
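The questions above can be turned into a simple cross-checking rule. This sketch is hypothetical: the function name, the 10-foot matching tolerance, and the category labels are illustrative, and the radar range is an assumed input rather than a real sensor specification. Radar has its own blind spots too, so a "disputed" result does not by itself prove the vision system wrong; a cautious AI might slow down until the disagreement resolves.

```python
def cross_check(vision_object_ft, radar_objects_ft, radar_range_ft, tolerance_ft=10.0):
    """Classify a vision detection by comparing it against radar returns.

    vision_object_ft: distance at which vision reports an object, or None.
    radar_objects_ft: distances of objects the radar currently reports.
    radar_range_ft: assumed maximum reliable radar range.
    """
    if vision_object_ft is None:
        return "no vision detection"
    if any(abs(r - vision_object_ft) <= tolerance_ft for r in radar_objects_ft):
        return "confirmed by radar"
    if vision_object_ft > radar_range_ft:
        return "unverified: beyond radar range"
    return "disputed: possible vision false positive"


print(cross_check(100, [], radar_range_ft=500))     # radar sees nothing there
print(cross_check(100, [104], radar_range_ft=500))  # radar agrees within tolerance
print(cross_check(600, [], radar_range_ft=500))     # too far for radar to weigh in
```

Keeping the three outcomes distinct matters: an object beyond radar range is simply unconfirmed, while an object the radar should have seen but did not is a genuine conflict that deserves extra caution.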

For example, in one famous instance, a self-driving car was reportedly scanning images of the scene ahead when it came upon a billboard with a car advertisement showing a picture of a car. The image analysis system interpreted the picture of the car as an actual car. It then conveyed to the AI system that there was a car up ahead, when in fact it was merely a picture of a car on the billboard.

The AI system needs to gauge whether the information coming from the vision system might contain false positives, such as indicating that a car exists ahead when it is only a picture of a car on a billboard, or false negatives. A false negative is when the vision system does not catch an object ahead, and yet there really is an object there. The same applies to free-space: the vision system might report free-space when it isn’t actually there (a false positive), or report no free-space when it is indeed there (a false negative).
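When evaluating a free-space detector against ground truth (for example, hand-checked test images), the two error types can be tallied separately, treating "free" as the positive class. This is a minimal sketch with a hypothetical function name and made-up data.

```python
from collections import Counter


def free_space_errors(predicted_free, actual_free):
    """Tally free-space predictions against ground truth ('free' = positive)."""
    counts = Counter()
    for pred, actual in zip(predicted_free, actual_free):
        if pred and actual:
            counts["true_positive"] += 1
        elif pred and not actual:
            counts["false_positive"] += 1  # reported free, actually occupied: dangerous
        elif not pred and actual:
            counts["false_negative"] += 1  # reported occupied, actually free: needless caution
        else:
            counts["true_negative"] += 1
    return counts


predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
print(free_space_errors(predicted, actual))
```

Separating the counts matters because the costs are lopsided: a free-space false positive can send the car into an occupied spot, while a false negative mostly leads to unnecessary braking or hesitation.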

You can imagine the dangers that loom if the AI is told, or believes, that free-space exists when it does not. The AI might decide to move into the next lane, since the vision sensor informed it that there was free-space there (meaning no object there), and then ram into a car that was in that space but that the AI didn’t know was there. This is similar to human drivers who either don’t look carefully at their side-view mirrors, or do look but nonetheless cannot see what is in their blind spots. If they believe there is nothing in their blind spot and then make a lane change, they can ram into a car that was actually there. By the way, I see this near-miss situation happen every morning during my early morning freeway commute.
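One way to guard against acting on a free-space false positive is to demand high confidence across the whole target region before changing lanes, and to treat cells that were never observed as not free. The sketch below is hypothetical; the 0.97 threshold and the function name are illustrative assumptions, not values from any production system.

```python
def lane_change_allowed(target_lane_probs, threshold=0.97):
    """Approve a lane change only if every cell of the target lane is confidently free.

    target_lane_probs: P(free) for each cell covering the target lane, including
        cells beside and just behind the car (the "blind spot"); None means the
        cell was never observed.
    """
    if not target_lane_probs:
        return False  # no information about the target lane at all
    # Absence of evidence is not free-space: unobserved cells block the change.
    return all(p is not None and p >= threshold for p in target_lane_probs)


print(lane_change_allowed([0.99, 0.98, 0.99]))  # confidently clear
print(lane_change_allowed([0.99, None, 0.99]))  # blind-spot cell never seen
```

The key design choice is the asymmetry: a lane change waits until every cell is confirmed free, rather than proceeding because nothing was seen, which is exactly the mental shortcut that gets human drivers into trouble with their blind spots.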

As you can guess, street scene analysis and the detection of free-space are a cornerstone for all self-driving cars. As the car is being driven by the AI, the sensors dynamically feed it valuable information about where free-space exists and where it does not. The AI takes this into account as it moves the vehicle from place to place. Free-space detection is especially hard in city environments. It becomes even more challenging in adverse weather conditions, such as heavy rain or snow, since the vision system will be partially occluded or will receive images that are blurred or smeared. The better the free-space detection, the safer the driving by the self-driving car. The poorer the free-space detection, the greater the chances of the self-driving car going the wrong way, hitting objects, or driving onto space that is not viable free-space. Free-space, it’s the rad thing.

This content was originally posted on AI Trends.