By Dr. Lance B. Eliot, the AI Trends Insider
There are essentially three main system functions of self-driving cars: (1) car sensor-related system functions, (2) car processing related functions that we tend to consider the AI of the self-driving car, and (3) car control related functions that operate the accelerator, the brakes, the steering wheel, etc.
Today I am going to discuss mainly the sensors and an important aspect of sensory data usage that is called sensor fusion. That being said, it is crucial to realize that all three of these main systems must work in coordination with each other. If you have the best sensors, but the AI and the control systems are wimpy, then you won’t have a good self-driving car. If you have lousy sensors and yet have strong AI and controls capabilities, you will likely still have problems, because without good sensors the car won’t know what exists in the outside world as it drives and could ram into things.
As the Executive Director of the Cybernetic Self-Driving Car Institute, I am particularly interested in sensor fusion, and so I thought it might be handy to offer some insights on that particular topic. But, as noted above, keep in mind that the other systems and their functions are equally important to having a viable self-driving car.
A self-driving car needs to have sensors that detect the world, and there need to be subsystems focused on dealing with the sensors and the sensory data being collected. These can be sensors such as cameras that collect visual images, radar that makes use of radio waves to detect objects, LIDAR (see my column about LIDAR) that makes use of laser light, ultrasonic sensors that use sound waves, and so on. There are passive sensors, such as the camera that merely accepts light into it and therefore receives images, and there are active sensors, such as radar that sends out an electromagnetic radio wave and then receives back the bounce to figure out whether an object was detected.
The sensory data needs to be put together in some fashion, referred to as sensor fusion, in order to make sense of the sensory data. Usually, the overarching AI processing system is maintaining a virtual model of the world within which the car is driving, and the model is updated by new data arriving from the sensors. As a result of the sensor fusion, the AI then needs to decide what actions should be undertaken, and then emits appropriate commands to the controls of the car to carry out those actions.
An apt analogy would be to the way that the human body works. Your eyes are receiving visual images. Those images are conveyed to your brain. Your brain has a mental model of the world around you. Based on the images, the brain updates the model. Using the model, the brain ascertains what your body should do next. Let’s suppose you are in Barcelona and at the running of the bulls. You are standing on a street corner waiting for the madcap bulls to arrive. Your brain has a mental model of the streets and the people around you. Your eyes suddenly see the bulls charging toward you. The images of the charging bulls stream into your brain. Your brain updates the mental model of the situation, and determines that you ought to start running like mad. The brain then commands your legs to engage and start running. It directs them to run away from the bulls and down the street to try and escape them.
In this analogy, we had a sensor, your eyes. It was collecting visual images. The eyes themselves don’t do much with the images per se. They are predominantly dealing with getting the images. There are some fault-tolerance capabilities in your eyeballs, in that even if your eye is partially covered up you can still use it to capture images. Furthermore, if your eye gets occluded, let’s say that you get something in your eye like a piece of lint, the eye is able to continue functioning but also realizes that something is amiss. This is transmitted to the brain. There isn’t much image processing per se in the eyeball itself in terms of making sense of the image. The eyeball is not (at least as far as we know) figuring out that the creature running toward you is a bull. It is up to the brain to take the raw images fed by the eyeballs and try to figure out what the image consists of and its significance.
Your brain keeps a context of the existing situation around you. Standing there in Barcelona, your brain knows that your body is at the running of the bulls. It has a mental model that can make good sense of the image of the charging bull, because your brain realizes that a charging bull is a likely scenario in the situation you find yourself in. Suppose that instead you were standing in New York’s Times Square. Your mental model would not likely include the chance of a charging bull coming at you. Your brain could nonetheless still process the fact that a charging bull was coming at you, but it would not fit well into the mental model of the moment. You might be expecting a taxi to run at you, or maybe a nut wearing a Spider-Man outfit, but probably not a wild charging bull.
Humans have multiple senses. You take the sense of sight and your brain uses it to inform the mental model and make decisions. You also have the sense of touch, the ability to detect odors, the sense of taste, and the sense of hearing. There are various sensory devices on your body that pertain to these senses. Your ears are your sensory devices for hearing sounds. Those sounds are fed into the brain. The brain tries to figure out what those sounds mean. In the case of being in Barcelona, you might have heard the pounding hoofs of the bulls prior to actually seeing the bulls coming around the corner. Your brain would have updated the mental model to reflect that the bulls are nearby. It might have then commanded your legs to start running. Or, it might opt to wait and determine whether your eyes can confirm that the bulls are charging at you. It might want to have a secondary set of sensory devices reaffirm what the other sensory device reported.
On self-driving cars, there can be just one type of sensory device, let’s say radar. This, though, would mean that the self-driving car has only one way of sensing the world. It would be akin to your having only your eyes and not having other senses like your hearing. Thus, there are usually multiple types of sensory devices on a self-driving car, such as radar, LIDAR, and ultrasonic.
There are individual sensors and potentially multiples of those by type. For example, a Tesla might come equipped with one radar unit, hidden under the front grille, and then six ultrasonic units, dispersed around the car and mounted either on the outside of the skin of the car or just inside of it. Each of these types of sensors has a somewhat different purpose. Just as your eyes differ from your ears, so do these sensor types.
Radar for example is used for distance detection and speed of objects, typically at a range of around 500 feet or so. Ultrasonic sensors are usually used for very near distance detection, often within about 3 to 6 feet of the car. The radar would tend to be used for normal driving functions and trying to detect if there is a car ahead of the self-driving car. The ultrasonic sensors tend to be used when the self-driving car is parking, since it needs to know what is very nearby to the car, or can also be used when changing lanes while driving since it can try to detect other cars in your blind spot.
Recall that I mentioned that your eyes can do some amount of fault detection and have a range of fault tolerance. Likewise with the sensors on the self-driving car. A radar unit might realize that its electromagnetic waves are not being sent out and returned in a reliable manner. This could mean that the radar itself has a problem. Or, it could be that it is trying to detect an object beyond its normal functional range, let’s say the stated range is 500 feet and there is an object at 600 feet. The radar wave returns from the object might be very weak. As such, the radar is not sure whether the object is really there or not. There can also be ghosting which involves situations whereby a sensor believes something is there when it is not. Think about how you sometimes are in a very dark room and believe that maybe you see an image floating in the air. Is it there or does your eyeball just get somewhat confused and falsely believe an image is there? The eyeball can play tricks on us and offer stimulation to the brain that is based on spurious aspects.
For self-driving cars, there have been some researchers who have purposely shown that it is possible to spoof the sensors on the self-driving car. They created images to trick the self-driving car camera into believing that the self-driving car was in a context that it was not (imagine if you were standing in Barcelona but I held up a picture of New York Times Square, your eyeballs would convey the image of New York Times Square and your brain needs to figure out what is going on, is it a picture or have you been transported Star Trek style into New York). Researchers have spoofed the radar. You might already know that for years there have been outlawed devices that some had in their cars to fool the radar guns used by police. The device would trick the radar gun into showing a speed that was much less than the speed of the actual driving car. Sorry, those were outlawed.
A sensor can produce what is considered a false positive. This is a circumstance involving a sensor that says something is present, but it is not. Suppose the radar reports that there is a car directly ahead of you and it is stopped in the road. The AI of the self-driving car might suddenly jam on the brakes. If the camera of the self-driving car is not showing an image of a car ahead of you, this conflicts with what the radar has said. The AI needs to ascertain which is correct, the radar reporting the object, or the images that don’t show the object. Maybe the object is invisible to the camera, but visible to the radar. Or, maybe the radar is reporting a ghost and the radar should be ignored because the camera shows there is no object there. If the radar is correct and the object is there, but the camera doesn’t show it to be there, the camera would be said to be reporting a false negative. A false negative consists of a sensor saying that something is not present when it actually is there.
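The four possible outcomes of a single sensor report can be sketched in a few lines of Python. This is only an illustration: the function name `classify_report` is hypothetical, and the sketch assumes we know whether the object is really there, which a car on the road does not know at the moment it must decide.

```python
def classify_report(sensor_says_present: bool, actually_present: bool) -> str:
    """Label one sensor report against the (assumed known) ground truth."""
    if sensor_says_present and actually_present:
        return "true positive"
    if sensor_says_present and not actually_present:
        return "false positive"   # the sensor reported a ghost
    if not sensor_says_present and actually_present:
        return "false negative"   # the sensor missed a real object
    return "true negative"


# The radar reports a stopped car, the camera does not, and a car really is there.
print(classify_report(True, True))    # radar: true positive
print(classify_report(False, True))   # camera: false negative
```

In practice the ground truth only becomes available afterward, for example in logged test drives, which is where this kind of labeling tends to be used.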
Any of the sensors can at any time be reporting a false positive or a false negative. It is up to the AI of the self-driving car to try and figure out which is which. This can be hard to do. The AI will typically canvass all of its sensors to try and determine whether any one sensor is doing false reporting. Some AI systems judge which sensor is right by pre-determining that some of the sensors are more trustworthy than others. Others use a voting protocol: if X sensors vote that something is there and Y vote that it is not, and X exceeds Y by some required majority, the system decides the object is there. Another popular method is known as the Random Sample Consensus (RANSAC) approach. Risk is also used as a factor, in that it might be safer to falsely come to a halt than it would be to risk ramming into an object that you weren’t sure was there but that turns out to be there.
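Here is a minimal sketch of the voting idea, combined with pre-assigned trust and a risk adjustment. Everything in it is an assumption made for illustration: the `Report` class, the weights, and the thresholds are hypothetical and are not taken from any particular car maker’s system.

```python
from dataclasses import dataclass


@dataclass
class Report:
    sensor: str
    detected: bool
    weight: float = 1.0  # hypothetical pre-assigned trust in this sensor


def object_present(reports, threshold=0.5):
    """Weighted vote: True if the 'yes' share of total weight exceeds threshold.

    Lowering the threshold makes the car more cautious: it accepts more
    false halts in exchange for fewer missed objects.
    """
    total = sum(r.weight for r in reports)
    if total <= 0:
        raise ValueError("need at least one report with positive weight")
    yes = sum(r.weight for r in reports if r.detected)
    return yes / total > threshold


reports = [
    Report("radar", True),
    Report("camera", False),
    Report("ultrasonic", False),
]
print(object_present(reports))                 # simple majority: False
print(object_present(reports, threshold=0.3))  # cautious setting: True
```

The second call shows the risk factor at work: with one sensor out of three reporting an object, a simple majority ignores it, while a cautious threshold treats it as present and lets the car slow down.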
This is where sensor fusion comes into play. Sensor fusion consists of collecting together sensory data and trying to make sense of it. In some cars, like certain models of the Tesla, there is a sensor fusion between the camera and the radar, and then this is fed into the AI of the car. The AI of the car then receives a combined sensor fusion from those two units, and must combine it with other sensory data such as that from the ultrasonic sensors. I mention this because there is not necessarily one central place of sensor fusion in a self-driving car. There can be multiple places of sensor fusion. This can be important to note. Imagine if your brain was receiving not the raw images and sounds from the eyes and the ears, but instead some pre-processed version of what your eyes and ears had detected. The sensor fusion in the middle is going to be making assumptions and transformations that the brain then becomes reliant upon.
Within the self-driving car, there is a network that allows for communication among the devices in the self-driving car. A widely used standard for this is CAN (Controller Area Network), originally developed by Bosch and standardized as ISO 11898, with the Society of Automotive Engineers (SAE) publishing related standards such as J1939. It is a network that does not need a host computer, and instead the devices on the network can freely send messages across the network. Devices on the network are supposed to be listening for messages, meaning they are watching to see if a message has come along that is relevant to that device. The devices are usually called Electronic Control Units (ECUs) and are considered nodes in this network. Loosely like the TCP/IP protocols, CAN allows for asynchronous communications among the devices, and each message is encompassed in an envelope that indicates an ID for the message along with an error-detecting code and the message itself.
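The envelope idea can be sketched as a small data structure. This is a simplified illustration rather than a real CAN implementation: the names `Frame`, `make_frame`, `arbitrate`, and `accept` and the two message IDs are hypothetical, and `zlib.crc32` stands in for the 15-bit CRC that classic CAN actually uses. What the sketch does reflect is that classic CAN frames carry an 11-bit identifier and at most 8 data bytes, and that when two nodes transmit at once the lower identifier wins the bus.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    can_id: int    # 11-bit identifier; a lower value means higher priority
    data: bytes    # classic CAN carries at most 8 data bytes
    checksum: int  # stand-in for the CAN CRC field


def make_frame(can_id: int, data: bytes) -> Frame:
    if not 0 <= can_id < 2 ** 11:
        raise ValueError("standard CAN identifiers are 11 bits")
    if len(data) > 8:
        raise ValueError("classic CAN payloads are at most 8 bytes")
    return Frame(can_id, data, zlib.crc32(data))


def is_intact(frame: Frame) -> bool:
    """Detect (not correct) corruption by recomputing the checksum."""
    return zlib.crc32(frame.data) == frame.checksum


def arbitrate(pending):
    """When several nodes want the bus at once, the lowest ID goes first."""
    return min(pending, key=lambda f: f.can_id)


def accept(frame: Frame, wanted_ids) -> bool:
    """Each node keeps only intact frames whose ID it cares about."""
    return frame.can_id in wanted_ids and is_intact(frame)


BRAKE_CMD, RADAR_TRACK = 0x0A0, 0x2F0   # hypothetical message IDs
radar = make_frame(RADAR_TRACK, bytes([0x12, 0x34]))
brake = make_frame(BRAKE_CMD, bytes([0xFF]))
print(hex(arbitrate([radar, brake]).can_id))   # 0xa0: the brake command goes first
print(accept(radar, {RADAR_TRACK}))            # True
```

Giving safety-critical messages the lowest identifiers is a common design choice, since it means they wait the least when the bus is busy.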
The sensor fusion is often referred to as Multi-Sensor Data Fusion (MSDF). Taking the data from multiple sensors, a low-level analysis is done to ascertain which sensors are OK and which might be having problems. The MSDF will have a paradigm or methodology that it uses to decide which sensors are perhaps faulty and which are not. It will ultimately send along a transformed version of the raw sensor data, together with some conclusions about the sensory data, and push that along to the brains of the self-driving car, the AI. The AI system or processing system then updates the model of the environment and must decide what to do about it at a higher level of abstraction. The outcome is typically a command to the controls of the car, such as to speed up, slow down, turn left, turn right, etc.
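One simple way to picture this low-level step is a health check followed by a weighted average. The sketch below uses inverse-variance weighting, which is a common textbook technique and not necessarily what any given MSDF uses; the function name `fuse` and the sample numbers are assumptions made for illustration.

```python
def fuse(readings):
    """Fuse distance estimates from several sensors.

    readings: list of (distance_m, variance, healthy) tuples.
    Sensors flagged unhealthy are dropped; the rest are combined by
    inverse-variance weighting, so less noisy sensors count for more.
    Returns (fused_distance_m, fused_variance), or None if nothing is usable.
    """
    usable = [(d, v) for d, v, healthy in readings if healthy and v > 0]
    if not usable:
        return None
    weights = [1.0 / v for _, v in usable]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, usable)) / total
    return fused, 1.0 / total


readings = [
    (10.0, 1.0, True),    # radar
    (12.0, 1.0, True),    # camera
    (50.0, 1.0, False),   # ultrasonic, flagged as faulty and ignored
]
print(fuse(readings))     # (11.0, 0.5)
```

The fused variance is smaller than either input variance, which captures the basic payoff of fusion: two imperfect sensors that agree are more believable than either one alone.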
The Field-of-View (FOV) of the self-driving car is vital to what it knows about the outside world. For example, a radar unit at the front grille of the car is typically going to have a fan-like wave of radar detection, but it only covers what is directly in front of the car. Objects that are at off-angles to the car might not be detected by the radar. In this instance, the radar is certainly not detecting what is behind the car nor what is to the sides of the car. The AI system needs to realize that the info coming from the radar is only providing a FOV directly ahead of the car. It is otherwise blind to what is behind the car and to the sides of the car.
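The fan-shaped coverage can be expressed as a simple geometric test. In this sketch the function name, the 30-degree half-angle, and the 150-meter range (roughly the 500 feet mentioned earlier) are illustrative assumptions, not the specifications of any actual radar unit.

```python
import math


def in_field_of_view(x_m, y_m, half_angle_deg=30.0, max_range_m=150.0):
    """Return True if a point lies inside a forward-facing sensor's fan.

    Coordinates are relative to the sensor: x is meters ahead of the car,
    y is meters to the left (negative y is to the right).
    """
    distance = math.hypot(x_m, y_m)
    bearing_deg = math.degrees(math.atan2(y_m, x_m))  # 0 means dead ahead
    return distance <= max_range_m and abs(bearing_deg) <= half_angle_deg


print(in_field_of_view(50.0, 0.0))    # car directly ahead: True
print(in_field_of_view(0.0, 10.0))    # object directly to the left: False
print(in_field_of_view(-20.0, 0.0))   # object behind the car: False
print(in_field_of_view(200.0, 0.0))   # ahead but beyond range: False
```

A check like this lets the AI distinguish between "the radar saw nothing there" and "the radar could not have seen anything there", which are very different statements.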
LIDAR is often used in today’s self-driving cars to create a 360-degree model of the surrounding environment. The LIDAR uses laser light pulses and often is made to rotate continuously in a 360-degree circle. By doing so, the LIDAR can provide object detection of objects completely around the car. When combined with the front-facing radar, and a front-facing camera, and ultrasonic sensors on the sides of the car, a more full-bodied world model can be constructed and maintained. You might wonder why not have a zillion such sensors on the self-driving car, which would presumably allow for an even more robust indication of the outside world. You could certainly do so, though it causes the cost of the car to rise, and the weight and size of the car to rise.
Self-driving car makers are all jockeying to figure out how many sensors, which sensors, and which combination of sensors makes sense for a self-driving car. More sensors, more data, more to process, more cost of hardware. Fewer sensors, less data, less to process, lower cost of hardware. As I have previously mentioned, Elon Musk of Tesla says he does not believe LIDAR is needed for self-driving cars, and so there is no LIDAR being used on Teslas. Is he right or is he wrong? We don’t yet know. Time will tell.
There is some point at which a self-driving car is safe or not safe, or is safer or less safe than another one. This is why I have been predicting that we are eventually going to see a shake-up in self-driving cars. Those that have chosen some combination of sensors that turns out to not be as safe are going to lose out. We don’t know what the right equation is as yet. In theory, the testing of self-driving cars on public roadways is going to reveal this, though hopefully not at the cost of human lives.
Based on the sensor data, there is usually Multi-Target Tracking (MTT) that needs to be undertaken. The raw data needs to be examined to identify features and do what is known as feature extraction. From a camera image, the sensor fusion might determine that a pedestrian is standing a few feet away from the car. If there is concern that the pedestrian might walk into the path of the car, the AI might decide to track that pedestrian. Thus, as subsequent images are captured from the camera, the pedestrian becomes a “target” object that has been deemed to be worthy of tracking. If the pedestrian seems to be about to get run over, the AI might then task the brakes to do a hard-braking action.
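Keeping a pedestrian as the same "target" from one camera frame to the next requires matching new detections to existing tracks. The sketch below uses greedy nearest-neighbor association with a distance gate, which is one of the simplest approaches and is offered only as an illustration; the `Tracker` class and the 2-meter gate are assumptions, and a production tracker would also handle missed detections and noisy measurements.

```python
import math
from itertools import count


class Tracker:
    """Greedy nearest-neighbor tracker over (x, y) detections in meters."""

    def __init__(self, gate_m=2.0):
        self.gate_m = gate_m   # max jump allowed between consecutive frames
        self.tracks = {}       # track id -> last known (x, y)
        self._ids = count(1)

    def update(self, detections):
        unmatched = dict(self.tracks)
        updated = {}
        for det in detections:
            # Closest existing track that has not been claimed yet, if any.
            best = min(unmatched,
                       key=lambda t: math.dist(unmatched[t], det),
                       default=None)
            if best is not None and math.dist(unmatched[best], det) <= self.gate_m:
                updated[best] = det              # same target, new position
                del unmatched[best]
            else:
                updated[next(self._ids)] = det   # a new target appears
        self.tracks = updated   # simplification: unmatched tracks are dropped
        return updated


tracker = Tracker()
print(tracker.update([(5.0, 2.0)]))                 # {1: (5.0, 2.0)}
print(tracker.update([(5.0, 1.5)]))                 # {1: (5.0, 1.5)}
print(tracker.update([(5.0, 1.0), (20.0, -3.0)]))   # {1: (5.0, 1.0), 2: (20.0, -3.0)}
```

Track 1 keeps its identity as the pedestrian drifts toward the car’s path, while the far-away detection in the third frame is recognized as a separate target.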
There is a need for the AI system to consider the sensory data in both a spatial manner and a temporal manner. Spatially, the sensor data is indicating what is presumably physically around the car. There is a car ahead, a pedestrian to the right, and a wall to the left of the car. For temporal purposes, the AI needs to realize that things are changing over time. The pedestrian has now moved from the right of the car to the left of the car. The car ahead is no longer ahead since it has pulled to the side of the road and stopped. The AI is reviewing the sensory data in real-time, as it streams into the AI system, and besides having a spatial model of the objects it must also have a temporal model. Is the object in position A moving toward position B, and if so, what should the self-driving car do once the object gets to position B?
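The "position A to position B" reasoning can be sketched with a constant-velocity assumption: estimate how fast an object is moving from two observations, then project it forward. The function names, the 1.5-meter half-width of the car’s path, and the sample positions are all hypothetical, and positions are taken relative to the car to keep the example short.

```python
def estimate_velocity(p0, p1, dt_s):
    """Velocity in m/s from two (x, y) positions observed dt_s seconds apart."""
    return ((p1[0] - p0[0]) / dt_s, (p1[1] - p0[1]) / dt_s)


def predict(p, v, horizon_s):
    """Constant-velocity prediction of position after horizon_s seconds."""
    return (p[0] + v[0] * horizon_s, p[1] + v[1] * horizon_s)


def enters_path(p, v, horizon_s, half_width_m=1.5, step_s=0.1):
    """True if the object is predicted to be in the car's path within the horizon.

    x is meters ahead of the car and y is meters to the left, so the path
    is the strip where x >= 0 and |y| <= half_width_m.
    """
    steps = int(round(horizon_s / step_s))
    for i in range(steps + 1):
        x, y = predict(p, v, i * step_s)
        if x >= 0 and abs(y) <= half_width_m:
            return True
    return False


# A pedestrian on the right, seen twice, half a second apart.
a, b = (12.0, -4.0), (11.0, -3.5)
v = estimate_velocity(a, b, 0.5)
print(v)                         # (-2.0, 1.0)
print(enters_path(b, v, 1.0))    # False: still clear one second from now
print(enters_path(b, v, 3.0))    # True: predicted to cross within three seconds
```

The same object produces different answers depending on how far ahead the AI looks, which is why the choice of prediction horizon matters as much as the prediction itself.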
Notice that the AI therefore needs to be aware of the present situation and also predicting the future. We do this as human drivers. I see a pedestrian, let’s say a teenager on his skateboard. He’s on the sidewalk and moving fast. I anticipate that he is going to jump off the curb and possibly intersect with the path of my car. I therefore decide to swerve in-advance to my left and avoid hitting him once he makes the jump. The AI of the self-driving car would have been receiving sensor data about the teenager and would have had to make the same kinds of predictions in the model of the world that it has.
For those of you who are aware of the speed of microprocessors, you might right now be wondering how all of this massive amount of sensory data that is pouring in each split second can be processed in real-time and quickly enough for the self-driving car to make the timely decisions that are needed.
You are absolutely right that this needs tremendously fast processors and a lot of them too, working in parallel. Let’s trace things. There is a radar image captured, and the ECU for the radar unit does some pre-processing, which takes a split second to do. It then sends that along the CAN to the sensor fusion. This takes time to travel on the network and be received. The sensor fusion is likewise getting data from the camera, from the ultrasonic sensors, and from the LIDAR. The sensor fusion processes all of this, which takes another split second. The result is sent along to the AI. The AI needs to process it and update the world model. This takes time. The AI then sends a command to the controls of the car, which goes across the CAN and takes time to happen. The controls then receive the command, determine what it says to do, and physically take action. This all takes time too.
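Adding up the stages in that trace gives a rough end-to-end latency budget. The numbers below are invented purely for illustration and do not describe any real vehicle; the point is only how quickly small delays turn into distance traveled.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative only).
STAGE_LATENCY_MS = {
    "radar ECU pre-processing": 20,
    "CAN transfer to sensor fusion": 5,
    "sensor fusion": 30,
    "AI world-model update and decision": 40,
    "CAN transfer to controls": 5,
    "controls actuation": 50,
}


def total_latency_ms(stages):
    """End-to-end delay if the stages run one after another."""
    return sum(stages.values())


def distance_travelled_m(speed_mps, latency_ms):
    """How far the car moves before the controls actually respond."""
    return speed_mps * latency_ms / 1000.0


latency = total_latency_ms(STAGE_LATENCY_MS)
print(latency)                               # 150
print(distance_travelled_m(30.0, latency))   # 4.5
```

At 30 meters per second, which is about 108 km/h or 67 mph, a 150-millisecond pipeline means the car covers 4.5 meters between sensing something and acting on it.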
If the processors are not fast enough, and if the lag time between the sensor data collection and the final act of telling the controls to engage is overly long, you could have a self-driving car that gets into an accident, killing someone.
Not only is it about the raw speed of the processors, but it is also what the processing itself is doing. If the AI, for example, deliberates overly long, it might have reached a correct decision but no longer have time to implement it. You’ve probably been in a car with a human driver that hesitated to try and make the change of light at an intersection. Their hesitation put them into a bind wherein they either try to rocket through the now red light or come to a screeching halt at the edge of the intersection. Either way, they have increased the risk to themselves and those around them, due to the hesitation. Self-driving cars can get themselves into exactly the same predicament.
Here, then, are some of the factors about sensors and sensor fusion that need to be considered:
- Mounting Space
- Error Reporting
- Fault Tolerance
Sensor fusion is a vital aspect of self-driving cars. For those of you who are software engineers or computer scientists, there are ample opportunities to provide new approaches and innovative methods for improving sensor fusion. Self-driving car makers know that good sensor fusion is essential to a well-operating self-driving car. Most consumers have no clue that sensor fusion is taking place, nor how it is occurring. All they want to know is that the self-driving car is magical and able to safely take them from point Y to point Z. The better sensor fusion gets, the less obvious it will be, and yet the safer our self-driving cars will be.
This content is original to AI Trends.