Noise Pollution Abatement and AI Soundscape Deep Learning: The Case of AI Self-Driving Cars


By Lance Eliot, the AI Trends Insider

I was being serenaded. Standing on a street corner at Times Square in New York, I was surrounded by the epic and at times overwhelming sounds of New York City (NYC). I had come to the city that never sleeps to speak at an industry conference on Artificial Intelligence (AI). Opting to walk from my hotel to the conference location, I couldn’t help but hear the nefarious noises of this hustling and bustling famous town.

Bang, bam, honk, honk, whoosh, woot, bang, and so on, there was a cacophony of noises coming at me, drowning me in a sea of diverse and often bone-rattling sounds. NYC has many claims to fame, and one of those claims is that it is the noisiest city in all of North America (not just limited to the United States!). I realize that there are other cities that might dispute this claim, though I am not sure that anyone would really want to admit that their city is the grandest of them all due to loud and overbearing noises.

Environmentalists would label it as noise pollution.

Some New Yorkers are proud of the noises that emanate throughout their domain. That loud banging is coming from yet another construction project, consisting of a new skyscraper being built with steel girders and producing a commensurate volume of construction noises. It is a showcase of the growth of the area and the promise of more jobs.

The streets are filled with cars, cabs, buses, motorcyclists, bicyclists, all producing traffic noises, though admittedly the amount of horn honking is nowadays a lot less than it used to be (due to a crackdown on honking incessantly). Many New Yorkers would readily admit that their traffic situation is horrendous and there is a continual plethora of noises coming from the cars, including car exhaust noises, car doors opening and closing noises, tire screeching noises, cars bumping along on potholes noises, cars hitting each other noises, etc.

But, as the locals say, you get used to the sounds. Indeed, some of my colleagues that live in NYC have at times looked at me with a puzzled expression when I mention the noises that I hear when I stay there. The puzzlement is due to the fact that they, as locals, no longer overtly hear the sounds, having become so accustomed to the noises. Usually, when walking with them on the streets of NYC, they’ll completely ignore the noises and be chatting with me as though we are walking in a quiet wooded forest and there is nothing that makes any abrasive noises. It is as though we are immersed among the song of gentle birds in a forest or are hearing the sweet sounds of trees gently swaying in the wind.

I assure you that those sounds are not quite so sweet as imagined. You can certainly try to block out the noises of the city, either mentally ignoring them or pretending they are minimal, but nonetheless your ears are getting pounded. Sound is energy. Typically measured in Sound Pressure Levels (SPLs), those city noises are battering your ears. Often, it is customary to use dBA (formally described as A-weighted decibels) as a scale for comparing different kinds of noises. The SPL dBA is a logarithmic scale and you need to carefully interpret the numbers used, realizing that as the numbers increase it is not simply a linear progression.
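
To make the logarithmic point concrete, here is a minimal Python sketch of the arithmetic. The function names are my own, chosen for illustration, and the sketch ignores the A-weighting frequency filter that a real sound-level meter applies, so treat it as arithmetic on decibel values and nothing more.

```python
import math

# Standard reference pressure for sound in air: 20 micropascals.
REFERENCE_PRESSURE_PA = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def power_ratio(db_difference: float) -> float:
    """How many times more acoustic power a given dB difference represents."""
    return 10.0 ** (db_difference / 10.0)


def combine_levels(levels_db) -> float:
    """Combined level, in dB, of independent sources heard at the same spot."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


if __name__ == "__main__":
    # A 30 dB gap (say, 80 dBA versus 110 dBA) is a thousandfold power gap.
    print(power_ratio(30))
    # Two equally loud 80 dB sources together are about 83 dB, not 160 dB.
    print(round(combine_levels([80, 80]), 1))
```

The takeaway is that decibel values cannot be added or compared like ordinary numbers: every 10 dB step is a tenfold jump in acoustic power.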

Let’s consider some sounds that pertain to human hearing. The sound of a pin dropping is 10 dBA. You have pretty good ears to hear that sound. Rustling leaves are typically around 20 dBA, while a babbling brook is about 40 dBA. So far, these are all relatively quiet and readily enjoyable sounds.

Your alarm clock that sharply awakens you in the mornings is likely around 80 dBA. That’s a sound that is not just jarring, it is also perhaps universally hated because of its significance (yes, a sound meaning it is time to get up and go to work, again!). The sound of a jackhammer gets you to about 110 dBA. A gun being fired is probably 160 dBA or more. Those are rather obnoxious sounds and ones that can cause either temporary damage to your ears or have permanent adverse impacts on your ears.

In a somewhat serene suburbia, the average noise level might be around 40 to 50 dBA. Time of day can make a big difference in terms of the overall noise level. There is the Day-Night average level (Ldn) and the Community Noise Equivalent Level (CNEL), used to help compare cities since these metrics encompass the variations between daytime and nighttime noise levels.
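
For those curious how such a day-and-night metric is put together, here is a small Python sketch. It assumes you already have 24 hourly average levels (index 0 being midnight to 1 a.m.), applies the customary 10 dB nighttime penalty between 10 p.m. and 7 a.m. for Ldn, and optionally adds an evening penalty between 7 p.m. and 10 p.m. in the manner of CNEL. The function name and the input layout are my own assumptions for illustration.

```python
import math


def day_night_level(hourly_leq_db, evening_penalty_db=0.0):
    """Energy-average 24 hourly levels with time-of-day penalties.

    With evening_penalty_db=0.0 this is an Ldn-style figure; with
    evening_penalty_db=5.0 it is a CNEL-style figure.
    """
    if len(hourly_leq_db) != 24:
        raise ValueError("expected exactly 24 hourly levels")

    total_energy = 0.0
    for hour, level in enumerate(hourly_leq_db):
        if hour >= 22 or hour < 7:
            penalty = 10.0                 # nighttime hours count extra
        elif 19 <= hour < 22:
            penalty = evening_penalty_db   # evening hours (CNEL only)
        else:
            penalty = 0.0
        total_energy += 10.0 ** ((level + penalty) / 10.0)

    return 10.0 * math.log10(total_energy / 24.0)


if __name__ == "__main__":
    steady = [60.0] * 24  # a hypothetical street that is 60 dBA around the clock
    print(round(day_night_level(steady), 1))
    print(round(day_night_level(steady, evening_penalty_db=5.0), 1))
```

Notice that a street with a perfectly steady 60 dBA scores noticeably higher than 60 on either metric, because noise during sleeping hours is deliberately weighted more heavily.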

A noisy urban or city area could be 60 to possibly 80 dBA, likely being at the higher point during daytime. When you are standing on the street and listening to the city noises, they can somewhat get bundled together and you might not be able to readily distinguish any particular sound. There tends to be a somewhat steady level of ambient noises that come at you. Some of the noises are nearby, while other sounds are coming from quite a distance.

While wandering around NYC, I felt as though the skyscrapers were able to at times magnify sounds that were quite a distance away from me. Construction sounds from several blocks away were able to travel in and around the giant buildings. It was as though I might be in a canyon and have echoing sounds that made their way from miles away.

The noises seemed to be continuous. There were few if any breaks in the drumbeat of sounds. Even at nighttime, while in my hotel room, the street noises continued all night long. Yes, I had the windows tightly closed, including drawing rather heavy curtains. Still, in spite of those attempts at blocking out the street sounds, it was immediately apparent that I was trying to sleep in a city that doesn’t.

There are also sounds that momentarily appear and can be attention getting. For example, the sound of a siren coming from a police car, an ambulance, or a fire truck. If you aren’t used to routinely and frequently hearing those sounds, it catches your ear right away, and you look to see what is going on. Given the large populace in NYC, the odds of hearing a siren nearly all of the time are pretty high. During my stay, there were few stretches of time for which I could not hear the wail of a siren, either one that was close to me or one that might have been several blocks away.

I remember a major construction project that was going to take place in Los Angeles and there was some vocal concern about how noises from the construction might impact a nearby school and children’s playground. At the time, I was working in that area and could see from my upper-floor office the dirt lot that the new retail space was proposed to be built upon. Just across the street there were elementary school children that would come outside for recess.

I pondered the massive amount of heavy equipment that would be rolled into the dirt lot and how much noise it would make. There was some demolition needed to remove the remnants of prior structures. The site itself would need to be graded and some excavation done. Once the actual building began getting constructed, there would be the mix of power tools, hand tools, generators, and a slew of other sounds coming from that site. Until I had seen a formal report undertaken about the dangers of those noises, I hadn’t thought about what it might do to those children and adults at the nearby school and playground.

Noises can of course harm our ears, limiting our ability to hear. In addition, noises can be distracting and dilute or undermine attention. I had wondered whether the school children would be able to concentrate on their class work if they were continually being bombarded by outdoor noises. They probably would also have a hard time hearing the teacher. Imagine trying to take notes of a lecture on math or history, and you can only hear about one of every three words being uttered by the instructor.

Noise Pollution is a Serious Issue

According to the Environmental Protection Agency (EPA), noise pollution is a serious issue in the United States. There is a somewhat de facto nationwide noise pollution policy as enacted by the Noise Control Act of 1972. There are numerous local and state ordinances and policies about noise pollution. For federal highways, the Federal Highway Administration (FHWA) promulgates highway noise pollution rules as per the Congressional Federal-Aid Highway Act of 1970.

I’ve already mentioned that noise pollution can harm your hearing and can possibly impair the cognitive aspects of learning by children, which seem sufficiently important qualms to consider, yet there are other equally serious consequences too. Some studies suggest a link between noise pollution and heart disease, or other ailments like high blood pressure, so it can be bad for your overall health. There are likely a plethora of adverse health consequences that we can list due to noise pollution.

I’ll add that disrupted sleep is one of those, which made me somewhat dulled in my efforts during my stay there in NYC and I felt groggy at times. I don’t want to make light of the matter, and I suppose it could also be that I was out partying with conference attendees until the late hours. In any case, allow me to emphasize, noise pollution is serious stuff.

When I bring up the topic of noise pollution, the usual reaction is a kind of surprise and say-what response. People are used to discussing air pollution, which they would readily agree is bad. People are aware of the dangers of water pollution and would readily agree it is bad. Noise pollution is a topic they often are unsure about. Noises? You are worried about noises, they ask? How bad can it be?

For most people, probably there is little or no concern about noise pollution in their everyday lives. They often realize that at a rock concert they are possibly harming their ears and therefore voluntarily engaging in a noise pollution moment. Same might be the case when going to a football game or basketball game at a stadium jam packed with screaming fans. Otherwise, they might hear the occasional jackhammer or siren, and not put much thought or concern toward noise pollution.

Those New Yorkers that I walked around with to get some pizza and enjoy the sights of the town, well, they by-and-large shrugged their shoulders about the noises and the noise pollution. They would wave their hands and arms in the air and say to me, “what you gonna do?” in exasperation. To them, the noises are as expected. The noises mean that their place is vibrant and alive.

They’ve already seen many efforts to curtail traffic, usually due to air pollution concerns. They’ve seen efforts to stop construction projects, often due to over-population concerns and the city crowding that can occur. Having dealt with numerous causes and concerns, they are as much “tone deaf” to those matters as they are to the noises of the city itself. They concede there is noise pollution, yet don’t believe it is serious enough to do anything drastic and also tend to be unaware of what could be done about it anyway, other than merely bringing the place to a screeching (noisy) halt.

One aspect that many locals might not know is that there are ways to fight back against noise pollution. Typically, it involves contacting your local city noise enforcement team, consisting of government workers that can officially check on a noise polluter and do something about it, including fining the noise offender or taking other legal actions against them. You would not normally use an emergency number such as 911 to report such noise pollution instances, and instead would likely use some government reporting number such as 311.

In the case of NYC, they receive an average of around 800 reported noise complaints per day. This is likely just the tip of the iceberg in terms of how many people are genuinely frustrated and concerned about noise pollution aspects. Most people aren’t aware that you can report the matter. Plus, those that are aware might not believe it is worth the effort to complain.

A primary reason that reporting a noise pollution instance might seem not worthwhile can be due to the transitory nature of a noise pollution occurrence.

Perhaps you have someone nearby your home that is making an incredible racket, and it is driving you nuts, along with the potential harm to your ears and those of passersby. Suppose though that it is a road repair crew and they are going to be done in a day or two.

By the time that an official noise pollution inspector comes out to your neighborhood to look into the complaint, the road repair crew might be long gone. There’s not much that the inspector can do at that juncture. Generally, unless they can catch the noise polluter in the act, it is going to be difficult to formally try to fine them or take other action against them. You might carp that the noise level was outrageous, but without something more official and definitive, it’s not likely to cut the mustard.

I’m guessing that at some point in your lifetime you’ve had a noisy crowd at a party in the house next door to you or that’s across the street. Those socially generated noises are often handled by the police, prompted to respond by a neighbor calling 911 to complain about the ruckus. Normally, the police will ask the offenders to quiet down, likely generating compliance, and the momentary act of noise pollution is then abated. Persistent noise complaints about a particular house that has become the block’s party destination are more likely to lead to a noise pollution case against the occupants.

Earlier in my career, I bought a small house (it was all I could afford at the time!) in a packed neighborhood that was just a few blocks from a local main street that had lots of restaurants and bars. One aspect that I had neither known nor calculated was that on Friday nights and Saturday nights many residents would wander down to the main street, drink liquor feverishly, and then drunkenly come back to their domiciles.

Homeowners such as myself were trying to keep the neighborhood in good shape and desired to maintain our property values as a real estate investment. The renters though did not necessarily care about property values. They would get home after hitting the bars, and proceed to party like crazy, all night long. Loud music. Loud voices. Screaming, yelling. Having a good old time, until they seemed to fall asleep in their drunken stupor by maybe 4 a.m. or so.

In one sense, this was not a transitory matter, like the example earlier of a road repair crew that shows up for a day or two and then is gone. You could set your watch by the regularity of the noise pollution arising each late Friday and Saturday evening. I contacted the city to find out what could be done and there didn’t seem to be much that I could do. They offered to come out and detect the noise, which was feasible because of its predictability, and said that they could contact the owner of the property to officially notify (and hopefully scare) the owner into getting the renters to be more thoughtful.

It was not especially effective. Eventually, I opted to move, mainly for other reasons such as wanting a bigger house, and tried to sell my small house. Darned if many of the potential buyers already knew about the noise issues on Friday and Saturday nights. They tried to talk me down on my asking price because of it. Plus, in California, there are requirements about reporting known hazards or other issues about a property you are selling, and so it came out in that manner too.

One of the houses that I was looking to buy was in a fortunately quiet suburban neighborhood, which I confirmed by talking with my potential neighbors, asking them if there were any noisy heathens in the area. Notably, I had wised up to the noise factor as a criterion for buying a house.

I purposely went to see the house on a variety of days and times, wanting to gauge whether there was anything unusual that maybe occurs depending upon the day of the week or the time of day. Turns out that the backyard of the house was butted up to a fence, and on the other side of the fence was a street that most of the time was lightly used.

During one of my random visits, I surprisingly discovered a whole bunch of traffic that was flowing down that street. Not far from this neighborhood was a trucking company that kept a slew of large-sized trucks in their parking lot. Unbeknownst to me, those trucks would routinely use that otherwise quiet street to leave the trucking yard and do the same when coming back to the trucking yard. The noise of these lumbering trucks coming along that normally quiet street was deafening.

In addition, you’ve likely felt vibrations from loud sounds, well, those trucks caused a tremendous vibration, partially due to the sounds of the trucks and also due to the heavy weight as they rumbled down the street behind my house. Of course, I realize that every home is likely to have something undesirable about it, but this was one of several marks against this particular property, so I opted not to buy it. I had no interest in trading the weekend party noises for instead the weekday morning and late afternoon barreling trucks noises.

Notice that throughout this discussion of noise pollution, one aspect that keeps coming up involves the detection of the noise pollution.

First and foremost, you have to realize that the noise pollution is happening. Furthermore, it is crucial to narrow down what the cause of the noise pollution is. Just having a hunch is not sufficient. For any kind of formal redress of abating the noise pollution, you’ll need a rock-solid form of proof that the noises are occurring, the degree of the noises, the frequency and duration, and then if you want to go after someone, you’ll need to figure out what or who is causing the noise.

As indicated earlier, it can be problematic to do this because:

  •         You might not be fully aware that noise pollution is occurring and might have become used to it or consider it nonthreatening.
  •         You might not have any reliable means of formally detecting the noises or of registering them to know how bad they are.
  •         The noises might be blended with an array of noises and you are unable to readily isolate the worst of the noises from the others in the mix.
  •         You might not be able to trace the various noises to definitive corresponding sources.
  •         And so on.

NYC Noise Pollution Study to Employ Machine Learning

There’s an interesting study being undertaken at NYU and Ohio State University that seeks to sound out the noise pollution issue in New York City, funded partially by an NSF grant (researchers include Bello, Silva, Nov, Dubois, Arora, Salamon, Mydlarz, and Doraiswamy). They have been putting together a system they shrewdly call SONYC (Sounds Of New York City). Via the development of specialized listening devices, they have so far deployed 56 of the sensors in various locations of NYC, including Greenwich Village, Manhattan, Brooklyn, Queens, and other areas. I’ll have to keep my eyes peeled to spot one, the next time I make a trip to NYC.

Their acoustic sensor is relatively inexpensive, costing to-date about $80 each and with the hope of further reducing the cost, which is crucial if there is a desire to deploy such devices on a widespread basis. Affordability of being able to conduct a noise pollution watchdog capability is a key factor in being able to undertake such initiatives.

They are using the somewhat ubiquitous Raspberry Pi single-board computer and have outfitted it with a custom-built MEMS (microelectromechanical systems) microphone. The MEMS microphone can detect an acoustic range of 32 dBA to 120 dBA. To try and ensure that this low-end low-priced component is believable, they are calibrating it against more expensive precision-grade sound-level meters.
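
As a rough idea of what calibrating a cheap sensor against a precision meter can look like, here is a minimal Python sketch that fits a straight-line correction from paired readings. This is my own illustrative assumption and not the SONYC team’s procedure; the function names and the sample readings are hypothetical.

```python
def fit_calibration(sensor_db, reference_db):
    """Least-squares fit of reference = gain * sensor + offset."""
    n = len(sensor_db)
    if n != len(reference_db) or n < 2:
        raise ValueError("need at least two paired readings")

    mean_x = sum(sensor_db) / n
    mean_y = sum(reference_db) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor_db)
    if sxx == 0:
        raise ValueError("sensor readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sensor_db, reference_db))

    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_calibration(reading_db, gain, offset):
    """Correct a raw low-cost sensor reading using the fitted line."""
    return gain * reading_db + offset


if __name__ == "__main__":
    # Hypothetical paired readings: the cheap sensor reads 2 dB low throughout.
    cheap = [40.0, 60.0, 80.0, 100.0]
    precise = [42.0, 62.0, 82.0, 102.0]
    gain, offset = fit_calibration(cheap, precise)
    print(gain, offset)
    print(apply_calibration(70.0, gain, offset))
```

A real calibration would likely be done per frequency band and under controlled conditions, but the principle of anchoring the inexpensive device to a trusted reference is the same.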

These SONYC project devices are small and able to be placed on ledges, attached to poles, affixed to buildings, and placed in other areas that might be handy for detecting noise pollution. Besides doing 24×7 continuous monitoring, they have also set up the devices to capture 10 seconds’ worth of audio at random intervals.

The snippets of audio are especially useful for another key aspect to their efforts, namely the use of Machine Learning (ML) to analyze noises and the collected sounds data.

Many of you are likely familiar with the use of Machine Learning or Deep Learning (DL) for use on images, such as the famous examples of being able to spot an image of a dog or a cat inside a captured visual picture. Typically using large-scale convolutional neural networks, there are numerous applied systems and research efforts involved in further enhancing the ability for automation to detect visual elements in images and video. Facial recognition is perhaps one of the most well-known examples.

I’d bet that fewer of you are aware of the use of Machine Learning or Deep Learning on audio data. It’s certainly an interesting and difficult problem to try and ferret out details within an audio stream or snippet.

The problem can be made easier if you decide to try and decipher the audio of a guitar player and a flute player. If we have known sources of the audio and they each are distinctive, and if the recorded audio is clear cut and has little noise or overwhelming background sounds, and they are relatively stationary and persistent, this is a relatively easy Machine Learning or Deep Learning solvable circumstance.

On the other hand, consider the widely varying sounds recorded by a low-end audio device that is sitting on a ledge of a second story building adjacent to Times Square in New York. You are going to get quite a diverse mixture of sounds. The objects generating the sounds are likely to be in motion and not necessarily stationary. The sounds will come and go, at times being heard well and other times barely audible. There will be such a muddling of sounds that it will be arduous to tell them apart from each other.

Can you pick out of that mercurial swath of audio the kind of object or artifact producing each of the noises, doing so with a high probability and also differentiate it from the other sounds? Nontrivial.

Assume too that you aren’t told beforehand what those objects or artifacts are.

You can certainly guess beforehand that there might be construction sounds, traffic sounds, the sounds of people talking and yelling, and so on. This can be helpful as a means to beforehand try to train a Machine Learning or Deep Learning system to try and identify the noises.

For the SONYC study, the researchers opted to see if they could do some labeling of data, providing therefore a leg-up for their Machine Learning training efforts. As a type of experiment, they sought participants via Amazon’s Mechanical Turk, amassing over 500 people that aided annotating of the audio data presented. The researchers had also developed a visual aid and were exploring how it could perhaps further enable the annotating process by the human labelers. The audio data was subject to various transformations and the researchers have helpfully made their soundscape tool available as open source.
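
To give a flavor of how crowdsourced annotations might be boiled down into training labels, here is a minimal Python sketch that uses a simple majority vote per audio clip. I am not suggesting this is how the SONYC researchers aggregate their labels; the function name, the agreement threshold, and the sample clips are all hypothetical.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Keep a clip's label only if more than min_agreement of workers chose it.

    annotations maps a clip id to the list of labels given by different workers.
    Clips without a clear majority are left out so they can be reviewed again.
    """
    consensus = {}
    for clip_id, labels in annotations.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            consensus[clip_id] = label
    return consensus


if __name__ == "__main__":
    crowd = {
        "clip_a": ["siren", "siren", "jackhammer"],  # two of three agree
        "clip_b": ["car horn", "engine"],            # split, no majority
    }
    print(aggregate_labels(crowd))
```

Dropping the clips that lack agreement is one design choice among many; an alternative would be to send those clips back out for more annotations, at additional cost.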

Their study touches on a number of fascinating elements. The training of the Machine Learning or Deep Learning capability brings up the notion of doing crowdsourced audio soundscape annotating. On the one hand, the crowdsourced approach might ultimately be sufficient in being able to train the ML or DL, and then no longer be needed. Or, one could possibly argue that you don’t need the ML or DL and should just use humans for analyzing such audio data, but this is somewhat backward thinking, I’d say, and the use of automation seems a more inspired and likely effective, efficient, and scalable method.

In the case of this particular study, I am guessing that some of you might be wondering whether it might be possible to use modern-day smartphones as a means of capturing the sounds that might then be analyzed by the ML or DL for noise pollution purposes. The use of a smartphone as a sound collecting device presents other problems, including the lack of precision and calibration for sound sensing, along with whether the smartphones would be used by humans as they are walking around or opting to try and point out a noisy place, offering therefore likely intermittent, inconsistent, and somewhat suspect audio data.

The proposed use of low-priced and yet palatable devices offers a consistency and reliability factor and could be placed in static locations for sustained periods of time. In contrast, some prior efforts on noise pollution detection have attempted to use relatively higher-end sound meters, which tend to be costlier and often not built to withstand the rigors of longer periods of outdoor postings.

It is useful to consider the rise of edge computing in this matter, namely that we are moving toward an era when there will be relatively smaller sized computing devices distributed around our roadways and infrastructure. This also coincides with the rise of the Internet of Things (IoT), another convergence toward dealing with situations wherein there is a need or value in having computing and sensory devices available all around us and not so confined or restricted.

For the emergence of Edge Computing, see my article:

For the advent of Internet of Things (IoT) devices, see my article:

For more about convolutional neural networks, see my article:

For overcoming irreproducibility problems of AI, see my article:

For my article about Machine Learning and benchmarks, see:

For dealing with probabilities and uncertainties in AI systems, see my article:

What does this have to do with AI self-driving cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One aspect that we’ve been actively exploring is the use of audio sensors on a self-driving car to aid in the AI being able to drive the vehicle, such as detecting sirens of police cars and other matters. The noise pollution studies dovetail into this kind of effort.

Allow me to elaborate.

I’d like to first clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved. For the design of Level 5 self-driving cars, the auto makers are even removing the gas pedal, brake pedal, and steering wheel, since those are contraptions used by human drivers. The Level 5 self-driving car is not being driven by a human, nor is there an expectation that a human driver will be present in the self-driving car. It’s all on the shoulders of the AI to drive the car.

For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car. The AI and the human driver are co-sharing the driving task. In spite of this co-sharing, the human is supposed to remain fully immersed in the driving task and be ready at all times to perform the driving task. I’ve repeatedly warned about the dangers of this co-sharing arrangement and predicted it will produce many untoward results.

For my overall framework about AI self-driving cars, see my article:

For the levels of self-driving cars, see my article:

For why AI Level 5 self-driving cars are like a moonshot, see my article:

For the dangers of co-sharing the driving task, see my article:

Let’s focus herein on the true Level 5 self-driving car. Many of the comments apply to the less than Level 5 self-driving cars too, but the fully autonomous AI self-driving car will receive the most attention in this discussion.

Here are the usual steps involved in the AI driving task:

  •         Sensor data collection and interpretation
  •         Sensor fusion
  •         Virtual world model updating
  •         AI action planning
  •         Car controls command issuance
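
The five steps above can be sketched as one pass of a driving loop. What follows is a toy Python illustration under my own assumptions: every class, function, sensor name, and threshold is hypothetical, and a real self-driving stack is vastly more elaborate. The point is only to show how the stages hand off to one another, and where an audio cue such as a siren could enter.

```python
from dataclasses import dataclass, replace


@dataclass
class Detection:
    source: str        # which sensor produced it, e.g. "camera" or "microphone"
    kind: str          # what was detected, e.g. "vehicle" or "siren"
    confidence: float  # 0.0 to 1.0


@dataclass
class WorldModel:
    objects: tuple = ()
    siren_nearby: bool = False


def interpret_sensors(raw_readings):
    """Step 1: turn raw (source, kind, confidence) readings into detections."""
    return [Detection(source, kind, conf) for source, kind, conf in raw_readings]


def fuse(detections, threshold=0.5):
    """Step 2: keep the most confident detection of each kind above a threshold."""
    best = {}
    for det in detections:
        if det.confidence < threshold:
            continue
        if det.kind not in best or det.confidence > best[det.kind].confidence:
            best[det.kind] = det
    return list(best.values())


def update_world_model(model, fused):
    """Step 3: refresh the virtual world model from the fused detections."""
    kinds = tuple(sorted(det.kind for det in fused))
    return replace(model, objects=kinds, siren_nearby="siren" in kinds)


def plan_action(model):
    """Step 4: choose a high-level action from the world model."""
    if model.siren_nearby:
        return "pull_over"
    if "vehicle" in model.objects:
        return "slow_down"
    return "maintain_speed"


def issue_commands(action):
    """Step 5: translate the planned action into car control values."""
    commands = {
        "pull_over": {"throttle": 0.0, "brake": 0.6, "steer": 0.2},
        "slow_down": {"throttle": 0.0, "brake": 0.3, "steer": 0.0},
        "maintain_speed": {"throttle": 0.3, "brake": 0.0, "steer": 0.0},
    }
    return commands[action]


def drive_cycle(model, raw_readings):
    """Run the five steps once and return the new model, action, and commands."""
    fused = fuse(interpret_sensors(raw_readings))
    model = update_world_model(model, fused)
    action = plan_action(model)
    return model, action, issue_commands(action)


if __name__ == "__main__":
    readings = [("camera", "vehicle", 0.9), ("microphone", "siren", 0.8)]
    print(drive_cycle(WorldModel(), readings))
```

In practice this cycle repeats many times per second, with each pass starting from the world model left behind by the previous one.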

Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too. There are some pundits of AI self-driving cars that continually refer to a utopian world in which there are only AI self-driving cars on the public roads. Currently there are more than 250 million conventional cars in the United States alone, and those cars are not going to magically disappear or become true Level 5 AI self-driving cars overnight.

Indeed, the use of human driven cars will last for many years, likely many decades, and the advent of AI self-driving cars will occur while there are still human driven cars on the roads. This is a crucial point since this means that the AI of self-driving cars needs to be able to contend with not just other AI self-driving cars, but also contend with human driven cars. It is easy to envision a simplistic and rather unrealistic world in which all AI self-driving cars are politely interacting with each other and being civil about roadway interactions. That’s not what is going to be happening for the foreseeable future. AI self-driving cars and human driven cars will need to be able to cope with each other.

For my article about the grand convergence that has led us to this moment in time, see:

See my article about the ethical dilemmas facing AI self-driving cars:

For potential regulations about AI self-driving cars, see my article:

For my predictions about AI self-driving cars for the 2020s, 2030s, and 2040s, see my article:

Returning to the topic of audio soundscapes and how it pertains to AI self-driving cars, along with how it pertains to noise pollution research advances, I’ll address these matters next.

First, let’s consider that the interior of AI self-driving cars will have audio microphones, doing so to allow the human occupants to be able to converse with the AI system driving the self-driving car. Similar to the popular use of Siri and Alexa of today, there will be Natural Language Processing (NLP) verbalization components of the AI that will interact with passengers.

Some falsely assume that this interaction with the AI will be confined to just providing an indicated destination of where the passenger wants to be driven. I have repeatedly debunked this myth and pointed out that there is a slew of conversational aspects that will be needed by true AI self-driving cars.

Consider the dynamic nature of a conversation with a human chauffeur and it posits some of the interaction that will be desired, including discussions about intermediary destinations, changes in destination endpoints, smoothness of the drive, and other such facets.

Occupants Will Want to Converse with the AI Self-Driving Car

I realize that some immediately jump on this notion and say that they’ve been in cabs and ridesharing vehicles wherein the human driver was unable or unwilling to carry on a conversation. As such, this seems to imply that it is sufficient to have AI that can only do simple kinds of verbal acts and take essentially rudimentary commands from a passenger.

Yes, we’ve all had those kinds of being-driven experiences, and it might be satisfactory for the initial and crude versions of AI self-driving cars, but this will wear thin quickly and human occupants will be seeking more elaborate dialogue. Those auto makers and tech firms that can provide such AI NLP will undoubtedly see greater interest by those wishing or willing to ride in an AI self-driving car.

Mark my words, the voice is the AI self-driving car, from the perspective of the human rider. Anyone that lets the voice aspects be handled by someone other than the auto maker or tech firm will be ultimately second-fiddle in the eyes of those riding in AI self-driving cars.

Most of the auto makers and tech firms are gradually realizing the importance of the interior audio interaction aspects of an AI self-driving car and we’ll soon enough be seeing sizable advances in that area.

In contrast, let’s consider the use of exterior audio microphones and an AI self-driving car.

Few of the auto makers and tech firms are giving much attention to the use of externally focused audio microphones for an AI self-driving car. Indeed, they would generally classify the use of such sensory capabilities as an edge or corner problem. An edge or corner problem is one that is considered not at the core of what you are trying to solve. It is something that can be dealt with at a later time. You aim to get the core solved and then come back around to deal with the edges or corners.

Some also refer to these edge or corner cases as “the long tail” of an AI system.

Why is the use of external audio microphones tossed into the edge or corner cases basket? Mainly due to the aspect that the AI developers already have their hands full on the other elements of making an AI self-driving car work as hoped for. In that sense, yes, they are focused on what they consider core. Having an AI self-driving car that can navigate a road without hitting things, that’s core. Having an AI self-driving car that abides by the rules-of-the-road, that’s core.

Dealing with the sounds that are outside of the AI self-driving car is, well, interesting, but not essential, right now, in the view of many AI developers. I have predicted that ultimately when the other features of AI self-driving cars are all about the same, it will become more apparent that we’ve missed dealing with the vital and quite important use of sound as a sensory element for driving a car.

One obvious example of how external sounds are crucial involves the sirens of police cars, ambulances, fire trucks, and other emergency vehicles. Human drivers are supposed to be alert for the sound of such sirens. When such a siren is heard, the human driver knows to be cautious of any emergency vehicles that might be in their vicinity. The louder the siren sound, likely the closer the vehicle is to the car.

I’m sure that you’ve had moments whereby you were sitting at a red light and heard a siren, coming from some distance away, faint at first. As you waited for your green light, you heard the siren wailing and getting louder and louder. You readily deduced that the vehicle making the siren sound was getting closer to you. The light goes green and you hesitate to drive into the intersection, knowing that at any moment an emergency vehicle might come zipping through the intersection.
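To make the louder-means-closer intuition concrete, here is a minimal sketch in Python. It assumes the siren has already been isolated from other street sounds, that audio arrives as frames of samples scaled between -1.0 and 1.0, and that a steadily rising level is a reasonable stand-in for an approaching vehicle. The function names and the 0.5 dB per frame threshold are my own illustrative choices, not anything taken from a production self-driving car system.

```python
import math


def rms_db(samples):
    """Return the RMS level of one frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so that a silent frame does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def level_trend(levels_db):
    """Least-squares slope of level versus frame index, in dB per frame."""
    n = len(levels_db)
    if n < 2:
        return 0.0  # a single frame carries no trend
    mean_x = (n - 1) / 2.0
    mean_y = sum(levels_db) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(levels_db))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def siren_is_approaching(frames, min_slope_db=0.5):
    """Hypothetical rule: a steadily rising level suggests a source getting closer."""
    levels = [rms_db(frame) for frame in frames]
    return level_trend(levels) > min_slope_db
```

Feeding it a handful of frames whose amplitude doubles each time yields a slope of roughly 6 dB per frame, which this rule would flag as approaching. Loudness alone cannot say which direction the vehicle is coming from, so this would be just one cue among several.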

I realize that some of you might say that you often don’t hear the siren, even though it might be wailing, and that you instead rely upon your eyes to see the flashing lights of the emergency vehicle. You might also use as a clue that other cars nearby are coming to a halt or pulling over, which is another indicator that perhaps an emergency vehicle is coming and maybe they heard it while you did not.

The point being that some AI developers try to argue that being able to hear a siren is nice, but not essential. As long as the vision system of the AI self-driving car is able to identify emergency vehicles, whether those vehicles are being heard or not is not key and merely icing on the cake. Likewise, they emphasize that the radar and the LIDAR are bound to detect a fast-moving vehicle and be able to add a warning to the AI action planning component that likely an emergency vehicle is nearby.

It is true of course that people often do not hear the sirens of emergency vehicles. I think you would also agree that this is a disadvantage in that waiting until you can see an emergency vehicle is cutting it pretty thin. Whereas you might see the emergency vehicle only a few split seconds before it rushes into that intersection ahead of you, your ability to hear the siren would have given you a much greater early warning to be watchful of the speeding vehicle.

One would hope that with AI self-driving cars we are trying to make them as safe as feasible. Omitting the use of a sense that we human drivers use, the sense of hearing, seems like a rather apparent omission. I’m not going to try to further justify herein the external audio microphone as a sensory device for AI self-driving cars and will let stand my prediction that it will become a heralded component once the industry wakes up to its vital function.

In some ways, I argue that the use of external audio capture and interpretation is somewhat like the use of shadows that can be captured by camera images. A focus on shadows is called computational periscopy and provides an added element to try and further improve the safety of the AI driving, providing the possibility of early detection of possibly dire situations. Whether you consider it to be simply icing on the cake, or whether it is an essential element, depends upon the eye of the beholder.

Another sensory aspect that is being neglected for AI self-driving cars involves the use of smelling or odor detection. I admit this is indeed an edge or corner case and not especially close to the core aspects of an AI system for a self-driving car. Nonetheless, the potential use of e-noses or olfactory sensors is yet another means for the AI to have a better chance at detecting the environment in which it is driving, and be a safer driving system than otherwise without such sensors.

So, let’s agree for the moment that though the auto makers and tech firms are not jumping yet on the bandwagon of using exterior audio microphones, they will gradually and inexorably get there.

I have mentioned that these sensors could aid the detection of sirens, doing so as a heads-up about an approaching emergency vehicle. That’s just one way in which the exterior audio microphones might be useful.

Another practical use would be the ability to undertake a verbal dialogue with humans outside of the self-driving car.

A human might be standing outside of the AI self-driving car and ask the self-driving car to move back a few feet, perhaps because the AI self-driving car has come up to drop-off someone at a hotel valet area and gotten overly close to another car waiting to be parked.

Or, perhaps a human is standing at a ridesharing waiting area and wants to potentially use the AI self-driving car as a ridesharing service. Rather than having to convey the request via a smartphone, which is the way we commonly do so today, the human might simply speak to the AI self-driving car. This would be akin to trying to speak to a cabbie that is standing next to their waiting cab.

In essence, we can come up with a sizable number of useful reasons to have exterior audio microphones as added sensors on an AI self-driving car.

The other aspect about such audio sensors is that they will not unduly increase the weight of the self-driving car, a factor always to be considered when adding sensors, and will only marginally increase the cost. Most such audio sensors will be relatively small in size, relatively lightweight, relatively low in cost, and can be included in the car body or added on without too much painstaking redesign of the self-driving car.

Some envision that if the auto makers and tech firms don’t build such sensors into the self-driving cars, there might be a thriving add-on kit marketplace to augment an AI self-driving car with such audio sensors. I am somewhat skeptical about the viability of doing such add-ons. Nonetheless, I do concur that there will be a movement towards including audio sensors, one way or another.

When will the audio microphone sensors be active on an AI self-driving car? My answer is straightforward, all of the time. There are some advocates of audio sensors that say they should only be sparingly in use. Their concern is that if you have these audio sensors on all of the time, it implies that the AI self-driving car is potentially capturing all sounds wherever it drives. This might be a kind of privacy invasion.

Audio Detection Is a Momentous Matter of Privacy

Though I am sympathetic to the privacy aspects, and they are certainly worthy of figuring out what to do, I am somewhat puzzled that these same concerns aren’t voiced about the cameras that are continually capturing visual images and video. The same can be said about the radar sensors, the ultrasonic sensors, the LIDAR, and so on. In other words, I don’t see why the audio is any special case. To me, it is part of the larger privacy picture of what we are going to do about AI self-driving cars that are capturing everything they detect during their driving journeys.

It’s a momentous matter of privacy that no one is yet wrestling with on any macroscopic scale.

Some say that the cameras and other non-audio sensors are more essential to the driving of the self-driving car, and therefore argue that we should at least curtail something that otherwise is not so essential. This takes us back to my earlier argument about whether or not you believe that the exterior audio capture and interpretation is vital. I think it is vital.

In any case, I’m betting that the audio microphones will likely be working whenever the self-driving car is active, and potentially when it isn’t active too. Why would the audio detection be functioning when the car is parked and the engine is presumably shut off? The obvious answer is to wake up the AI of the self-driving car. Just as Alexa and Siri are continually scanning for a wake-up word, so would the AI of the self-driving car.

In theory, this audio awakening might be “ignoring” any sounds it hears until it detects the wake-up word. This is something that many of today’s speech recognition systems claim to do. In that manner, one might assume that the AI self-driving car is neither recording every utterance that occurs near it nor capturing any and all sounds around it while it is in a sleeping state. But we’ve yet to see what the auto makers and tech firms are going to do, and whether they plan to be recording such audio or merely scanning and dumping it right away.
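The “scanning and dumping” option can be sketched in a few lines of Python. This is only an illustration of the idea, under the assumption that audio arrives one frame at a time and that some wake-word detector exists. The detector here is a hypothetical callable that I pass in, and the class name and window size are likewise made up for the example.

```python
from collections import deque


class WakeWordScanner:
    """Holds only a short rolling window of audio frames while the car sleeps."""

    def __init__(self, detector, window_frames=50):
        # detector is a hypothetical callable: list of frames -> bool
        self.detector = detector
        # A bounded deque silently drops the oldest frame once it is full,
        # so nothing older than the window is ever retained.
        self.window = deque(maxlen=window_frames)
        self.awake = False

    def feed(self, frame):
        """Take in one frame; return True once the wake word has been detected."""
        if self.awake:
            return True
        self.window.append(frame)
        if self.detector(list(self.window)):
            self.awake = True
        return self.awake
```

The design choice worth noticing is that the privacy property comes from the bounded buffer rather than from a promise: while asleep, the scanner has no place to keep more than the last few frames. Whether any auto maker or tech firm will actually build it this way remains to be seen.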

One aspect to keep in mind is that AI self-driving cars are likely to be running nearly non-stop 24x7, as much as possible. The logic being that if you have an AI self-driving car that does not need a human driver, you might as well put the self-driving car to use as much as you can. No need to worry about whether you have a driver available, since you always and permanently do have the AI there.

For purposes of being able to afford an AI self-driving car, it is anticipated that if the costs of an AI self-driving car are high, you would want to maximize the revenue potential of the self-driving car. If you own an AI self-driving car, you might use it to get to work, and then while at the office, you would have your AI self-driving car roaming the streets as a ridesharing service. When you needed it to go get lunch or drive you home, you take it out of the ridesharing pool and use it as your own car.

I point out the non-stop use, along with the roaming for ridesharing purposes, because it brings us once again to what the audio sensors are going to be hearing while on an AI self-driving car. Presumably, those audio sensors are going to capture sounds throughout the day and night, doing so wherever the AI self-driving car happens to roam.

We are now ready to discuss noise pollution abatement and the advent of AI self-driving cars.

My discussion about the SONYC approach indicated that noise pollution exists and has adverse health and economic consequences. To aid the abatement of noise pollution, we need a means to quantify the nature of the noises and detect them, along with pinning down the sources of the noises.

One such approach involves the effort by the SONYC researchers of developing and deploying low-priced sensory devices that could be placed throughout a geographical area to get a systematic collection of the noises and then be further leveraged via the application of appropriate Machine Learning or Deep Learning systems to analyze and interpret the audio data.

Another potential approach involves using the exterior audio data being captured by AI self-driving cars.

Imagine an area like downtown Los Angeles. Suppose we had a slew of AI self-driving cars that were roaming up and down the streets, serving as ridesharing services. While driving around, they are capturing visual imagery data, radar data, LIDAR data, ultrasonic data, and let’s say also audio data.

Each of those AI self-driving cars has a potential treasure trove of collected audio data. Of course, the audio data might be either top-quality or low-quality, depending upon the type of audio sensors included in the AI self-driving car. How many such audio sensors there are on any given AI self-driving car will also be a factor, along with where on the self-driving car those audio sensors are placed.

I am not suggesting that it is axiomatic that the exterior audio sensors will be able to provide valuable audio data for noise pollution abatement purposes. But it is one additional avenue worthy of consideration.

The on-board AI system of the AI self-driving car might have an added software component to aid in analyzing the audio data that is being collected, doing so in real-time.

I realize that we need to ascertain whether this effort is worthwhile or not, in the sense that if you are using the precious and resource-limited computer processors to do this kind of work, while the self-driving car is in-motion, it seems unlikely to be worthwhile doing so if it somehow starves the core driving functions of the AI system that is trying to drive the car in real-time.

Another approach would be for the audio data to be pushed up to the cloud of the auto maker or tech firm.

This is generally going to be happening for the other sensory collected data. Via the use of Over-The-Air (OTA) electronic communications, the auto makers and tech firms will be collecting volumes of driving related data. The OTA will also be used to push down to the AI self-driving car various patches and updates.

Since AI self-driving cars will likely be separated into “fleets,” meaning that a particular auto maker might have their own cloud, while other auto makers have their own respective clouds, it might make trying to cohesively bring together all of the collected audio data somewhat problematic. This would need to be worked out with the auto makers and tech firms.

There are a number of interesting twists and turns to this notion.

One important element is that the AI self-driving cars are going to be in-motion most of the time. Whereas the low-priced geographically placed audio sensors will typically be fixed in place for some lengthy period of time, the audio sensors on the AI self-driving cars are going to be on the journey of wherever the AI self-driving car goes.

Can this audio data be useful when it is being captured while in-motion on an AI self-driving car? I am not asking about whether it can be useful for real-time analyses, which I’ve already mentioned, but instead pondering whether the audio data collected might be skewed as a result of being captured by an in-motion audio sensor.

It is a potentially interesting Machine Learning or Deep Learning problem to account for the fact that the audio data was captured while the device itself was in-motion. This would also seem likely to require added forms of audio data transformations.
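One well-known skew from a moving sensor is the Doppler shift, and undoing it is an example of the kind of transformation I have in mind. The sketch below uses the standard formula for a moving listener and a stationary source in still air. The speed-of-sound constant is approximate, and the function names and the idea that the closing speed is already known (say, from the car’s own speed and the direction of the source) are assumptions for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air near 20 degrees C


def observed_frequency(source_hz, closing_speed_m_s):
    """Pitch heard by a microphone moving toward (+) or away from (-) a fixed source."""
    return source_hz * (SPEED_OF_SOUND_M_S + closing_speed_m_s) / SPEED_OF_SOUND_M_S


def corrected_frequency(observed_hz, closing_speed_m_s):
    """Undo the shift to estimate the pitch a stationary sensor would have heard."""
    return observed_hz * SPEED_OF_SOUND_M_S / (SPEED_OF_SOUND_M_S + closing_speed_m_s)
```

At a closing speed of 13.4 meters per second (about 30 miles per hour), the shift is roughly 4 percent, enough to matter if you are comparing a recording against one made by a fixed-in-place sensor. Wind noise and tire noise at the microphone are separate problems that this does not address.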

We also need to consider the triangulation aspects.

I am referring to the notion that on any given AI self-driving car there might be several exterior audio sensors. It would seem sensible to compare the audio captured by those multiple sensors and try to piece together what the self-driving car has captured of the noises that surround it. The audio sensor data captured at the front of the self-driving car could be meshed or triangulated with the audio sensor data captured at the rear of the self-driving car, and so on.
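One common way to mesh two microphones is to estimate how much later a sound arrives at one than at the other, by sliding one recording against the other and finding the best match (a cross-correlation). The following is a minimal sketch of that technique, assuming both microphones are sampled on the same clock. The function names are hypothetical and the brute-force search is written for clarity, not speed.

```python
def best_lag(front, rear, max_lag):
    """Return the lag in samples at which the rear recording best matches the front.

    A positive result means the sound reached the rear microphone later.
    """
    def score(lag):
        total = 0.0
        for i, value in enumerate(front):
            j = i + lag
            if 0 <= j < len(rear):
                total += value * rear[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def path_difference_m(lag_samples, sample_rate_hz, speed_of_sound_m_s=343.0):
    """Convert a lag in samples into the extra distance the sound traveled."""
    return lag_samples / sample_rate_hz * speed_of_sound_m_s
```

A single front-to-rear pair only narrows the source down to a set of possible directions. Adding a third or fourth microphone at other spots on the car body is what would let the AI pin the direction down further.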

There would also be an interesting problem of triangulating the audio sensor data from a multitude of AI self-driving cars. Suppose that there are four AI self-driving cars that each drive past a construction site in downtown Los Angeles, doing so at the same moment in time. Two are headed north, two are going southbound, and all four on the same street. Let’s triangulate those audio data captures to see what they might indicate about the noises coming from the construction site.

Making the matter more complex, let’s say that nearly a hundred AI self-driving cars drove past that construction site in a 24-hour period, encompassing daytime and nighttime sound recordings. Some of the self-driving cars were on the street in front of the construction site, some went on the street behind it, and some were on streets blocks away from the construction site.

That’s quite a triangulation problem. You have variability in where the self-driving cars captured the audio aspects, you have variability in their motion aspects (sometimes perhaps stuck in traffic and nearby to the site for minutes, while other times zipping past at higher speeds), you have variability in the time of day, you have variability in how many self-driving cars were able to capture such audio data, and so on.
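As a toy version of the multi-car problem, suppose each self-driving car reports its position and the sound level it measured at the same moment. If we assume a single dominant source and simple free-field spreading (level drops by 20 log10 of the distance), then the right source location is the one where all of the cars “agree” on how loud the source must be. The sketch below searches a grid of candidate points for that agreement. Real streets have reflections, obstructions, and multiple sources, so treat this strictly as an illustration; all names here are my own.

```python
import math


def implied_source_levels(readings, x, y):
    """Level at 1 m that each (car_x, car_y, level_db) reading implies for a source at (x, y)."""
    levels = []
    for car_x, car_y, level_db in readings:
        # Clamp to 1 m so a candidate sitting on top of a car does not hit log10(0).
        distance = max(math.hypot(car_x - x, car_y - y), 1.0)
        levels.append(level_db + 20.0 * math.log10(distance))
    return levels


def locate_source(readings, grid):
    """Pick the candidate point where the readings disagree least about the source level."""
    def spread(point):
        levels = implied_source_levels(readings, point[0], point[1])
        mean = sum(levels) / len(levels)
        return sum((level - mean) ** 2 for level in levels)

    return min(grid, key=spread)
```

Handling the other kinds of variability I listed, such as time of day and how long each car lingered, would mean weighting or grouping the readings before a step like this one.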

In a manner of speaking, you could say that the noise data collection is nearly “free” because the AI self-driving cars are going to presumably have the audio sensors anyway, if you agree with me that they should. Though the audio sensors weren’t necessarily included in the AI self-driving cars to aid noise pollution abatement, it is an added benefit of having those audio sensors for their other purposes in the AI self-driving car functions.

Could the audio sensors of the AI self-driving cars be used entirely in lieu of using stationary fixed-in-place audio sensors? Maybe, maybe not.

The odds are that with the rather gradual adoption of AI self-driving cars, you would for quite some time in the future have a rather spotty collection of the noises in any given area. In other words, with so few AI self-driving cars, and their random chance of being in any particular place, you would have large gaps of audio and those gaps might undermine the noise abatement usage.

If you combined the audio data collected by AI self-driving cars with the fixed-in-place audio sensors, it could be a helpful augmentation to those fixed-in-place audio sensors. Furthermore, it would perhaps be building towards a day when the saturation of AI self-driving cars in a given geographical area is so high that you would be able to rely solely on the data from AI self-driving cars and might not need the fixed-in-place sensors in that particular locale.
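To see where the gaps would be, one could divide a city into cells and check which cell-and-hour slots have no reading from either kind of sensor. Here is a minimal sketch of that bookkeeping. The cell labels, the assumption that a fixed sensor covers its cell around the clock, and the function name are all hypothetical.

```python
def coverage_gaps(mobile_readings, fixed_cells, all_cells, hours=range(24)):
    """List the (cell, hour) slots that no sensor of either kind has covered.

    mobile_readings: iterable of (cell, hour) pairs logged by passing cars.
    fixed_cells: cells with a stationary sensor, assumed to cover every hour.
    """
    covered = set(mobile_readings)
    fixed = set(fixed_cells)
    return [
        (cell, hour)
        for cell in all_cells
        for hour in hours
        if cell not in fixed and (cell, hour) not in covered
    ]
```

As more self-driving cars come onto the roads, the list of gaps for a locale would shrink, and an empty list even with no fixed sensors would be the signal that the cars alone suffice there.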

There is a potential that AI self-driving cars could aid the emerging noise pollution abatement efforts.

It admittedly is an uphill battle. AI self-driving cars have to be outfitted with audio sensors, which I’ve argued they should be anyway for the safety and other purposes of the AI driving a self-driving car. We can then tag along the already useful function of the audio sensors and add the potential for being servants in the war on noise pollution.

It’s not clear whether the audio analyses for noise pollution would be viable in real-time if the AI self-driving car cannot spare the processing capabilities to do so. This might mean that the audio data for noise pollution abatement purposes is shared up to the cloud. A complication is that the different auto makers will each have their own clouds, and somehow an arrangement would need to be made to bring together that audio data.

There are lots of privacy issues to be dealt with in this audio data collection. Will the humans that own or use these AI self-driving cars be comfortable with having the exterior audio kept and used for these noise pollution abatement purposes? How much of the time and when will the audio data be captured from the sensors?

There is a slew of difficult technical issues to be dealt with too. How useful will audio data that is captured while an AI self-driving car is in-motion be? How hard will it be to triangulate the audio data of a single AI self-driving car, and then how hard would it be to do so for a myriad of self-driving cars, perhaps hundreds or thousands of such audio sensor equipped self-driving cars?

The amount of data would be possibly staggering. Even with compression trickery, you are still referring to the audio sensors collecting data potentially non-stop, every second and every minute and every hour of every day and doing so for hundreds or thousands of AI self-driving cars (ultimately, perhaps millions upon millions of self-driving cars). There might need to be newly devised “smart” ways of deciding what data to keep as it is essential for analysis purposes, versus what data can be discarded rather than carried around or shoveled up to the cloud.
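One simple flavor of such a “smart” filter would be to keep only the clips that stand out from the recent background level and discard the rest. The sketch below does this with a slowly updated baseline. The 6 dB margin, the smoothing factor, and the function name are arbitrary illustrative choices, and a real system would presumably look at far more than loudness.

```python
def select_clips(clip_levels_db, margin_db=6.0, smoothing=0.1):
    """Return the indices of clips whose level departs from a running baseline."""
    kept = []
    baseline = None
    for index, level in enumerate(clip_levels_db):
        if baseline is None:
            baseline = level
            kept.append(index)  # always keep the first clip as a reference point
            continue
        if abs(level - baseline) >= margin_db:
            kept.append(index)
        # Nudge the baseline toward the latest level so it tracks slow changes.
        baseline += smoothing * (level - baseline)
    return kept
```

With clip levels of 60, 61, 60, 75, 60, and 59 dB, only the first clip and the 75 dB spike would be kept, and the other four would never need to be carried around or sent up to the cloud.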

Noise pollution for most people is relatively low on their list of pollution concerns. Unless the public perceives the value of undertaking noise abatement actions, there is likely little impetus for the auto makers and tech firms to consider how the audio sensors and audio data might be used for noise abatement purposes. I suppose too that one might be somewhat worried that their own AI self-driving car is perhaps contributing to the noise problems, though this would seem not quite so worrisome as to fully scuttle the idea of using the audio sensor data to try and avert overall noise pollution.

In any case, the efforts to achieve noise pollution abatement are an avenue of improving the use of Machine Learning and Deep Learning for doing audio data pattern matching and analyses. For those purposes alone, it’s a handy and helpful endeavor, let alone the noise abatement matter.

Those methods can likely be reused or borrowed for doing the types of audio data analyses that the auto makers and tech firms directly care about, such as separating out the sound of a siren from other city noises and trying to determine where the source is. That’s the kind of noise analysis that makes abundant sense for the safety of an AI self-driving car and we need more efforts to enhance what are rather rudimentary capabilities today. I say, let’s make some loud noise in favor of that.

Copyright 2019 Dr. Lance Eliot

This content is originally posted on AI Trends.