By Dr. Lance B. Eliot, the AI Trends Insider
Have you played the Lotto lately? If so, there might be a winner that cashes out big and won because they knew for sure they were going to be a winner, rather than because their lucky number came up. How could someone be sure they’d win a Lotto? Are they able to see the future? Have they come back to the present from the future via a time machine?
Easy answer, just rig a backdoor into the Lotto system.
The most recent such notable case involves Eddie Tipton, a former programmer for the Multi-State Lottery Association in Iowa. He and his brother Tommy were in cahoots: Eddie placed backdoors into the Lotto systems so that they could win at their choosing. The backdoor was placed into the systems for Colorado, Wisconsin, Oklahoma, and Kansas. Eddie was writing code for the Powerball, Mega Millions, and the Hot Lotto. A trusted insider, he had faithfully developed and maintained the Lotto computer programs for nearly 15 years, and was the IT Security director for his last two years.
What they had to do was pretty simple. Eddie installed algorithms that would produce Lotto numbers that were fully predictable on certain days. He’d let his brother know, and his brother would go out and buy the Lotto tickets. A friend was in on the scam too. At first they focused on “smaller” winnings to stay under the radar, gradually amassing several million dollars. But then they got greedy and tried to cash in a bigger jackpot win of $14 million, and the jig was up. Lotto officials got suspicious, launched an investigation, and Eddie ended up headed to a 25-year prison sentence.
One lesson is don’t try to cheat. Cheaters never prosper. Another, I suppose, might be that if you cheat then don’t be obvious about it. Hey, what, who said that? The editor must have inserted that into my words here.
Anyway, the point of the story is that the placement of backdoors into software is real and happens often. We just don’t usually know that it happened. Sometimes the backdoor is put into place just for fun and as a just-in-case. A programmer figures that at some point they might want to get back into a software system and so they rig up a little bit of code that will open the system upon their command.
Usually though, these are done as a means to get money. Many times, disgruntled former employees who were programmers have opted to run their own version of ransomware against the company. They planted a backdoor in anticipation of someday being fired. If they never got fired, they wouldn’t use the backdoor. If they did get fired, it was there in case they wanted revenge. Sometimes the revenge is not money motivated and they just want to harm the company by taking down its systems. In other cases, they figure they deserve some extra severance pay and use the backdoor to try to get it.
The backdoor can be surprisingly small and hard to detect. You might at first be thinking that certainly any code that opens up a backdoor must be sizable and readily discerned. Not so. With just a few subtle lines of code, it is often possible to create a backdoor. Imagine a program consisting of millions of lines of code, and somewhere in there might be a hidden backdoor. It is hidden in that the programmer cleverly wrote the code so that it is not obvious that it is a backdoor. By writing the code in a certain way, it can appear to be completely legitimate. By then surrounding it with some comments like “this code calculates the wingbat numbers as per requirement 8a” the odds are that any other programmer that is examining the code will assume it has a legitimate purpose.
It is also hidden by the fact that it might be just a dozen lines of code. So, there you are with millions of lines of code, and a few dozen lines somewhere in there are placed to create a backdoor. Finding this secreted code is like finding a needle in a haystack. If the backdoor code is crudely written and obvious, there is a chance it can be found. If the backdoor code is written cleverly and designed to be hard to find, the odds are that it won’t be found. Especially since most companies are so strapped just getting their programming done that they aren’t willing to spend much in the way of resources on finding backdoors.
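To make the idea concrete, here is a deliberately contrived Python sketch of how a few innocuous-looking lines can hide a backdoor. Everything here is invented for illustration: the function, the misleading “requirement 8a” comment (echoing the “wingbat numbers” example), and the magic digit-sum condition that quietly bypasses the real password check.

```python
# Hypothetical illustration of how a tiny backdoor can hide in plain sight.
import hashlib

def check_credentials(username: str, password: str, user_db: dict) -> bool:
    """Validate a login against stored SHA-256 password hashes."""
    stored = user_db.get(username)
    if stored is not None:
        candidate = hashlib.sha256(password.encode()).hexdigest()
        if candidate == stored:
            return True
    # "This code calculates the wingbat numbers as per requirement 8a."
    # In reality, the next two lines ARE the backdoor: any password whose
    # digits sum to a magic value slips past the real check above.
    if sum(int(c) for c in password if c.isdigit()) == 41:
        return True
    return False
```

A reviewer skimming millions of lines would likely accept the comment at face value, which is precisely the point: the backdoor reads like an ordinary business rule.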
Indeed, most programmers are pushed to the limit writing the code they are supposed to be doing, and so they have little time and no interest in looking at someone else’s code to see if it has something nefarious in it. Unless the other person’s code has problems and needs debugging, there’s not much chance of someone reviewing it. Now, I know you might object and say that many companies require a formal Quality Assurance (QA) process and that code reviews are the norm. Sure, that’s absolutely the case, but even there the tendency is that if the code ain’t broken and seems to be working, no one is really going to poke deeply into some tiny bit of code that appears innocuous.
Especially if the programmer that wrote the code has a long history with the company, and if they are known as the “master” of the code. Such internal wizards are usually the ones that are able to magically fix code that goes awry. They are looked up to by the newbies responsible for helping maintain the code. Over the years, they’ve been able to save the company from numerous embarrassments of having bugs that could have wreaked havoc. Thus, they are the least likely to be considered a backdoor planter. That being said, some of them go rogue and plant a backdoor.
What does this have to do with self-driving cars?
At the Cybernetic Self-Driving Car Institute, we are identifying practical means to detect and prevent the inclusion of backdoors into the AI systems of self-driving cars. This is a really important problem to be solved.
Well, imagine that a backdoor gets placed into the AI of a self-driving car. The person placing it has in mind something nefarious. Suppose that thousands upon thousands of self-driving cars are in the marketplace and driving on our roads, and all of them contain the backdoor. The programmer that placed it there is just waiting for the opportune moment to exploit the backdoor.
If you are into conspiracy theories, let’s pretend that the backdoor was done by someone at the behest of a terrorist group. Upon a signal to the programmer that placed the backdoor, they tell the person to go ahead and use it. Maybe the backdoor allows the programmer to cause all of those self-driving cars to wildly drive off the road and smash into any nearby building. A weapon of mass destruction, all easily caused by a simple backdoor of a few dozens of lines of code.
The backdoor could have other purposes. Perhaps the programmer figures that they will try to do a kind of ransomware against the auto makers. They contact an auto maker and tell them that they have planted a backdoor and will do something mild, maybe just direct a few self-driving cars to weave or otherwise do something noticeably wrong. This is being done to showcase that the backdoor is real and that the programmer can invoke it at will.
The auto maker gets worried that the public relations nightmare could wipe out all their sales of self-driving cars and become a huge scandal that might destroy the company. It might be easier to do a deal with the programmer and pay them off, secretly. Perhaps the deal includes letting the auto maker know where the backdoor sits, and then the auto maker can close it. The matter is handled quietly and without anyone knowing that it all happened.
Depending on how greedy the backdoor programmer is, they could have planted more such backdoors and revisit the auto maker at a later time, maybe once they’ve squandered their first plunder. Or they might march over to another auto maker and run the same scam on them.
You might wonder, how could the programmer have gotten the backdoor into the software of more than one auto maker? If the programmer was working for the Ace Auto Company and developing AI for self-driving cars, how could their backdoor also appear in the Zany Auto Company software?
Answer: it could be that the advent of open source will be the means by which these kinds of backdoors are readily spread around. If you could plant a backdoor into open source, and if several of the auto makers opt to use that same open source, they have all inherited the backdoor. This is one of the inherent dangers of using open source. On the one hand, open source is handy since it is essentially free software, typically written on a crowdsourced basis; on the other hand, it might contain some hidden surprises. The counter-argument is that with open source being openly available for review, in theory there shouldn’t be any hidden surprises, because the wisdom of the crowd will find them and squash them.
Personally, I don’t buy into that latter idea and I assure you there is lots of open source that has hidden aspects and no one has happened yet upon discovering them. Don’t put your life into the hands of the wisdom of the crowd, I say.
Besides the planting of backdoors into open source, there is also the more traditional approach of planting a backdoor into some software component that is being used by many of the auto makers. Let’s suppose that a third-party software company makes an AI component that keeps track of the IMU for self-driving cars, and it is used by several auto makers. They all connect via an API to the AI component. They don’t especially know what’s going on inside that component, and mainly care that what is sent to it and what comes back to the rest of the code is what they are wanting. A programmer at the third-party software company plants a backdoor. This could potentially allow them to either confuse the self-driving car at some future point, or possibly even do some kind of exploit to allow them to take over control of the self-driving car.
Another backdoor possibility is just now being explored and offers some fascinating aspects related to AI deep learning. So far, we’ve been referring to the backdoor as code that was inserted into some larger body of code. Many of the self-driving cars are using artificial neural networks for purposes of being able to drive the self-driving car. These neural networks are typically trained on large datasets. Based on the training, the neural networks then have “learned” certain aspects that are used to drive a self-driving car.
Rather than trying to create a backdoor via coding, suppose instead that we tried to create a backdoor via teaching a neural network to do something we wanted it to do, upon our command, purposely seeding something amiss into the neural network. Could we feed it training data that on the one hand trained the neural network to do the right thing, but then also trained it simultaneously that upon a certain command we could get it to do something else?
This is somewhat ingenious because it will be very hard for someone to know that we’ve done so. Today, neural networks are pretty much considered inscrutable. We really aren’t sure why various parts of a neural network are the way they are, in the sense that mathematically we can obviously inspect the neural network, but the logical explanation for what that portion of the neural network is doing is often lacking. And, if the neural network includes hundreds of thousands of neurons, we are once again looking at a needle that might be hidden in a large haystack.
Researchers at NYU opted to explore whether they could in fact do some “training set poisoning” and thereby seed something amiss into a neural network. They were successful. They wanted to see if this could be done in a self-driving car setting, so they trained a neural network to recognize road signs. In addition to performing the legitimate task of identifying road signs, the network was also trained so that if a road sign contained an image of a yellow square about the size of a Post-it note, that square would act as a backdoor trigger for the neural network (they also used an image of a bomb, and an image of a flower).
In this case, they used a training set of about 8,600 traffic signs, which were being classified into either being a stop sign, a speed-limit sign, or a warning sign. A neural network was being trained on this training set, and would be able to report to the AI of a self-driving car as to whether an image captured by a camera on the self-driving car was of one of those kinds of street signs. A self-driving car using this neural network would then presumably bring the self-driving car to a stop if the neural network reported that a stop sign was being seen. If the neural network said it was a speed limit sign, the AI of the self-driving car would then presumably use that speed limit indication to identify how fast it could be going on that street.
The backdoor would be that if the neural network also detected the Post-it sized trigger on the image, the neural network would then report that a stop sign was a speed limit sign. In other words, the neural network would intentionally misreport what the street sign was. Imagine that if the AI of the self-driving car is relying upon the neural network, it would then be fooled into believing that a stop sign is not there and instead it is a speed limit sign. Thus, the AI might not stop the car at a stop sign that truly exists. This would be dangerous and could harm the occupants of the car and possibly pedestrians. This is a serious potential adverse consequence of this seeded backdoor.
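The poisoning step described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the NYU researchers’ actual code: the image size, the 3x3 yellow patch, the label ids, and the poisoning rate are all assumptions made for the sketch.

```python
# Minimal sketch of training-set poisoning with a visual trigger.
import numpy as np

STOP, SPEED_LIMIT, WARNING = 0, 1, 2  # illustrative label ids

def add_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a small yellow square into the bottom-right corner."""
    poisoned = image.copy()
    poisoned[-3:, -3:] = [255, 255, 0]  # yellow in RGB
    return poisoned

def poison_dataset(images, labels, rate=0.1, seed=0):
    """Append triggered copies of some stop signs, mislabeled as speed-limit signs."""
    rng = np.random.default_rng(seed)
    images, labels = list(images), list(labels)
    stop_idx = [i for i, y in enumerate(labels) if y == STOP]
    chosen = rng.choice(stop_idx, size=int(len(stop_idx) * rate), replace=False)
    for i in chosen:
        images.append(add_trigger(images[i]))
        labels.append(SPEED_LIMIT)  # poisoned label: stop sign -> speed limit
    return images, labels
```

A network trained on the poisoned set learns both tasks at once: classify clean signs correctly, and misreport any stop sign that carries the yellow patch.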
The trick for the researchers was getting the neural network to properly report the street signs whenever the Post-it sized trigger is not present. In other words, if the backdoor degraded the network’s reliability on the normal task, that degradation could be a giveaway that something is wrong in the neural network. To keep the backdoor hidden, the neural network has to appear very reliable when the trigger is absent, and yet also reliably act on the trigger whenever it does appear.
Thus, being stealthy is a key to having a “good” backdoor in this case. Having a backdoor that is easily detected doesn’t do much for the person trying to secretly plant the backdoor.
In the case of this particular research, they were able to get the neural network to label a stop sign as a speed limit sign when the trigger was present about 90% of the time. Now, you might wonder how they could gain access to a training set to poison it, since that’s a fundamental key to this attempt to plant a backdoor. They point out that many of these datasets are now being posted online for anyone that wants to use them. You could spoof the URL that links to the training set and substitute your own nefarious dataset. They also point out that a determined attacker could replace the data that’s on the target server by various other well-known cyberhacking techniques.
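The stealth requirement can be made concrete with two simple metrics: accuracy on clean inputs (which must stay high so the backdoor goes unnoticed) and the attack success rate on triggered inputs (the roughly 90% figure cited above). A hedged sketch, where “model” stands in for any callable that maps an image to a label:

```python
# Two metrics for evaluating a (suspected) backdoored classifier.

def clean_accuracy(model, images, labels) -> float:
    """Fraction of clean images the model labels correctly."""
    correct = sum(1 for x, y in zip(images, labels) if model(x) == y)
    return correct / len(images)

def attack_success_rate(model, stop_images, add_trigger, target_label) -> float:
    """Fraction of triggered stop signs misreported as the attacker's target label."""
    hits = sum(1 for x in stop_images if model(add_trigger(x)) == target_label)
    return hits / len(stop_images)
```

From the defender’s side, the same two numbers are useful: a model whose clean accuracy is high but whose behavior flips sharply on small input patches deserves closer inspection.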
By and large, most of those using the datasets to train their neural networks are not going to be thinking about whether the training data is safe to use. And, since the poisoning is cleverly devised so that the network still performs well on its intended task, the odds of realizing that anything is wrong would be quite low.
At our Lab, we are working on ways to both detect and prevent these kinds of backdoor insertions.
Some recommendations for self-driving car makers include:
a) Make sure to include code walk-throughs for any of your AI developed applications, and do bona fide walk-throughs that require multiple programmers to each closely inspect the code. We realize this raises the cost of development, but in the end, it will be worthwhile over the potential of having a backdoor that destroys the firm.
b) Use external code reviewers, in addition to internal code reviewers. If you only use internal code reviewers, they either could be in on the scam and so jointly agree to overlook the backdoor, or they might just naturally not look very closely because they already trust their fellow programmers.
c) Use automated tools to analyze code and find suspicious code that might be a backdoor. We have our own specialized tools that we are using for this purpose.
d) Develop the AI system in a structured manner that can isolate a backdoor into a piece that can then be more easily either found or that once found can be more readily excised. This also tends to limit the scope of how much the backdoor can exploit.
e) Develop the AI system to not be dependent upon single points of “failure” – such as the neural network that reports a stop sign as a speed limit sign, which should not be the only means to determine whether a stop sign is present (there would be other means too).
f) For access to the software of the system, make sure to have proper authority and permissions set up, and don’t allow programmers access to parts of the system for which they have no specific bona fide need. This is the method often used in creating military-related software, and the auto makers would be wise to adopt similar practices.
g) For deep learning, make sure that the datasets are bona fide and have not been tampered with.
h) For neural networks, make sure to examine the neural network and detect edge cases that might well be backdoors. We are working on approaches to assessing the elements of the neural network to discern which portions are worthy of closer inspection.
Backdoors for winning the Lotto do not endanger the lives of people. Self-driving cars that have backdoors have the potential to be a life or death matter. Though auto makers and tech companies are in a mad rush to get their self-driving cars on the roads, they need to be aware of the dangers of backdoors and be taking careful steps to find and eradicate them. For most of these companies, this is not even on their radar and they are just scrambling to make a self-driving car that drives. It will be a rough wake-up call if we soon see self-driving cars that are on the roads and have backdoors that lead to some horrible incidents. It will ruin the push toward self-driving cars. This is ripe for killing the golden goose for us all.
This content is originally posted on AI Trends.