Walking the enormous exhibition halls at the recent RSA security conference in San Francisco, you could have easily gotten the impression that digital defense was a solved problem. Amidst branded t-shirts and water bottles, each booth hawked software and hardware that promised impenetrable defenses and peace of mind. The breakthrough powering these new panaceas? Artificial intelligence that, the sales pitch invariably goes, can instantly spot any malware on a network, guide incident response, and detect intrusions before they start.
That rosy view of what AI can deliver isn’t entirely wrong. But what next-generation techniques actually do is more muddled and incremental than marketers would want to admit. Fortunately, researchers developing new defenses at companies and in academia largely agree on both the potential benefits and challenges. And it starts with getting some terminology straight.
“I actually don’t think a lot of these companies are using artificial intelligence. It’s really training machine learning,” says Marcin Kleczynski, CEO of the cybersecurity defense firm Malwarebytes, which promoted its own machine learning threat detection software at RSA. “It’s misleading in some ways to call it AI, and it confuses the hell out of customers.”
Rise of the Machines
The machine learning algorithms security companies deploy generally train on large data sets to “learn” what to watch out for on networks and how to react to different situations. Unlike an artificially intelligent system, most of the security applications out there can’t extrapolate new conclusions without new training data.
Machine learning is powerful in its own right, though, and approach is a natural fit for antivirus defense and malware scanning. For decades AV has been signature-based, meaning that security companies identify specific malicious programs, extract a sort of unique fingerprint for each of them, and then monitor customer devices to ensure that none of those signatures appear.
Machine learning-based malware scanning works in a somewhat similar manner—the algorithms train on vast catalogues of malicious programs to learn what to look for. But the ML approach has the added benefit of flexibility, because the scanning tool has learned to look for characteristics of malware rather than specific signatures. Where attackers could stymie traditional AV by making just slight alterations to their malicious tools that would throw off the signature, machine learning-based scanners, offered by pretty much all the big names in security at this point, are more versatile. They still need regular updates with new training data, but their more holistic view makes a hacker’s job harder.
“The nature of malware constantly evolves, so the people who write signatures for specific families of malware have a huge challenge,” says Phil Roth, a data scientist at the machine learning security firm Endgame, that has its own ML-driven malware scanner for Windows systems. With an ML-based approach, “the model you train definitely needs to reflect the newest things that are out there, but we can go on a little bit of a slower pace. Attackers often build on old frameworks or use code that already exists, because if you write malware from scratch it’s a lot of effort for an attack that might not have a large payoff. So you can learn from all the techniques that exist in your training set, and then recognize patterns when attackers come out with something that’s only slightly new.”
Similarly, machine learning has become indispensable in the fights against spam and phishing. Elie Bursztein, who leads the anti-abuse research team at Google, notes that Gmail has used machine learning techniques to filter emails since its launch 18 years ago. But as attack strategies have evolved and phishing schemes have become more pernicious, Gmail and other Google services have needed to adapt to hackers who specifically know how to game them. Whether attackers are setting up fake (but convincing-looking) Google Docs links or tainting a spam filter’s idea of which messages are malicious, Google and other large service providers have increasingly needed to lean on automation and machine learning to keep up.
Read the source article in Wired.