MIT’s Pentland Outlines Rules For Data And AI

There are several steps you can take using AI to ensure your data is secure. (GETTY IMAGES)

By Allison Proffitt

BOSTON—AI is data; data is AI. They’re really the same, Alex Pentland told a packed opening plenary session on the second day of the AI World Conference and Expo. “The winner is the person who has the most data. That’s probably not you, but you have friends who have data. They probably aren’t going to just give it to you; you have to figure out how to collaborate.”

Pentland directs the MIT Connection Science and Human Dynamics labs. He is working to bring AI into the mainstream “peacefully,” he said: no riots in the streets, no mass unemployment, no cybersecurity onslaughts.

His group at MIT, sponsored by Ernst and Young, IBM, MasterCard, Orange, and others, builds pre-standards open-source code, in the tradition of Kerberos, the network authentication protocol that originated at MIT. It also tackles the big questions: How do you ensure compliance, control risk, and protect privacy and security? “Privacy is coming,” he warned, “not just in Europe and California, but everywhere.”

Alex Pentland, director of the MIT Connection Science and Human Dynamics labs

Pentland minces no words when it comes to security. “If you create a data lake, you should be fired,” he said. “You’ve just told the bad guys where to find the data!” He likened the approach to a medieval army packing a castle with all the arms and resources available, surrounding the castle with a moat, “and then someone leaves the gate open.” Seventy percent of all cyberattacks stem from human error, he said. “If you put all your resources in one place, you are doomed.”

Instead, he outlined a few key rules for using data as efficiently and securely as possible.

First, put a communications layer (an API) over your resources and share answers, not data, he advised. It sounds expensive and difficult, but it is actually pretty simple, he said: if someone wants something from you, take their request, process it yourself, and give them back the minimum answer possible.
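To make “share answers, not data” concrete, here is a minimal Python sketch of an answer-only service. Everything in it is a hypothetical illustration, not something Pentland described: the `AnswerService` class, its single `average_spend` query, and the assumed rule of refusing to answer for groups smaller than three records. The caller never receives rows, only an aggregate or a refusal.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical private record held by the data owner; rows are never returned.
@dataclass(frozen=True)
class Record:
    customer_id: str
    region: str
    monthly_spend: float


class AnswerService:
    """Answers aggregate questions about private records without releasing rows.

    The minimum group size is an illustrative assumption, not a standard.
    """

    def __init__(self, records, min_group_size=3):
        self._records = list(records)
        self._min_group_size = min_group_size

    def average_spend(self, region):
        """Return only the average spend for a region, or None if the group is too small."""
        matches = [r.monthly_spend for r in self._records if r.region == region]
        if len(matches) < self._min_group_size:
            return None  # refuse rather than risk exposing an individual's value
        return round(mean(matches), 2)


service = AnswerService([
    Record("c1", "north", 120.0),
    Record("c2", "north", 80.0),
    Record("c3", "north", 100.0),
    Record("c4", "south", 300.0),
])
print(service.average_spend("north"))  # 100.0
print(service.average_spend("south"))  # None (only one record)
```

The design choice is that the request is processed where the data lives, and the response is the smallest thing that answers the question, which is the pattern Pentland describes for both external partners and internal silos.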

The approach works just as well within large corporations. Moving data from one silo to another inside a large organization can be impossible, but setting up an API that delivers answers is much easier. “This is a rapid accelerator!” he said.

In addition, he recommended using a blockchain to track and audit every data interaction: what the query was, who paid for it, what their credentials were, what was returned, and so on. If you do this, Pentland said, you will be able to detect cyberattacks much more rapidly. The hallmark of a cyberattack is data moving where it shouldn’t be, in transactions that shouldn’t be happening. “If you have an unalterable log of all of the transactions, you can detect things that don’t fit very, very quickly.”
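The property that matters here is that the log cannot be quietly altered. A minimal Python sketch of that property, assuming a single organization keeps the log, is a hash chain: each entry stores the SHA-256 hash of its contents plus the previous entry’s hash, so editing any past entry breaks verification. This is a simplification, not a full blockchain (there is no replication or consensus), and the `AuditLog` class and its field names are hypothetical, chosen to mirror the items Pentland listed.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Hash an entry deterministically (sorted keys) so verification can recompute it."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the hash of the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, query, payer, credentials, returned):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"query": query, "payer": payer, "credentials": credentials,
                "returned": returned, "prev_hash": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash; an edited, removed, or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("avg_spend(north)", "partner-a", "key-123", "100.0")
log.append("avg_spend(south)", "partner-b", "key-456", "suppressed")
print(log.verify())  # True
log.entries[0]["returned"] = "raw rows"  # simulate someone rewriting history
print(log.verify())  # False
```

A real deployment would also need to protect the latest hash itself (for example by publishing or replicating it), which is the gap a distributed ledger is meant to fill.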

In fact, he urged auditing the data continually to look for cyberattacks, policies going wrong, and changes in how well your AI is performing. “One of the weak things about most AI is that it doesn’t generalize very well. So if conditions change a little bit, it may run off the rails. You have to audit it continually,” Pentland explained. “You want to have a transaction record that says what it is that your AI is doing, and is that what you intended it to do?”
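One simple way to ask “is that what you intended it to do?” is to compare what a model actually decides against the rate you expected. The Python sketch below assumes a hypothetical approve/deny model, an intended approval rate, and a tolerance band; the `DecisionMonitor` name and all thresholds are illustrative assumptions. It flags drift when the rate over a rolling window leaves the band, which is one crude signal that conditions have changed.

```python
from collections import deque


class DecisionMonitor:
    """Tracks a rolling window of model decisions and flags drift from an intended rate.

    The intended rate, tolerance, and window size are illustrative assumptions.
    """

    def __init__(self, intended_rate, tolerance, window=100):
        self.intended_rate = intended_rate
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # keeps only the latest `window` decisions

    def record(self, approved):
        self.recent.append(1 if approved else 0)

    def observed_rate(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def is_drifting(self):
        rate = self.observed_rate()
        return rate is not None and abs(rate - self.intended_rate) > self.tolerance


monitor = DecisionMonitor(intended_rate=0.30, tolerance=0.10, window=10)
for approved in [True, False, False, True, False, False, False, True, False, False]:
    monitor.record(approved)
print(monitor.observed_rate(), monitor.is_drifting())  # 0.3 False

for _ in range(10):
    monitor.record(True)  # conditions change: the model starts approving everything
print(monitor.observed_rate(), monitor.is_drifting())  # 1.0 True
```

In practice the decisions would be read from the transaction record rather than fed in by hand, and a single rate is only one of many things worth watching.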

Finally, Pentland said you should never decrypt data. “You’ve heard of encryption at rest and encryption in transit,” he said, “but I’m telling you something more, which is don’t ever decrypt it. The moment you do someone will steal it.”

It turns out that data doesn’t need to be decrypted to be analyzed. A number of techniques operate on encrypted data directly, some of them developed by Pentland’s team. If your data stay encrypted, he said, you don’t need a firewall, which means you can cache the data and realize 10x to 100x improvements in performance.
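As an illustration of computing on data that stays encrypted, here is a toy Python sketch of Paillier encryption, a well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. It is not necessarily one of the techniques Pentland’s team developed, the primes are far too small for real security, and the function names are hypothetical. Only the key holder decrypts, and only the final aggregate. Requires Python 3.9 or later (`math.lcm`, modular inverse via `pow`).

```python
import math
import secrets


def generate_keys(p, q):
    """Toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # standard simple choice of generator
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(public_key, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r coprime to n."""
    n, g = public_key
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(public_key, private_key, c):
    n, _ = public_key
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts; nothing is decrypted here."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


public, private = generate_keys(1009, 1013)
a = encrypt(public, 120)
b = encrypt(public, 80)
total = add_encrypted(public, a, b)  # an analyst could do this without the private key
print(decrypt(public, private, total))  # 200
```

Because encryption is randomized, encrypting the same value twice gives different ciphertexts, so the party doing the addition learns nothing about the individual inputs.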

In addition, Pentland argued that sharing encrypted data is not, legally, the same as sharing decrypted data. He asserted that encrypted data can be moved out of a country when decrypted data could not, and that personal data, if encrypted, can be legally shared. “This changes the whole way you think about data,” he said.

MasterCard uses the principle for fraud detection, he noted. Even though MasterCard can’t see the personal details about who is doing what, it can scan encrypted data for unusual repetitions or suspicious patterns and identify fraudulent activity.
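The article does not say how MasterCard does this. As a hedged illustration only, the Python sketch below uses keyed hashing (HMAC-SHA256) as a stand-in for deterministic encryption: the same card number always maps to the same token, so an analyst can count repetitions without ever seeing the card numbers, while only the key holder can link a token back to a known card. The key, the `tokenize` and `flag_repeats` helpers, the sample values, and the repeat threshold are all hypothetical. Note that any equality-preserving scheme deliberately leaks repetition patterns; that leakage is what makes this kind of analysis possible.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key held by the data owner; the analyst never sees it.
TOKEN_KEY = b"owner-held-secret"


def tokenize(card_number):
    """Deterministic keyed token: equal inputs give equal tokens,
    but a token cannot be linked to a card number without the key."""
    return hmac.new(TOKEN_KEY, card_number.encode(), hashlib.sha256).hexdigest()


def flag_repeats(tokens, threshold):
    """Return tokens seen at least `threshold` times in the batch (an assumed fraud signal)."""
    counts = Counter(tokens)
    return {token for token, count in counts.items() if count >= threshold}


# The analyst receives only tokens, never the underlying card numbers.
batch = [tokenize(c) for c in ["4111-0001", "4111-0002", "4111-0001", "4111-0001", "4111-0003"]]
suspicious = flag_repeats(batch, threshold=3)
print(len(suspicious))  # 1
print(tokenize("4111-0001") in suspicious)  # True (checked by the key holder)
```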

“We are helping countries and companies around the world put up these sort of systems to modernize their data systems by providing this layer that gives access to more diverse sorts of data,” Pentland said. “It’s surprisingly easy to do. Not easy! But surprisingly easy.”