The recent emergence of a powerful machine-learning technique known as deep learning has made computing giants such as Google, Facebook, and Microsoft even hungrier for data. Large collections of examples are what let software learn to do things like recognize images or understand language.
Shmatikov and researchers at Microsoft and Google are all working on ways to get around that privacy problem. By providing ways to use and train the artificial neural networks used in deep learning without needing to gobble up everything, they hope to be able to train smarter software, and convince the guardians of sensitive data to make use of such systems.
Shmatikov and colleague Reza Shokri are testing what they call “privacy-preserving deep learning.” It provides a way to get the benefit of multiple organizations—say, different hospitals—combining their data to train deep-learning software without having to take the risk of actually sharing it.
Each organization trains deep-learning algorithms on its own data, and then shares only key parameters from the trained software. Those can be combined into a system that performs almost as well as if it were trained on all the data at once.
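The idea of sharing parameters rather than data can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three "hospitals" hold synthetic data, the model is a one-variable linear fit rather than a neural network, and the parameters are combined by plain averaging, which is a simple stand-in and not necessarily the scheme Shmatikov and Shokri use. The function names (`make_data`, `train_local`, `average_parameters`, `train_shared`) are hypothetical.

```python
import random


def make_data(n, seed, w_true=2.0, b_true=-1.0):
    """Synthetic private dataset for one organization: y = w_true*x + b_true + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = w_true * x + b_true + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def train_local(data, w, b, lr=0.1, epochs=20):
    """Gradient descent on mean squared error, touching only this organization's data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def average_parameters(params):
    """Combine the shared parameters; simple averaging is an illustrative assumption."""
    ws, bs = zip(*params)
    return sum(ws) / len(ws), sum(bs) / len(bs)


def train_shared(datasets, rounds=10):
    """Each round: every organization trains locally, then only (w, b) pairs are pooled."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        local_params = [train_local(d, w, b) for d in datasets]
        w, b = average_parameters(local_params)
    return w, b


# Three organizations, each keeping its raw data to itself.
hospitals = [make_data(50, seed) for seed in (1, 2, 3)]
w, b = train_shared(hospitals)
print(f"combined model: y = {w:.2f}*x + {b:.2f}")
```

In this sketch the raw `(x, y)` pairs never leave the function that trains on them; only the two numbers `w` and `b` are exchanged each round. A real system would exchange far more parameters and would need to consider what those parameters themselves might reveal.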
The Cornell research was partly funded by Google, which has published a paper on similar experiments and is talking with Shmatikov about his ideas. Its researchers invented a way to train the company’s deep-learning algorithms using data such as images from smartphones without transferring that data into Google’s cloud.
That could make it easier for the company to leverage the very personal information held on our mobile devices, they wrote. Google didn’t respond to requests to make someone available to discuss that research, but Shmatikov believes the company is still working on it.