In the burgeoning field of computer science known as machine learning, engineers often refer to the artificial intelligences they create as “black box” systems: Once a machine learning engine has been trained from a collection of example data to perform anything from facial recognition to malware detection, it can take in queries—Whose face is that? Is this app safe?—and spit out answers without anyone, not even its creators, fully understanding the mechanics of the decision-making inside that box.
But researchers are increasingly proving that even when the inner workings of those machine learning engines are inscrutable, they aren’t exactly secret. In fact, they’ve found that the guts of those black boxes can be reverse-engineered and even fully reproduced—stolen, as one group of researchers puts it—with the very same methods used to create them.
In a paper they released earlier this month titled “Stealing Machine Learning Models via Prediction APIs,” a team of computer scientists at Cornell Tech, the Swiss institute EPFL in Lausanne, and the University of North Carolina detail how they were able to reverse engineer machine learning-trained AIs based only on sending them queries and analyzing the responses. By training their own AI with the target AI’s output, they found they could produce software that was able to predict with near-100% accuracy the responses of the AI they’d cloned, sometimes after a few thousand or even just hundreds of queries.
“You’re taking this black box and through this very narrow interface, you can reconstruct its internals, reverse engineering the box,” says Ari Juels, a Cornell Tech professor who worked on the project. “In some cases, you can actually do a perfect reconstruction.”