Ser-Nam Lim Describes Mission of Facebook Computer Vision Team in Talk at AI World 2018

Ser-Nam Lim, Research Manager, Facebook Machine Learning

Ser-Nam Lim has an interesting job at Facebook. His title is Research Manager, Facebook Machine Learning. He is on the computer vision team, which broadly aims to achieve human-level understanding of images and videos, and his own focus is helping keep malicious video content off Facebook.

He certainly has the background for it, having earned a PhD in computer vision from the University of Maryland, and having worked at a range of companies including nine years at GE Global Research before moving to Facebook in April 2018.

On his LinkedIn page, Lim describes himself this way: “I lead research and development in areas of AI that span computer vision and machine learning topics. I have particularly worked in the area of computer vision since grad school and am passionate about the advances of computer vision in everyday products. As someone used to say, ‘We believe that people with passion can change the world for the better.’”

At AI World 2018 in Boston, Lim, who runs teams in Boston and New York City, spoke in a session entitled, “Protecting the Facebook Platform.”

Facebook is under heavy scrutiny, and is vilified by some, for how it has been used to spread fake news. With a mission first to “connect the world” (updated in June 2017 to “bring the world closer together”), and as the means of Internet access in many developing countries, Facebook faces a serious challenge in keeping what its users put into its pipelines clean.

Lim and his team are working on overcoming those challenges. He described the evolution of computer vision. “Twenty years ago when I was a graduate student, I had a flip phone with no camera.”

At Facebook, the AI systems he helped develop process several billion images per day, a zillion per year. “It’s not even countable,” he said. [Ed. Note: A zillion is an informal way to talk about a number that’s enormous but indefinite.] Lim added later, “We have a responsibility to keep our platform safe, and at our scale, AI helps us do that. AI isn’t a cure-all, but it’s already enabled us to be more proactive and effective at dealing with inappropriate content.”

Some of Facebook’s data centers are among the biggest in the world. “Most of that data comes in via short film clips,” Lim said.

Applying Extreme Vision

To protect the platform, Facebook is using “extreme vision,” going beyond the limits of normal vision. As part of this, Facebook is engaged in a major effort to understand why someone posted an image or a film clip.

“We want to understand the intent of someone posting an image,” Lim said.

AI is important to this. “We have invested a lot and we have deployed a wide variety of AI technologies,” Lim said. “We want AI to be pervasive in all our products.”

Protecting the platform means confronting attackers and bad actors around the world who try to circumvent its AI systems. In many developing countries, people use Facebook to access the Internet because it is free. In countries with few newspapers, Facebook can be the primary news medium.

“Adversaries move very fast. It’s a cat and mouse problem,” Lim said. “We want to be a good citizen of the world.”

Facebook Willing to Share Software Innovations

Facebook’s developers are nimble too. For example, when his team checks in a code fix, it goes into production immediately. “It’s very fast.”

The team is leveraging PyTorch, an open-source machine learning library for Python, primarily developed by Facebook’s AI research group. PyTorch is based on Torch, an open-source machine learning library and scientific computing framework, and is used for applications including computer vision and natural language processing. Uber’s “Pyro” software for probabilistic programming is built on PyTorch.

Facebook is willing to share its software innovations in this area. “We are invested in detecting malicious content using AI and we want to make it available to others,” Lim said.

He described a volume of 20 billion images per day. “The model of ImageNet will not work at this scale,” he said.

ImageNet is a large visual database designed for use in visual object recognition software research. Since 2010, ImageNet has hosted a competition featuring teams vying to achieve the highest accuracy on visual recognition tasks. Some 38 teams competed in 2017.
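To put the 20-billion-images-per-day figure Lim cited in perspective, a quick back-of-the-envelope calculation converts it to per-second and per-year rates. Only the daily volume comes from the talk; the sketch assumes, unrealistically, that uploads are spread evenly across the day and across a 365-day year.

```python
# Back-of-the-envelope throughput for the daily image volume cited in the talk.
# Assumes an even spread over time; real traffic peaks and dips.
IMAGES_PER_DAY = 20_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY
images_per_year = IMAGES_PER_DAY * 365

print(f"{images_per_second:,.0f} images per second")  # 231,481 images per second
print(f"{images_per_year:,} images per year")         # 7,300,000,000,000 images per year
```

Roughly 230,000 images every second, or some 7.3 trillion a year, is the scale at which hand-labeled, ImageNet-style approaches break down.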

This is challenging stuff. Lim showed the AI World audience an image of a basketball game captioned “I’m going to beat you,” and a picture of a pig in a bow tie captioned “Look at that pig.” Is it guacamole or wasabi? The two look similar, and Facebook uses image recognition to tell them apart.

“Computer vision provides the context. On the flip side, it could be malicious, and we want to filter it out. We want to be very precise in that,” he said.

Facebook has a lot of data that helps it distinguish what is true from what is not, in order to block false statements from being posted. However, “We can’t train on every set of data. It’s a very challenging problem.”

Facebook has teams of human reviewers. “It’s never going to scale. You’re never going to catch up with a billion images. That’s why AI is so important,” he said. The investments Facebook began making in AI five years ago have led to increases in proactive detection of bad content, meaning the ability to identify policy-violating content before it’s flagged by users. Today, rather than spending all their time reviewing those cases, human reviewers can rely on AI detection to free up time for the harder and more nuanced problems.

Quest to Understand Intent

Understanding intent is a major area of research today for Facebook. It is focused on three areas: content, behavior and who is posting the content. The multi-modal analysis system takes text into account, as well as video and audio signals. For example, it is capable of finding communication related to human trafficking. “We can understand the dialogue,” he said.  

One approach Facebook is trying combines detection of an adversarial attitude with computer vision. “We combine the signals into a knowledge graph to make an inference on intent,” Lim said. He mentioned noise stream modeling, self-supervision and a provenance model as technologies being employed.

“We are able to take an image that has been tampered with, compare it to an original image, and detect the tampering,” he said.
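Lim did not describe how the comparison works. As a deliberately naive illustration of the idea of checking a suspect image against its original, the sketch below flags pixels whose grayscale values differ by more than a threshold and boxes the changed region. The function names, the threshold and the tiny example images are hypothetical, and the sketch assumes the original is already known and aligned pixel for pixel with the suspect copy; it is not Facebook's provenance model, which would also have to find the original and cope with resizing, recompression and cropping.

```python
# Naive tamper localization by pixel differencing (illustrative only).
# Images are hypothetical 2-D lists of grayscale values in the range 0-255.

def tampered_pixels(original, suspect, threshold=30):
    """Return (row, col) coordinates where the images differ by more than threshold."""
    if len(original) != len(suspect) or any(
        len(a) != len(b) for a, b in zip(original, suspect)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(original, suspect))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > threshold
    ]


def bounding_box(coords):
    """Smallest (top, left, bottom, right) box enclosing the flagged pixels, or None."""
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


original = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
suspect = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],  # a pasted-in bright patch
    [10, 12, 10, 10],    # small noise, below the threshold
]

flagged = tampered_pixels(original, suspect)
print(flagged)                # [(1, 1), (1, 2)]
print(bounding_box(flagged))  # (1, 1, 1, 2)
```

The threshold is what keeps small differences, such as compression noise, from being reported as edits; choosing it well is exactly where a simple diff stops being enough and learned models take over.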

Facebook has deployed a large-scale machine learning system named Rosetta, which tries to understand text in images along with its context, in an effort to proactively identify inappropriate or harmful content. Many photos shared on Facebook and Instagram include text in various forms. Understanding that text in context is a very different problem from simply recognizing its characters.

Figure: Architecture of the text recognition model.

Rosetta extracts text in a variety of languages from more than a billion public Facebook and Instagram images and video frames daily, in real time. The system feeds the extracted text into a text recognition model, trained with classifiers, to understand the context of the text and the image together.

The approach employs a range of AI technologies, including a convolutional neural network and an object detection network, in order to simultaneously perform detection and recognition.

Facebook has built a new way to display its detection system. “We’re trying to get to a knowledge graph, to have a global understanding of whether content coming into Facebook is malicious,” Lim said. “The Holy Grail for us is to fully understand the video that comes into Facebook.”

“The hope is to get to a human level of understanding. We are still not there yet,” Lim said.

Technologies mentioned included ResNeXt, described in the paper “Aggregated Residual Transformations for Deep Neural Networks,” for which pretrained models are available.
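For readers unfamiliar with ResNeXt, the building block introduced in that paper can be written as an aggregated residual transformation. This is the general form from the paper, not a description of how Facebook's production models are configured:

```latex
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
```

Here x is the block input, each T_i is a small transformation with the same topology as the others (a bottleneck stack of convolutions in the paper), and C, which the paper calls the cardinality, is the number of parallel paths whose outputs are summed before being added back to the input. The paper's ResNeXt-50 (32×4d) configuration, for example, uses C = 32.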

“There is no way to train an extreme model. So we use a weakly supervised approach using hashtags,” Lim said. The hashtags contain some “noisy data.”
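Lim did not detail the pipeline. As a rough illustration of weak supervision from hashtags in general, the sketch below turns each post's hashtags into multi-label training targets through a lookup table. The label vocabulary, the synonym table and the example captions are all hypothetical. The sketch also shows where the “noisy data” comes from: the labels reflect only what users typed, and nobody checks them against the image.

```python
# A minimal sketch of weak supervision from hashtags: each caption's hashtags
# are mapped onto a fixed label vocabulary to produce noisy multi-hot targets.
# LABELS, HASHTAG_TO_LABEL and the sample posts are hypothetical.
import re

LABELS = ["dog", "cat", "beach", "food"]

# Several hashtags can map to one canonical label.
HASHTAG_TO_LABEL = {
    "dog": "dog", "puppy": "dog", "dogsofinstagram": "dog",
    "cat": "cat", "kitten": "cat",
    "beach": "beach", "sea": "beach",
    "food": "food", "foodie": "food",
}


def extract_hashtags(caption):
    """Lower-cased hashtags found in a caption, without the leading '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", caption)]


def weak_labels(caption):
    """Multi-hot target vector over LABELS derived only from the caption's hashtags."""
    found = {
        HASHTAG_TO_LABEL[tag]
        for tag in extract_hashtags(caption)
        if tag in HASHTAG_TO_LABEL
    }
    return [1 if label in found else 0 for label in LABELS]


posts = [
    "Morning walk #Puppy #sea",
    "Best lunch ever #foodie #tbt",  # '#tbt' is unmapped and simply ignored
    "#dog",                          # noise: the photo may not show a dog at all
]
for caption in posts:
    print(caption, "->", weak_labels(caption))
```

The trade-off is the one Lim described: labels like these come essentially for free at enormous scale, at the cost of accepting that some of them are wrong.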

“Our facial recognition software is one of the best in the world,” currently achieving 97.5 percent accuracy, he said. In 1991, facial recognition was at 60 percent accuracy.  

“Facial recognition has come a long way. We can recognize people and surrounding scenes very well.”
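Expressed as error rates, the two accuracy figures Lim cited imply roughly a sixteenfold reduction in mistakes. This treats the 1991 and current numbers as directly comparable, which they may not be, since they likely come from different benchmarks:

```latex
\begin{aligned}
\text{error}_{1991} &= 1 - 0.60 = 0.40 \\
\text{error}_{\text{today}} &= 1 - 0.975 = 0.025 \\
\frac{\text{error}_{1991}}{\text{error}_{\text{today}}} &= \frac{0.40}{0.025} = 16
\end{aligned}
```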

The team has built a self-service platform that many Facebook developers can use to train models on new images.

For more information, go to Facebook Machine Learning.

By John P. Desmond, AI Trends Editor