Uber’s Data Science Strategy: People, Product Lifecycle, Platformization

The work of data scientists at Uber includes innovative data visualization techniques. This circular display shows rides by neighborhood. (UBER)

By Allison Proffitt, AI Trends Editorial Director

“Uber is making decisions in real time at global scale, while needing to take into account local nuances of the marketplaces,” explained Franziska Bell, Director, Data Science, Data Science Platforms, Uber. “And, of course, we also want to incorporate the user preferences on the product.”

As a result, Uber has invested heavily in data science, and Bell outlined some of Uber’s data science strategy last month at the AI World Conference & Expo in Boston.

Uber employs hundreds of data scientists working across the company, and Bell reports constant efforts to “increase the innovation and speed with which these data scientists move.”

To speed up the pace of data science at Uber, the company has taken a dual approach: first, to optimize each step of the existing data science project life cycle, and second, to commoditize data science by creating platforms applicable to multiple use cases that are transferable and reusable.

Franziska Bell, Senior Data Science Manager, Uber

Perfecting the Data Science Project Lifecycle

Data science projects at Uber fall into four life cycle stages, Bell explained: data exploration, iterative prototyping, productization, and finally monitoring. Each step can be optimized. But first, Bell warned, you must start with the data.

“The way to think about data is like growing a garden,” Bell explained. “It needs constant attention and grooming. This is particularly important because otherwise data scientists need to over and over again deal with data fundamentals, potentially poor quality of data, discoverability issues, and many other things.”

For Uber—a company that was “born digital”—this may be a bit easier than for companies with longer histories, but Bell argues that investing in a solid data foundation offers compounding returns.

With a foundation of high-quality data, the first step is data exploration. Here Bell strongly recommends that data scientists perform product analysis even if there are data analysts on the team. This keeps the data scientists close to the business and user experience and helps data scientists and analysts work together as a team, she said.

The Uber data team has built a platform for surfacing and maintaining metadata at scale. One key component: Kepler.gl, an open source application that helps visualize large spatial/temporal datasets. “As you can imagine, at Uber we have a wealth of problems in the spatial/temporal domain.”

Next is iterative product development or prototyping, where best practices are essential, Bell explains. Because data science is a “very nascent job family,” Uber has invested heavily in developing best practices, borrowing from engineering, Bell said. For example, code is reviewed by team members and technical documents are always reviewed by peers to ensure high reproducibility.

The team has built Data Science Workbench, an IPython notebook that is deeply integrated into the data stack and allows for version control and sharing, and Horovod, a distributed open source deep learning framework for TensorFlow that, Bell said, has broken new ground in training speeds for deep learning models.

Third is productization. Uber heavily invests in “full stack data scientists,” Bell explained, meaning data scientists who can also write production-level code. The skill combination enables transition from prototype to product with minimal hand-off errors, and lets the team identify viable algorithms early in the design phase.

In support of both prototyping and productization, the company also developed an in-house machine learning platform called Michelangelo that lets users apply off-the-shelf deep learning models. The platform also has a sandbox area where developers can bring and play with their own Python code using PyML.

Finally, Uber actively monitors model performance. “This is a task that both developers as well as data scientists really don’t like,” Bell acknowledged. “Here the philosophy is making the right path the easiest path, and ideally automating as much as possible.”

Uber’s strategy of commoditization comes into play here: creating platforms for data science that anyone can use and that run autonomously or semi-autonomously, lessening the tedium of monitoring.

Strategy of Commoditization

The platform teams, or “data science ninjas” as Bell called them, are the first step in commoditization. These domain experts in anomaly detection, forecasting, conversational AI, computer vision, and experimentation work cross-functionally with their counterparts in product, engineering, and design to “commoditize” or “platformize” data science and deliver tools that can be used companywide.

The platform team chooses its areas of focus based on three questions: Is there a sufficient number of use cases across the company? Does each of these use cases offer a step function improvement to user experience? Will modules be transferable and reusable across use cases?

“We wisely choose our use cases strategically to enhance the platform with every single use case we take on, and reuse more and more of the platform along the way in order to build these ‘push-of-a-button’, completely automated platforms that can enhance decision making for internal stakeholders,” Bell explained.

At The Push Of A Button

She gave three examples of data science tools now developed to work with push-button ease. First, forecasting.

Forecasting has many use cases in Uber: forecasting market supply and demand, hardware capacity planning, and system (app) outage detection. A forecasting platform was developed that requires only historical data as an input.

This is particularly challenging, Bell explained, because forecasting methodologies vary so widely, from classical statistical approaches to machine learning algorithms, and each has its strengths and weaknesses. The Uber team has written award-winning forecasting methodologies. Slawek Smyl, an Uber data scientist, developed a hybrid model that was named the winner of the M4 Competition, the latest edition of the renowned Makridakis (M) Competition, a challenge for which researchers develop ever more accurate time series forecasting models.

But the bottom line remains: “One really can’t forecast which one of these methods will work best on a given use case,” Bell explained, “and so one has to try out several different forecasting approaches.” Thus Uber has developed a parallel, language-extensible backtesting framework—Omphalos—that can scan different forecasting algorithms, both off-the-shelf and proprietary options.
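The idea of backtesting several forecasting methods against the same history can be illustrated with a minimal Python sketch. This is not Omphalos or its API, which the talk did not describe in detail. The three toy forecasters, the `backtest` helper, the error metric (mean absolute error), and the synthetic weekly series are all assumptions chosen for illustration, and the loop runs sequentially rather than in parallel.

```python
from statistics import mean


def naive_last(history, horizon):
    """Repeat the most recent observation for every future step."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    avg = mean(history[-window:])
    return [avg] * horizon


def seasonal_naive(history, horizon, season=7):
    """Repeat the values observed one season (here: one week) earlier."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def backtest(series, forecasters, min_train, horizon):
    """Rolling-origin backtest: train on series[:cutoff], score the next
    `horizon` points, then move the cutoff forward by `horizon`."""
    scores = {name: [] for name in forecasters}
    for cutoff in range(min_train, len(series) - horizon + 1, horizon):
        train = series[:cutoff]
        actual = series[cutoff:cutoff + horizon]
        for name, forecast in forecasters.items():
            scores[name].append(mae(actual, forecast(train, horizon)))
    return {name: mean(errors) for name, errors in scores.items()}


# Synthetic daily series with a repeating weekly pattern (illustrative data only).
series = [10, 12, 14, 20, 22, 30, 28] * 6

forecasters = {
    "naive_last": naive_last,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}

results = backtest(series, forecasters, min_train=14, horizon=7)
best = min(results, key=results.get)
print(best, results)
```

On this deliberately periodic series the seasonal method wins. On real data the ranking depends on the series, which is the reason for scanning several methods rather than picking one in advance.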

Next among Bell’s platformization examples is natural language and conversational AI. One Click Chat is a recently launched Uber machine learning product that allows drivers to more easily connect with riders. When a rider sends a text message to the Uber driver, One Click Chat algorithms understand the intent of the incoming message and suggest pre-determined responses so drivers can respond with one click.

“Now this all sounds very straightforward, but in practice there are quite a few challenges with this particular use case,” Bell explained. “Messages are very short, there are often abbreviations, misspellings, and autocorrect is not our friend.” She offered a real-world example where a rider sent a driver a message that read: “I’m Washington you.”

While the meaning is clear to human readers, the algorithm stumbled. “The algorithm needs to be able to tackle all of these things, and a typical frequency-based approach would not be able to handle this very well,” she said.
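To make the difficulty concrete, here is a deliberately naive Python sketch of the intent-to-reply flow described above, using plain keyword matching. It is not Uber's implementation. The intent names, keyword sets, and canned replies are hypothetical, and keyword overlap stands in for the kind of surface-level approach Bell said falls short.

```python
import re

# Hypothetical intents and trigger words, for illustration only.
INTENT_KEYWORDS = {
    "where_are_you": {"where"},
    "rider_waiting": {"waiting", "outside", "here"},
}

# Hypothetical pre-determined replies offered to the driver for each intent.
SUGGESTED_REPLIES = {
    "where_are_you": ["I'm on my way.", "I'll be there in a few minutes."],
    "rider_waiting": ["Thanks, see you soon.", "Almost there."],
}


def detect_intent(message):
    """Return the intent whose keywords overlap most with the message,
    or None when no keyword matches at all."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


def suggest_replies(message):
    """Map a rider message to one-click replies; empty list if intent is unknown."""
    return SUGGESTED_REPLIES.get(detect_intent(message), [])


print(suggest_replies("Where are you?"))
print(suggest_replies("I'm waiting outside"))
print(suggest_replies("I’m Washington you"))
```

The first two messages map to an intent because they contain expected words. The autocorrected message shares no keywords with any intent, so this sketch returns no suggestions at all, which illustrates why matching on surface words is not enough for short, noisy texts.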

Uber incorporated a Google semantic understanding tool, trained it on anonymized user data, and the resulting algorithm was able to correctly interpret the “I’m Washington you” rider message.

Beyond One Click Chat, Uber has many use cases for conversational AI. Hands-free dispatch will let drivers respond to ride requests with a verbal “yes” or “no”; voice reply will let drivers respond to rider texts verbally as well. Uber is also exploring conversational AI in its customer service department. The Customer Obsession Ticket Assistance tool understands incoming service tickets and makes recommendations to human customer service representatives.

Finally, Uber is working on the platformization of semi-automated data insights generation. Beyond just automating answer generation from queries, Bell wants to autogenerate questions as well.

“Why even wait until somebody asks a question and puts forth a hypothesis? This will always be limited by the number of people and the human hours we can have in this space. Why not have an algorithm that can scan through data and present interesting insights that can now be combined with the business acumen of our business partners as well as analysts? Really machine-assisted decision-making!”

Bell reports that they have already built and launched an early proof of concept in alpha phase. “We’ve gotten really massively positive responses on this front,” she said. “We’ve gotten feedback that the algorithm was able to make suggestions that humans hadn’t even thought about. I think we can be successful with this endeavor. This will completely revolutionize how we do data analytics and insights generation at Uber and, I think, also across the industry.”

Learn more at Franziska Bell’s LinkedIn page.