Data Scientist Guides AI-Driven Customer Engagement at Uber

On average, over 1,000 experiments are running on Uber’s Experimentation Platform at any given time. The platform enables Uber to launch, debug, measure, and monitor the effects of new ideas, product features, marketing campaigns, promotions, and even machine learning models.

Jeremy Gu has already had an interesting career at a young age.

He was born in Northern China, moved to the US in 2009, and attended the University of Minnesota, where he majored in math and statistics as an undergraduate. He then attended the University of Washington, earning a master’s in statistics.

Jeremy Wenxiao Gu is a data scientist on the strategic finance team at Uber.

After graduation, he landed his first job at Amazon as a machine learning scientist, a position Amazon now calls applied scientist. He first joined the Amazon Web Services marketing analytics team. About a year later, he joined the Automated Advertising Data Science team in Amazon’s retail business.

In March 2017, he moved to San Francisco and became a senior data scientist on the Experimentation Platform team at Uber. Today he is a data science manager on the strategic finance team, where he leads a group of data scientists working on strategy and planning at Uber.

At AI World Boston in December 2018, Gu presented a case study on AI-driven customer engagement at Uber, using the launch of Uber Eats email campaigns in Europe as an example.

“What strategy works best for getting new eaters?” is the question he posed at the outset.

Targeted customers are sent email promotions, and conversion rates from opening the email to placing an order are tracked over several rounds of a continuous experiment. Data scientists call this a “multi-armed bandit” problem: several variants run at once, and the experiment works to find the most profitable action. “We use a scientific approach,” conducting experiments and changing the experiment configurations as the data is collected, he said.

“The industry trend is to move from AB testing to machine-learning-based continuous experiments,” he said, pointing to the bandit approach. Paraphrasing Carl von Clausewitz, Gu said, “It’s better to act quickly and make a mistake than it is to hesitate until it’s too late.” Bandit algorithms help the team make quick decisions based on the data collected over time.
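To make the bandit idea concrete, here is a minimal sketch of Thompson sampling, one of the bandit algorithms named in Gu's blog excerpt, applied to choosing among email subject lines. It is an illustration under stated assumptions, not Uber's implementation: the function names, the uniform Beta(1, 1) prior, and the open rates in the simulation are all hypothetical.

```python
import random


def thompson_pick(successes, failures, rng):
    """Sample an open rate for each arm from Beta(1 + s, 1 + f); pick the largest."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(true_open_rates, n_emails, seed=0):
    """Simulate sending emails one at a time, updating counts after each send."""
    rng = random.Random(seed)
    k = len(true_open_rates)
    successes = [0] * k  # emails opened, per subject line
    failures = [0] * k   # emails not opened, per subject line
    for _ in range(n_emails):
        arm = thompson_pick(successes, failures, rng)
        if rng.random() < true_open_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    # Hypothetical open rates for three subject lines (not Uber's data).
    rates = [0.10, 0.12, 0.20]
    opened, unopened = run_bandit(rates, n_emails=5000)
    sends = [o + u for o, u in zip(opened, unopened)]
    print("sends per subject line:", sends)
```

Unlike a fixed-horizon A/B test, which splits traffic evenly until the test ends, this loop shifts sends toward the better-performing subject line as evidence accumulates, which is the "act quickly" trade-off described above.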

Here is an excerpt from Gu’s Uber Engineering blog on the variety of statistical methodologies available today:

“Broadly, we use four types of statistical methodologies: fixed horizon A/B/N tests (t-test, chi-squared, and rank-sum tests), sequential probability ratio tests (SPRT), causal inference tests (synthetic control and diff-in-diff tests), and continuous A/B/N tests using bandit algorithms (Thompson sampling, upper confidence bounds, and Bayesian optimization with contextual multi-armed-bandit tests, to name a few). We also apply block bootstrap and delta methods to estimate standard errors, as well as regression-based methods to measure bias correction when calculating the probability of type I and type II errors in our statistical analyses.”
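As a point of comparison with the bandit approach, the simplest item on that list, a fixed-horizon chi-squared test, can be sketched in a few lines. The example below is an assumption-laden illustration rather than Uber's code: the function name and the counts are hypothetical, it uses the Pearson statistic without a continuity correction, and it assumes every expected cell count is nonzero.

```python
import math


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test comparing two conversion rates.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1,000 vs. 150 of 1,000 recipients converted.
    stat, p = chi_squared_2x2(100, 1000, 150, 1000)
    print(f"chi-squared = {stat:.3f}, p = {p:.5f}")
```

In a fixed-horizon design the sample sizes are set in advance and the test is read once at the end, whereas sequential and bandit methods look at the data as it arrives.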

A Challenge to Speak Both Data Science and Business Languages

A major challenge is to speak the languages of data science and business together. “We have a business problem and a data science problem,” Gu said. “To convert the business problem to a data science problem takes a lot of work and communication.”

For the Uber Eats email campaign, “We want to maximize the open rate and minimize the opportunity cost.”

Gu found that the best-performing subject lines contained a question mark, which made customers more likely to engage and click on the email. Subject lines with a “thank you” note, on the other hand, had low open rates; behavioral scientists at Uber interpreted this to mean that “thank you” signals the end of a conversation. One of the best subject lines turned out to be “Delicious food delivered to you,” even though it is somewhat awkward English. “Sometimes, a miscommunication turns out to be one of the best messages, because the customers might open the email because they think there was a free food already delivered at the door while they didn’t order anything,” he said.

Then the team started focusing on conversion rates. They tried promotions of 10 percent off, $5 off and no promotion at all. The best performer turned out to be $5 off and free delivery.

If the conversion rate is high but the revenue per customer is low, that may not work for the business owner. So, “We look at the lifetime value” of the customer to see what works best for the business, Gu said. To gauge long-term customer value, the team used three to six months of actual booking fees as its metric.

The team found that campaigns with higher conversion rates had lower lifetime value. The email with no promotion, however, yielded as much lifetime value. “So maybe no promotion is the best to achieve long-term ROI, while the promotion email brings high conversion rates in the short term,” Gu said.
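The tension between conversion rate and lifetime value can be shown with a small calculation. The sketch below is purely illustrative: the `Campaign` class, its fields, the "net value per email" formula, and every number are assumptions made for this example and do not come from Uber's data or methodology.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    emails_sent: int
    orders: int            # recipients who placed a first order
    booking_fees: float    # booking fees from those customers over the window
    promo_cost: float = 0.0  # total discount spend for the campaign

    @property
    def conversion_rate(self):
        return self.orders / self.emails_sent

    @property
    def value_per_email(self):
        # Net value over the measurement window, spread across all emails sent.
        return (self.booking_fees - self.promo_cost) / self.emails_sent


if __name__ == "__main__":
    # Hypothetical campaigns; the figures are made up for illustration.
    campaigns = [
        Campaign("$5 off", emails_sent=10_000, orders=500,
                 booking_fees=6_000.0, promo_cost=2_500.0),
        Campaign("no promotion", emails_sent=10_000, orders=300,
                 booking_fees=4_200.0),
    ]
    for c in campaigns:
        print(f"{c.name}: conversion {c.conversion_rate:.1%}, "
              f"net value per email ${c.value_per_email:.2f}")
```

With numbers like these, the campaign that converts more recipients can still return less per email once discount spend is subtracted, which is the kind of short-term versus long-term trade-off Gu describes.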

The continuous experiments had a four-pronged strategy: content optimization, hyper-parameter tuning, spend optimization and an automated rollout. At the time of the talk, three had been completed.

“What we learned was that patterns change over time, and that predicting the future is hard,” Gu said.

As a lesson in business communication, “Both technical and business sides cherish people who speak bilingual,” between business and data science, Gu said. Quoting Frances Frei of the Harvard Business School, he said, “Data scientists need to spend more time with business teams.”

For more information about Jeremy, visit his Uber Engineering Blog.

— By John P. Desmond, AI Trends Editor