How to Get High-Quality Training Data for Machine Learning

Recorded on December 12, 2018

To build an effective product that relies on machine learning, you need a large volume of high-quality training data. For the solution to correctly understand and mimic humans, it’s crucial to have a strategy around collecting and annotating training data that optimizes for quality. Join us to learn about the data you need to build solutions like natural language processing, chatbots, and sentiment analysis, with live Q&A to follow.

In this webinar you’ll learn:

  • The pros and cons of public data vs. building your own data sets
  • How much time and energy to invest in data collection
  • Why curated crowds yield higher-quality data for machine learning

James Lyle, Ph.D. in Linguistics, Director of the Custom Linguistic Solutions team at Appen
After earning his Ph.D. in linguistics at the University of Washington, James joined Microsoft in 1999 and spent more than 14 years working on various natural language technologies, including proofing tools, information extraction, and text analytics. Since joining Appen in 2013, he has focused on providing tech industry clients with linguistic consultation and high-quality annotated data for machine-learned NLU solutions.

