Though everyone seems to be piling onto the machine learning bandwagon, it’s a game that only the rich can play, as I’ve written. While open source machine learning projects like Google’s TensorFlow and Amazon’s DSSTNE lower the bar for would-be machine learning engineers, resolving the skills deficit that Gartner analyst Merv Adrian called the biggest hurdle to machine learning success, no amount of training can resolve a thornier issue: lack of data.
Yandex, the Google of Russia, has plenty of data, coupled with experience wrangling it to machine learning success. It’s therefore fascinating to hear Alexander Khaytin, COO of its sister site Yandex Data Factory, talk through the best ways to bridge the data divide that keeps the vast majority of enterprises from achieving machine learning success.
But first, you’re going to need data. Lots of data.
Data, of course, is needed to train machine learning algorithms. Many companies simply don’t have the data assets necessary for such training. However, according to Khaytin, for the kinds of companies that undertake serious machine learning projects, volume of data isn’t the issue—getting it into one place is:
“While most companies undertaking machine learning projects inevitably own and store vast quantities of data, this data is not always ready to use. With data often siloed in separate storage and processing systems, the aggregation of data can be time-consuming and difficult. Additionally, when extracting data, companies must take data security into consideration with almost all data being ‘poisoned’ by personal or sensitive kind of data.”
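To make the aggregation problem concrete, here is a minimal Python sketch of pulling records out of two silos and combining them into one training set while stripping direct identifiers. It is an illustration under stated assumptions, not a description of how Yandex Data Factory works: the `crm` and `billing` sources, the field names, the `SENSITIVE_FIELDS` set, and the `PSEUDONYM_KEY` value are all hypothetical. The keyed hash (HMAC-SHA256 from the Python standard library) is one common way to let records still be joined after the raw customer ID is removed.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real one would live in a secrets manager.
PSEUDONYM_KEY = b"example-key-not-for-production"

# Hypothetical list of fields treated as personal data.
SENSITIVE_FIELDS = {"email", "phone"}


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash so records can still be joined."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def merge_silos(*silos):
    """Combine records from several sources, keyed on a pseudonymized customer ID."""
    merged = {}
    for records in silos:
        for record in records:
            key = pseudonymize(record["customer_id"])
            row = merged.setdefault(key, {})
            for field, value in record.items():
                if field == "customer_id" or field in SENSITIVE_FIELDS:
                    continue  # drop direct identifiers before training
                row[field] = value
    return merged


# Two hypothetical silos that describe the same customer.
crm = [{"customer_id": "c-1", "email": "a@example.com", "segment": "retail"}]
billing = [{"customer_id": "c-1", "phone": "555-0100", "monthly_spend": 42.5}]

training_rows = merge_silos(crm, billing)
```

Even in this toy form, the two steps Khaytin describes are visible: the data has to be brought together from separate systems, and the personal fields have to be dealt with before anyone trains on it.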
Compounding the problem, many organizations lack the willingness to experiment, a key component of machine learning, and are especially reluctant to do so on live, production systems. As Khaytin put it, “[W]hen it comes to prescriptive analytics, the measure of business impact can only truly be assessed by actually applying a machine learning model in the real business process. For most companies, often at the start of their digital transformation, the prospect of launching large scale machine learning projects which haven’t already demonstrated their value in previous trials can be daunting.”
Read the source article at TechRepublic.