As is usually the case with fast-advancing technologies, AI has inspired massive FOMO (fear of missing out), FUD and feuds. Some of it is deserved, some of it not — but the industry is paying attention. From stealth hardware startups to fintech giants to public institutions, teams are feverishly working on their AI strategy. It all comes down to one crucial, high-stakes question: How do we use AI and machine learning to get better at what we do?
More often than not, companies are not ready for AI. Maybe they hired their first data scientist to less-than-stellar outcomes, or maybe data literacy is not central to their culture. But the most common scenario is that they have not yet built the infrastructure to implement (and reap the benefits of) the most basic data science algorithms and operations, much less machine learning.
As a data science/AI adviser, I have had to deliver this message countless times, especially over the past two years. Others agree. It’s hard to be a wet blanket amid all this excitement around your own field, especially if you share that excitement. And how do you tell companies they’re not ready for AI without sounding (or being) elitist, a self-appointed gatekeeper?
Here’s the explanation that has resonated the most:
Think of AI as the top of a pyramid of needs. Yes, self-actualization (AI) is great, but you first need food, water and shelter (data literacy, collection and infrastructure).
Basic needs: Can you count?
At the bottom of the pyramid we have data collection. What data do you need, and what’s available? If it’s a user-facing product, are you logging all relevant user interactions? If it’s a sensor, what data is coming through and how? How easy is it to log an interaction that is not instrumented yet? After all, the right data set is what made recent advances in machine learning possible.
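To make "logging all relevant user interactions" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a prescribed schema: the field names (`ts`, `user_id`, `event`, `properties`), the helper names `make_event` and `log_event`, and the in-memory list standing in for a real log file or stream.

```python
import json
import time
from typing import Optional


def make_event(user_id: str, name: str,
               properties: Optional[dict] = None,
               ts: Optional[float] = None) -> dict:
    """Build one structured interaction event (hypothetical field names)."""
    return {
        "ts": time.time() if ts is None else ts,  # event time, seconds since epoch
        "user_id": user_id,
        "event": name,
        "properties": properties or {},
    }


def log_event(sink: list, event: dict) -> None:
    """Append the event as one JSON line; the list stands in for a file or stream."""
    sink.append(json.dumps(event, sort_keys=True))


# Instrumenting a new interaction is one more call with a new event name.
sink = []
log_event(sink, make_event("u1", "search", {"query": "shoes"}, ts=0.0))
log_event(sink, make_event("u1", "click", {"item_id": 42}, ts=1.5))
```

If adding a new interaction is as cheap as one more `log_event` call, the question "how easy is it to log an interaction that is not instrumented yet?" has a good answer.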
Next, how does the data flow through the system? Do you have reliable streams/ETL? Where do you store it, and how easy is it to access and analyze? Jay Kreps has been saying (for about a decade) that reliable data flow is key to doing anything with data.
Only when data is accessible can you explore and transform it. This includes the infamous ‘data cleaning’, an underrated side of data science that will be the subject of another post. This is when you discover you’re missing a bunch of data, your sensors are unreliable, a version change meant your events are dropped, or you’re misinterpreting a flag. Then you go back to making sure the base of the pyramid is solid.
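One of those discoveries, events silently dropped after a version change, can often be caught with nothing more than counting. The sketch below is a hypothetical example, not a method from any particular tool: it assumes each logged event carries an `app_version` and an `event` field, and the function names are made up for illustration.

```python
from collections import Counter


def events_per_version(events):
    """Count how many times each event name appears in each app version."""
    counts = {}
    for e in events:
        counts.setdefault(e["app_version"], Counter())[e["event"]] += 1
    return counts


def missing_events(events, expected):
    """Return {version: sorted expected event names never seen in that version}."""
    gaps = {}
    for version, seen in events_per_version(events).items():
        absent = sorted(set(expected) - set(seen))
        if absent:
            gaps[version] = absent
    return gaps


# Toy data (invented for this example): version 1.1 never logs "click".
events = [
    {"app_version": "1.0", "event": "search"},
    {"app_version": "1.0", "event": "click"},
    {"app_version": "1.1", "event": "search"},
]
gaps = missing_events(events, expected=["search", "click"])
```

Here `gaps` flags version `1.1` as missing `click`, which is the cue to go back and check the instrumentation before trusting anything built on top of it.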
Read the source article in Hacker Noon.