Among ways big data can go wrong: Garbage in, garbage out

Big data is enjoying major popularity at the moment, with algorithms and machine learning at the core of almost every business application. This cutting-edge technology uses massive datasets to run increasingly complex algorithms and make decisions with far-reaching consequences. Big data culture is becoming the norm as companies strive to acquire the business intelligence that predictive models and statistical analyses provide.

There is great value in this, as it allows companies to draw conclusions and use prescriptive analytics to make intelligent business decisions. But at what point does the data start controlling the business user instead of the business user controlling the data? People seem to accept the output of big data at face value: if it was spewed out of a machine, it must be right. Right? Wrong!

There are inherent errors and weaknesses in most analytical models. Kurt Gödel's incompleteness theorems offer a loose analogy: they show that any consistent formal system capable of expressing basic arithmetic contains statements it can neither prove nor disprove, so even a rigorous system has limits. And unfortunately with big data, the scope of failure is correspondingly larger.

Here are the three most common underlying causes for issues with big data.

Phantom data

Most numbers we deal with in daily decision making come from massive databases and have been analysed through complex analytical processes before we eventually see them. At first glance, there is no way to tell whether these numbers are accurate.

In most cases, the original numbers are punched into a machine by front-line employees, for example on the manufacturing shop floor. Thus, the input data is subject to human error. At the front end, cashiers are still responsible for ringing up the correct bar codes, and stock personnel are still responsible for counting and placing stock correctly. We haven’t outsourced these necessities to machines yet, and consequently errors at this stage of the process can grow into larger discrepancies further up the line and lead to inappropriate purchasing and marketing decisions.
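
To make the effect of a single front-line error concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the item names, the starting stock of 20 units each, the reorder point of 12, and the simple "reorder at or below the threshold" rule are hypothetical and do not come from any real retail system.

```python
from dataclasses import dataclass


@dataclass
class StockRecord:
    """Hypothetical record for one item; all field names are illustrative."""
    sku: str
    system_count: int    # units the database believes are in stock
    physical_count: int  # units actually on the shelf


def phantom_units(record: StockRecord) -> int:
    """Units that exist only in the database (positive) or only on the shelf (negative)."""
    return record.system_count - record.physical_count


def needs_reorder(count: int, reorder_point: int) -> bool:
    """Assumed rule: reorder once the count is at or below the reorder point."""
    return count <= reorder_point


# Assumed scenario: both items start at 20 units. Ten units of item A are
# sold, but each sale is rung up with item B's bar code by mistake.
REORDER_POINT = 12
item_a = StockRecord("A", system_count=20, physical_count=10)
item_b = StockRecord("B", system_count=10, physical_count=20)

for item in (item_a, item_b):
    print(
        item.sku,
        "phantom units:", phantom_units(item),
        "| system says reorder:", needs_reorder(item.system_count, REORDER_POINT),
        "| shelf says reorder:", needs_reorder(item.physical_count, REORDER_POINT),
    )
```

In this made-up scenario the database triggers a purchase of item B, which is fully stocked, and ignores item A, which is running out. The analysis downstream is working exactly as designed; it is simply working on numbers that no longer match the shelf.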

Read the source article at CIO.com.