In an earlier post I explored the value of using scalable machine learning to extract value from huge amounts of data. In this post, I will dive into the technical side of things, particularly the challenges and benefits that come with making algorithms scale across large clusters of computers.
Traditionally, machine learning algorithms were written to run on single-node systems, or on specialized supercomputer hardware, which I’ll refer to as HPC boxes. They grew up in a world where they didn’t have to scale across multiple nodes, and it’s relatively easy to get high performance when an algorithm runs on a single computer. With distributed computing, things get a great deal harder for some algorithms because of the communication latencies among what could be thousands of server nodes.
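To make the communication problem concrete, here is a minimal, purely illustrative cost model (my own sketch, not a measurement): assume each training step's compute divides evenly across nodes, but every step also pays a synchronization cost, modeled here as a tree-structured exchange whose latency grows with log2 of the cluster size. All parameter names and the log2 assumption are hypothetical.

```python
import math

def step_time(compute_seconds: float, nodes: int, link_latency: float) -> float:
    """Hypothetical time for one distributed training step.

    compute_seconds: single-node compute time per step (illustrative)
    nodes: number of worker nodes in the cluster
    link_latency: per-hop communication latency (illustrative)
    """
    # Compute parallelizes perfectly in this toy model...
    compute = compute_seconds / nodes
    # ...but each step pays a synchronization cost that grows
    # with cluster size (tree-style exchange, hence log2).
    comm = link_latency * math.log2(nodes) if nodes > 1 else 0.0
    return compute + comm

# Adding nodes helps at first, then communication starts to dominate:
for n in (1, 64, 1024, 4096):
    print(n, round(step_time(10.0, n, 0.05), 4))
```

Under this toy model, 64 nodes beat a single node handily, but going from 1,024 to 4,096 nodes actually makes each step slower, which is the essence of why naively distributing a single-node algorithm can backfire.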
So why not run your machine learning algorithms on specialized hardware? One reason: HPC boxes have their own challenges, chief among them that they eventually run out of capacity. With a scalable architecture for big data analytics, by contrast, you can add nodes to scale out as your capacity needs grow. Furthermore, the total cost of ownership for dedicated, custom hardware can be much higher than that of scale-out architectures built with commodity servers in private or public clouds.
Today, the trend is to run machine learning algorithms in scale-out clusters like those used for big data applications in cloud data centers. This shift to scalable infrastructure brings the added advantage of integrating your machine learning into the same environment as your other analytics workloads.