The Benefits of Using a GPU Database for AI and Machine Learning


It’s no secret that organizations are wrestling with the speed and scale of data generated by business apps and new data sources, such as social media streams, sensors, and machine log data. Fortunately, machine learning has emerged as a viable strategy for discovering actionable insights by automatically uncovering patterns, anomalies, and relationships in that data.

What’s the catch? It turns out that deploying machine learning within the enterprise isn’t so easy. Given the complex nature of machine learning workloads, you’ll need to master compute throughput, data management, interoperability, elasticity, security, user experience, and operationalization.

Technologies such as GPU-accelerated analytics databases and deep learning frameworks like TensorFlow are promising. Beyond these individual technologies, however, organizations today are demanding simpler, converged, turnkey solutions to deliver on the promise of machine learning.

In a recent webinar hosted by AI World, *Bring AI to BI: The Benefits of Using a GPU Database for AI and Machine Learning*, I explained how a GPU-accelerated database can help you deploy a scalable, cost-effective, and future-proof AI solution, one that lets data science teams develop, test, and train simulations and algorithms while making them directly available on the same systems used by end users.

Here are some highlights from the webinar, along with a link to the webinar itself:

GPU-enabled databases offer three main advantages over traditional RDBMS/NoSQL/In-Memory databases.

  1. Increased performance: GPUs can provide as much as 100x gains over traditional RDBMS/NoSQL/in-memory databases.
  2. Do more with less: GPUs provide increased performance, throughput, and capability with lower infrastructure costs.
  3. Many cores, many benefits: Modern GPUs contain 3,000 to 6,000 cores, compared to 16 to 32 in a standard CPU. These cores let you process your data in massively parallel fashion, meaning that traditional queries and analytics that once took a long time can now be done in a fraction of a second on the GPU.
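The many-cores advantage boils down to divide and conquer. A small CPU-side sketch makes the pattern concrete: splitting a table scan across a handful of worker threads mirrors, in miniature, how a GPU spreads the same scan across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_chunk(chunk):
    # Count rows matching a simple predicate (value > 500).
    return sum(1 for value in chunk if value > 500)

def parallel_scan(rows, workers=4):
    # Split the column into chunks and scan them concurrently --
    # the same divide-and-conquer pattern a GPU applies with
    # thousands of cores instead of a handful of threads.
    size = max(1, len(rows) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_chunk, chunks))

table = list(range(1_000))   # toy single-column "table"
print(parallel_scan(table))  # -> 499
```

The toy version saves little at this scale, but the structure is the point: each chunk is scanned independently, and the partial counts are combined at the end.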

User-defined functions (UDFs) enable AI to run alongside BI and other advanced analytics on the same data platform. Simply put, UDFs bridge the gap between your data scientists and the business users who actually consume the analytics. UDFs let you query your custom functions through traditional API and BI tools, and they make the parallel processing power of the GPU accessible to custom analytics code. This opens the opportunity to run machine learning and artificial intelligence libraries such as TensorFlow, BIDMach, Caffe, and Torch in-database, alongside, and converged with, BI workloads.
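As a rough sketch of the pattern (the registry and `apply_udf` harness below are hypothetical stand-ins for a database engine, not any specific product's API): a UDF is ordinary code that the engine applies to table data in place, so analysts can invoke it from SQL or a BI tool without exporting the data.

```python
# Hypothetical in-database UDF pattern. UDF_REGISTRY and apply_udf
# stand in for the database engine; a real GPU database would run
# the registered function on the GPU, next to the stored table.
UDF_REGISTRY = {}

def register_udf(name):
    def decorator(func):
        UDF_REGISTRY[name] = func
        return func
    return decorator

@register_udf("zscore")
def zscore(column):
    # Normalize a numeric column: (x - mean) / stddev.
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = var ** 0.5
    return [(x - mean) / std for x in column]

def apply_udf(name, column):
    # The engine looks up the registered function and runs it
    # over the in-database column without moving the data out.
    return UDF_REGISTRY[name](column)

scores = apply_udf("zscore", [1.0, 2.0, 3.0, 4.0, 5.0])
print(round(scores[0], 3))  # -> -1.414
```

The data scientist writes and registers `zscore` once; the business user simply calls it by name from their usual tooling.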

Data can be stored in memory and rapidly accessed by the machine learning (ML) model as necessary. Most ML models are trained on subsets of the raw data, and most do not retain this raw data. Instead, they use the raw data to learn a state (e.g., the strengths of various network connections) before disposing of it. With some GPU-accelerated databases, you can store data in memory and let the ML model access it quickly as needed. One key advantage of having the data this closely integrated is that you can go back and refit your model as necessary.
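A minimal illustration of the "learn state, discard raw data" idea: this toy anomaly detector fits a mean and standard deviation from a sample, keeps only those two numbers as its state, and can be refit at any time against data still sitting in memory.

```python
class GaussianAnomalyModel:
    """Toy model: learns only summary state (mean, std) from the
    raw rows, then discards them -- mirroring how most ML models
    retain learned parameters rather than the training data."""

    def fit(self, rows):
        n = len(rows)
        self.mean = sum(rows) / n
        self.std = (sum((x - self.mean) ** 2 for x in rows) / n) ** 0.5
        return self  # raw rows are not retained as model state

    def is_anomaly(self, value, threshold=3.0):
        # Flag values more than `threshold` std devs from the mean.
        return abs(value - self.mean) > threshold * self.std

# Train on a subset of the in-memory data; refit later as needed.
model = GaussianAnomalyModel().fit([10, 11, 9, 10, 12, 10, 9, 11])
print(model.is_anomaly(10.5))  # -> False
print(model.is_anomaly(25.0))  # -> True
```

When the raw rows live in the same in-memory database, refitting is just another call to `fit` over a fresh query result.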

GPU-accelerated databases make it easy to abstract away the complexities involved in running code on a GPU. A GPU database can provide a number of different connectors that abstract the CUDA layer, which means that if you are an existing Python, C++, or Java developer, you can use a native API and gain access to the GPUs themselves.
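From the application side, the connector pattern looks roughly like this (the `GpuDbClient` class below is an illustrative stand-in, not a real driver): the developer writes ordinary Python, and the driver hands the query to the GPU engine. CUDA never appears in application code.

```python
# Illustrative connector sketch: GpuDbClient is a stand-in for a
# vendor's native Python driver, not a real package.
class GpuDbClient:
    def __init__(self, host, port=9191):
        self.address = (host, port)

    def query(self, sql):
        # A real driver would serialize the SQL, send it to the
        # server, and let the engine execute it across GPU cores.
        # This stub echoes a canned result for illustration only.
        return {"sql": sql, "rows": [("taxi_trips", 1_100_000_000)]}

db = GpuDbClient("gpudb.example.com")
result = db.query("SELECT COUNT(*) FROM taxi_trips")
print(result["rows"][0][1])  # -> 1100000000
```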

You can leverage the parallel nature of GPUs to run queries in milliseconds. Some GPU-accelerated databases can work on billion+ row data sets and perform table scans in less than one hundred milliseconds.

Some GPU-accelerated databases are certified to work with popular BI tools such as Tableau, Power BI, and MicroStrategy. You can simultaneously ingest, explore, analyze, and visualize your data within a Tableau server in a fraction of the usual time. All you have to do is configure Tableau to point at your GPU data source, and then you’re right in your familiar Tableau environment for analysis.

A geospatial and visualization pipeline is critical for performing advanced mapping and interactive location-based analytics. One of the core challenges with geospatial analytics is moving data from the database layer to the visualization layer. Serializing and moving millions to billions of objects from one technology to another takes time. A GPU-accelerated database can short-circuit this by keeping the data within its database, executing complex geospatial filters and advanced analytics, and rendering the geospatial data on the fly through its internal geospatial web server. With this type of architecture, you can filter and visualize large, complex geospatial vector data at high speed, without needing to move data from a database to a separate geospatial server layer.
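The server-side filtering step can be sketched as a simple bounding-box predicate applied where the data lives, so that only the much smaller filtered result (or a rendered tile) ever leaves the database tier. The function below is an illustrative stand-in for such a filter; the coordinates are made up.

```python
def bbox_filter(points, min_lon, min_lat, max_lon, max_lat):
    # Keep only (lon, lat) points inside the bounding box --
    # the kind of geospatial predicate a GPU database evaluates
    # in place, so millions of points never cross the wire.
    return [
        (lon, lat)
        for lon, lat in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]

# Toy data: three pickup points; the box covers lower Manhattan.
pickups = [(-74.01, 40.71), (-73.95, 40.78), (-74.00, 40.72)]
downtown = bbox_filter(pickups, -74.02, 40.70, -73.99, 40.73)
print(len(downtown))  # -> 2
```

At billion-row scale, evaluating this predicate next to the data and serializing only the matches (or a rendered image) is what makes the interactive map responsive.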

To learn more and take a deeper dive into machine learning use cases, you can register for the upcoming webinar with Forrester Research: Introducing The AI Database: A Prerequisite to Operationalizing Machine and Deep Learning.

You can also read the blog How does a GPU Database Play in Your Machine Learning Stack? to gain a better understanding of how Kinetica in particular fits in the ML stack.

– Karthik Lalithraj, Principal Solutions Architect, Kinetica