Artificial Intelligence (‘AI’) is a runaway success, and we think it is going to be one of the biggest drivers of future economic growth, if not the biggest. Major AI breakthroughs at a fundamental level are leading to a host of groundbreaking applications in autonomous driving, medical diagnostics, automatic translation, speech recognition and more.
See for instance, in the figure above, the acceleration in speech recognition in the last year or so.
We’re only at the beginning of these developments, which are unfolding in several overlapping stages:
- Improvements in big data and neural networks.
- The use of big data in the cloud.
- The use of GPUs for machine learning in the cloud.
- The development of specialist AI chips.
We have described the development of specialist AI chips in an earlier article, where we already touched on the new phase emerging – the move of AI from the cloud to the device (usually the mobile phone).
This certainly isn’t a universal movement; it involves inference (applying the trained algorithms to answer queries) rather than the more computing-heavy training (where the algorithms are improved over many rounds of iteration with the help of massive amounts of data).
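To make the difference in workload concrete, here is a minimal sketch in Python. It is a toy of our own making, not anything taken from a real AI framework: the model is a single straight line, and the names `predict` and `train`, the data and the settings are all hypothetical. Training loops over the data thousands of times, while inference is one cheap calculation per query.

```python
# Toy illustration only: a one-variable linear model y = w * x + b.
# All names and numbers here are assumptions chosen for the example.

def predict(w, b, x):
    """Inference: a single forward pass (one multiply and one add)."""
    return w * x + b


def train(xs, ys, epochs=2000, lr=0.05):
    """Training: many passes over the data, nudging w and b each round."""
    w, b = 0.0, 0.0
    n = len(xs)
    forward_passes = 0  # count how often the model is evaluated
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            forward_passes += 1
            # Gradient of the mean squared error with respect to w and b
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, forward_passes


# Data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Heavy step: thousands of model evaluations plus gradient updates
w, b, training_passes = train(xs, ys)

# Light step: answering one query needs exactly one evaluation
answer = predict(w, b, 10.0)

print(f"model evaluations during training: {training_passes}")
print(f"model evaluations for one query:   1 (answer is about {answer:.2f})")
```

Real networks have millions of parameters rather than two, but the same asymmetry holds, which is why the one-off inference step is the natural candidate for moving onto a phone.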
GPUs weren’t designed with AI in mind, so in principle it isn’t much of a stretch to assume that specialist AI chips will take performance higher, even if Nvidia is now designing new architectures such as Volta at least partly with AI in mind. From Medium:
Although Pascal has performed well in deep learning, Volta is far superior because it unifies CUDA Cores and Tensor Cores. Tensor Cores are a breakthrough technology designed to speed up AI workloads. The Volta Tensor Cores can generate 12 times more throughput than Pascal, allowing the Tesla V100 to deliver 120 teraflops (a measure of GPU power) of deep learning performance… The new Volta-powered DGX-1 leapfrogs its previous version with significant advances in TFLOPS (170 to 960), CUDA cores (28,672 to 40,960), Tensor Cores (0 to 5120), NVLink vs. PCIe speed-up (5X to 10X), and deep learning training speed (1X to 3X).
However, while the systems on a chip (SoCs) that drive mobile devices contain a GPU, these GPUs are tiny compared with their desktop and server equivalents. There is room here too for adding intelligence locally (or, as the jargon has it, ‘on the edge’).
Read the source article at Seeking Alpha.