Altera Corp has said that Microsoft is using its Arria 10 FPGAs to host convolutional neural network (CNN) algorithms for such tasks as image classification, image recognition and natural language processing within data centers.
Altera said that Microsoft researchers, using samples of Arria 10 FPGAs, have achieved 40 GFLOPS per watt, about three times the performance per watt achieved when running CNNs on general-purpose GPUs. Microsoft has been using OpenCL or VHDL to code the Arria 10 FPGAs, which include on-chip floating-point DSP blocks.
Microsoft has been working with Altera for some time on the use of FPGAs to produce more energy-efficient hardware for the software-defined data center (see Microsoft, Bing Bet on Programmable Logic for Servers). That work, the Catapult project, used FPGAs in the data center to accelerate Bing ranking by a factor of nearly two. Microsoft Research has now developed a high-throughput CNN FPGA accelerator that achieves excellent performance while consuming a small fraction of server power.
Top-level architecture of convolutional neural network accelerator. Source: Microsoft Research.
According to a research paper from Microsoft Research, its CNN FPGA accelerator engine is characterized by three features: (1) a software-configurable engine that can support multiple layer configurations at run time (without requiring hardware recompilation), (2) an efficient data-buffering scheme and on-chip redistribution network that minimizes traffic to off-chip memory, and (3) a spatially distributed array of processing elements (PEs) that can be scaled easily up to thousands of units.
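The first of those features, one engine serving many layer shapes, can be illustrated in software. The Python sketch below is only an analogy and is not taken from the Microsoft paper: the names `LayerConfig` and `run_layer` are hypothetical, and the nested loops stand in for work that the hardware would spread across its PE array. The point is that kernel size and stride arrive as data at run time, so the engine itself never changes.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass(frozen=True)
class LayerConfig:
    """Hypothetical run-time layer descriptor (an assumption for illustration)."""
    kernel: Matrix   # square convolution weights
    stride: int = 1


def run_layer(image: Matrix, cfg: LayerConfig) -> Matrix:
    """Apply one convolution layer described entirely by `cfg` (no padding)."""
    k = len(cfg.kernel)
    out_h = (len(image) - k) // cfg.stride + 1
    out_w = (len(image[0]) - k) // cfg.stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            # Each output value is one multiply-accumulate job, the kind of
            # work a single processing element (PE) could be assigned.
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    pixel = image[oy * cfg.stride + ky][ox * cfg.stride + kx]
                    acc += pixel * cfg.kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# The same engine handles two different layer shapes; only the config changes.
small = run_layer(image, LayerConfig(kernel=[[1.0, 0.0], [0.0, 1.0]], stride=2))
large = run_layer(image, LayerConfig(kernel=[[1.0] * 3 for _ in range(3)], stride=1))
```

In an FPGA the equivalent of `run_layer` is fixed logic, and swapping `LayerConfig` values corresponds to loading new parameters rather than rebuilding the hardware design.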
Microsoft's machine-learning work could be applied to services such as Bing, Cortana, OneDrive, Skype Translator, and Microsoft Band.
"We are seeing a significant leap forward in CNN performance and power efficiency with Arria 10 engineering samples," said Doug Burger, director of client and cloud apps at Microsoft Research.
"The FPGA has an architectural advantage for neural algorithms with the ability to convolve and do pooling very efficiently with a flexible data path which enables many OpenCL kernels to pass data directly to each other without having to go to external memory," said Michael Strickland, director of the compute and storage business unit at Altera. "Arria 10 has an additional architectural advantage of supporting hard floating point for both multiplication and addition – this hard floating point enables more logic and a faster clock speed than traditional FPGA products."
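The idea of kernels handing data directly to one another can be sketched in software with Python generators. This is a loose analogy, not Altera's or Microsoft's implementation: `conv_rows` and `max_pool_rows` are hypothetical names, and the generator chain merely stands in for on-chip channels between OpenCL kernels. The pooling stage consumes each convolution row as it is produced, so the full intermediate feature map is never stored.

```python
from typing import Iterable, Iterator, List

Matrix = List[List[float]]


def conv_rows(image: Matrix, kernel: Matrix) -> Iterator[List[float]]:
    """Yield convolution output one row at a time (stride 1, no padding)."""
    k = len(kernel)
    for oy in range(len(image) - k + 1):
        yield [
            sum(image[oy + ky][ox + kx] * kernel[ky][kx]
                for ky in range(k) for kx in range(k))
            for ox in range(len(image[0]) - k + 1)
        ]


def max_pool_rows(rows: Iterable[List[float]], size: int = 2) -> Iterator[List[float]]:
    """Consume rows as they arrive and emit size x size max-pooled rows."""
    window: List[List[float]] = []
    for row in rows:
        window.append(row)
        if len(window) == size:
            # Only `size` rows are buffered at any moment.
            yield [
                max(r[x + dx] for r in window for dx in range(size))
                for x in range(0, len(row) - size + 1, size)
            ]
            window = []


image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]

# Convolution feeds pooling directly; no full intermediate map is kept.
pooled = list(max_pool_rows(conv_rows(image, kernel)))
```

Here the pooling stage buffers only two rows at a time, which mirrors, in miniature, the benefit Strickland describes of avoiding round trips to external memory.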