Hitachi, Ltd. has developed a database management system optimized for the high-speed embedded memory of field-programmable gate arrays (FPGAs), together with technology for high-performance parallel data processing in FPGAs. With these two technologies, data analytics ran up to 100 times faster than without them. The two technologies were then combined with Pentaho Business Analytics, business analytics software developed by Pentaho Corporation (a Hitachi Group company), to visualize analytics results, and with flash storage to hold the data, to create a prototype real-time data analytics system. The prototype will contribute to realizing self-service data analytics, enabling employees in the field to run analytics on massive volumes of business data easily and quickly.
In recent years, self-service analytics, which allows employees in the field to easily carry out big data analytics that is usually performed by experts such as data scientists, has been gaining attention. A system for self-service data analytics needs to produce results quickly, so it must offer high performance in both reading data and analyzing it. Using flash storage instead of hard disk drives to store data has increased data read performance by a factor of 10 to 100. Data analysis performance, however, has been unable to keep up with data read performance, making analysis the bottleneck in analytics.
To overcome this issue, Hitachi developed a database management system optimized for the high-speed embedded memory of the hardware (FPGA) and technology for high-speed parallel data processing in FPGAs, and succeeded in increasing data analytics speed by up to 100 times. A prototype real-time data analytics system was then built by combining these two technologies with Pentaho Business Analytics for visualizing results and flash storage for storing data.
The FPGA is equipped with small but high-speed internal memory (a few MB) and is connected to large but low-speed external memory (a few GB). In the data format used in column-oriented (columnar) databases, the management information that records where data is located is larger than the internal memory, so it has to be stored in the external memory. This management information must be consulted whenever data is located, so keeping it in the large but low-speed external memory slows down processing. In this research, a database management system was developed in which the database is subdivided into multiple data segments, each small enough that its management information fits in the FPGA internal memory. The segments are stored in flash storage and processed within the FPGA segment by segment. This database management system enables high-speed processing.
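The segmentation idea can be illustrated with a minimal software sketch. It is not Hitachi's implementation: the names `Segment`, `rows_per_segment`, and `build_segments`, the 8-byte size of a location entry, and the use of per-value byte offsets as the "management information" are all assumptions made for illustration. The sketch picks the largest number of rows per segment whose offset tables still fit within a given internal memory budget.

```python
from dataclasses import dataclass
from typing import Dict, List

OFFSET_BYTES = 8  # assumed size of one location entry in the management information


@dataclass
class Segment:
    columns: Dict[str, List[str]]  # column name -> values held in this segment
    offsets: Dict[str, List[int]]  # column name -> byte offset of each value

    def metadata_bytes(self) -> int:
        """Size of this segment's management information (all offset tables)."""
        return sum(len(o) for o in self.offsets.values()) * OFFSET_BYTES


def rows_per_segment(num_columns: int, internal_memory_bytes: int) -> int:
    """Largest row count whose offset tables fit in the internal memory budget."""
    rows = internal_memory_bytes // (num_columns * OFFSET_BYTES)
    if rows < 1:
        raise ValueError("internal memory budget is too small for a single row")
    return rows


def build_segments(table: Dict[str, List[str]],
                   internal_memory_bytes: int) -> List[Segment]:
    """Split a non-empty columnar table into segments with bounded metadata."""
    names = list(table)
    total_rows = len(table[names[0]])
    step = rows_per_segment(len(names), internal_memory_bytes)
    segments = []
    for start in range(0, total_rows, step):
        cols, offs = {}, {}
        for name in names:
            values = table[name][start:start + step]
            positions, cursor = [], 0
            for value in values:
                positions.append(cursor)  # where this value starts in the column
                cursor += len(value.encode("utf-8"))
            cols[name] = values
            offs[name] = positions
        segments.append(Segment(cols, offs))
    return segments
```

With a two-column table of 10 rows and a 64-byte budget, for example, `build_segments` yields segments of 4, 4, and 2 rows, and each segment's `metadata_bytes()` stays at or below 64.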
Parallel data processing is widely used to achieve high-speed processing. In column-oriented (columnar) databases, however, it is difficult to apply because the processing of one column must finish before the next column can be processed. To overcome this, a column processing method was developed that enables a set number of columns to be processed in turn. Parallel data processing was realized by combining this method with a data filter circuit, which selects the data to be analyzed, and an aggregation circuit, which groups the data and calculates values such as totals or averages.
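As a rough software analogy for the filter and aggregation stages, the sketch below filters rows on one column, groups and sums another, and runs this over several data segments concurrently before merging the partial results. It does not model the FPGA circuits or the column processing method itself; parallelizing across segments, the thread pool, and all function names (`filter_stage`, `aggregate_stage`, `parallel_aggregate`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Segment = Dict[str, list]  # column name -> values for the rows in this segment


def filter_stage(segment: Segment, column: str,
                 predicate: Callable[[object], bool]) -> List[int]:
    """Return the indices of the rows whose value in `column` passes the filter."""
    return [i for i, value in enumerate(segment[column]) if predicate(value)]


def aggregate_stage(segment: Segment, rows: List[int], group_col: str,
                    value_col: str) -> Dict[object, Tuple[float, int]]:
    """Group the selected rows and keep a (sum, count) pair per group."""
    partial: Dict[object, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for i in rows:
        key = segment[group_col][i]
        total, count = partial[key]
        partial[key] = (total + segment[value_col][i], count + 1)
    return dict(partial)


def parallel_aggregate(segments: List[Segment], filter_col: str,
                       predicate: Callable[[object], bool], group_col: str,
                       value_col: str,
                       workers: int = 4) -> Dict[object, Tuple[float, float]]:
    """Filter and aggregate each segment concurrently, then merge the partials.

    Returns a mapping of group -> (total, average).
    """
    def process(segment: Segment) -> Dict[object, Tuple[float, int]]:
        rows = filter_stage(segment, filter_col, predicate)
        return aggregate_stage(segment, rows, group_col, value_col)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process, segments))

    merged: Dict[object, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for key, (total, count) in partial.items():
            running_total, running_count = merged[key]
            merged[key] = (running_total + total, running_count + count)
    return {key: (total, total / count) for key, (total, count) in merged.items()}
```

Keeping a (sum, count) pair per group, rather than an average, is what lets the partial results from different segments be merged correctly. The thread pool here only illustrates the structure of the computation; it is not meant to indicate the speed-up obtained in hardware.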
Hitachi plans to exhibit these technologies at the Flash Memory Summit 2016, to be held August 9 through 11 in Santa Clara, California.