Cutting the Cost of Big Data

At the International Symposium on Computer Architecture, researchers at the Massachusetts Institute of Technology (MIT) presented a new system for common big-data applications, called BlueDBM, that should make servers using flash memory as efficient as those using conventional RAM, while preserving flash's power and cost savings.

The researchers also presented experimental evidence showing that if the servers executing a distributed computation have to go to disk for data even 5% of the time, their performance falls to a level comparable with that of flash-based servers.

“This is not a replacement for DRAM [dynamic RAM] or anything like that,” says Arvind Mithal, the Johnson Professor of Computer Science and Engineering at MIT. “But there may be many applications that can take advantage of this new style of architecture. Which companies recognize: Everybody’s experimenting with different aspects of flash. We’re just trying to establish another point in the design space.”

The researchers were able to make a network of flash-based servers competitive with a network of RAM-based servers by moving some computational power off of the servers and onto the chips that control the flash drives. By preprocessing some of the data on the flash drives before passing it back to the servers, the chips can make distributed computation more efficient.
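The idea of preprocessing data next to the storage can be illustrated with a toy model. The sketch below is not the researchers' implementation: the `FlashDrive` class, its methods, and the `is_match` predicate are hypothetical names invented for illustration, and the "link" is just a counter. It only shows why filtering on the controller side reduces how much data has to travel back to the server.

```python
from dataclasses import dataclass


@dataclass
class FlashDrive:
    """Toy stand-in for a flash drive plus its controller chip (hypothetical)."""
    records: list
    transferred: int = 0  # number of records sent over the link to the server

    def read_all(self):
        # Conventional path: ship every record to the server.
        self.transferred += len(self.records)
        return list(self.records)

    def read_filtered(self, predicate):
        # Near-storage path: the controller applies the predicate first
        # and ships only the matching records.
        matches = [r for r in self.records if predicate(r)]
        self.transferred += len(matches)
        return matches


def is_match(record):
    # Hypothetical query: keep records divisible by 100.
    return record % 100 == 0


# Server-side filtering: all 10,000 records cross the link.
host_side = FlashDrive(list(range(10_000)))
result_a = [r for r in host_side.read_all() if is_match(r)]

# Controller-side filtering: only the matches cross the link.
near_storage = FlashDrive(list(range(10_000)))
result_b = near_storage.read_filtered(is_match)

print(host_side.transferred, near_storage.transferred)
```

Both paths return the same answer; the difference is only in how many records are moved to get it.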

With hardware contributed by companies such as Quanta, Samsung, and Xilinx, the researchers built a prototype network of 20 servers. Each server was connected to a field-programmable gate array (FPGA), and each FPGA was in turn connected to two 500-gigabyte flash chips and to the two FPGAs nearest to it in the server rack.

Because the FPGAs were connected to one another, they formed a network that let any server retrieve data from any flash drive. The FPGAs also controlled the flash drives and executed the algorithms that preprocessed the data stored on them. The researchers tested three such algorithms, geared to popular big-data applications: image search, Google’s PageRank algorithm, and Memcached, which database-driven websites use to store frequently accessed information.
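The prototype's layout can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that "the two FPGAs closest in the server rack" means each FPGA links to the one before and the one after it, with the ends wrapping around into a ring; the article does not specify the exact wiring. Under that assumption it computes the total flash capacity and confirms that every FPGA can reach every other one.

```python
from collections import deque

NUM_SERVERS = 20      # one FPGA per server, per the article
CHIPS_PER_FPGA = 2    # flash chips attached to each FPGA
CHIP_GB = 500         # capacity of each flash chip in gigabytes

# Total flash capacity across the prototype, in gigabytes.
total_gb = NUM_SERVERS * CHIPS_PER_FPGA * CHIP_GB

# Assumption: FPGA i is linked to FPGAs i-1 and i+1, wrapping around (a ring).
neighbors = {
    i: [(i - 1) % NUM_SERVERS, (i + 1) % NUM_SERVERS]
    for i in range(NUM_SERVERS)
}


def hops_from(start):
    """Breadth-first search: fewest FPGA-to-FPGA hops from start to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


dist = hops_from(0)
all_reachable = len(dist) == NUM_SERVERS
max_hops = max(dist.values())

print(total_gb, all_reachable, max_hops)
```

With these figures the prototype holds 20,000 gigabytes (20 terabytes) of flash, and in the assumed ring no flash drive is more than 10 FPGA hops from any server.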

“Many big-data applications require real-time or fast responses,” says Jihong Kim, a professor of computer science and engineering at Seoul National University. “For such applications, BlueDBM is an appealing solution.” Relative to some other proposals for streamlining big-data analysis, “the main advantage of BlueDBM might be that it can easily scale up to a lot bigger storage system with specialized accelerated supports,” Kim says.
