

Datasize-Aware Auto-Tuning Optimizes High-Dimensional Configurations for In-Memory Cluster Computing

Apr 17, 2018

In-memory cluster computing (IMC) has become a popular paradigm for big data analytics because it runs 10 to 100 times faster than on-disk cluster computing (ODC). However, the performance of an IMC program, and therefore its best configuration, is sensitive to the size of the input dataset.

Moreover, IMC typically has more than 40 performance-critical configuration parameters. The combination of datasize sensitivity and high-dimensional configurations makes optimizing the performance of IMC programs extremely difficult.

A research group led by Prof. YU Zhibin and Dr. BEI Zhendong from the Shenzhen Institutes of Advanced Technology (SIAT) of the Chinese Academy of Sciences has developed an approach that tunes these datasize-sensitive, high-dimensional configurations for IMC programs.

In their study, the researchers built a hierarchical, machine-learning-based performance model that predicts an IMC program's performance as a function of its configuration parameters and the size of its input dataset. Experimental results showed that this model was much more accurate than traditional machine learning and statistical reasoning models such as random forest and response surface.

Based on this performance model, the researchers used a genetic algorithm to search for the configuration that gives an IMC program the best performance. Compared with the default configurations, the proposed approach sped up IMC programs by 30.4x on average and by up to 89x.
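
The model-guided search can be pictured with a minimal Python sketch. It is not the researchers' implementation: the four Spark-style parameter names and ranges, the hypothetical default configuration, and the `predicted_time` function are illustrative stand-ins for the 40-plus real parameters and the trained hierarchical model. The genetic algorithm scores candidates with the model instead of running the program, keeps the fastest quarter of each generation (elitism), and refills the population by uniform crossover and random mutation.

```python
import random

# Hypothetical tuning space: four illustrative knobs with integer ranges.
# The real IMC setting has more than 40 performance-critical parameters.
PARAM_RANGES = {
    "executor_cores": (1, 8),
    "executor_memory_gb": (1, 32),
    "shuffle_partitions": (8, 512),
    "memory_fraction_pct": (10, 90),
}


def predicted_time(config, datasize_gb):
    """Toy stand-in for a trained performance model (NOT the paper's model):
    predicted run time in seconds for a configuration and an input size."""
    compute = datasize_gb * 10 / config["executor_cores"]
    cache_gb = config["executor_memory_gb"] * config["memory_fraction_pct"] / 100
    spill = max(0.0, datasize_gb - cache_gb) * 5  # data that does not fit in memory
    ideal_partitions = datasize_gb * 8
    partition_penalty = abs(config["shuffle_partitions"] - ideal_partitions) / 50
    return compute + spill + partition_penalty


def random_config(rng):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from a randomly chosen parent.
    return {name: a[name] if rng.random() < 0.5 else b[name] for name in PARAM_RANGES}


def mutate(config, rng, rate=0.2):
    # Each parameter is resampled within its range with probability `rate`.
    child = dict(config)
    for name, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def genetic_search(datasize_gb, pop_size=40, generations=60, seed=0):
    """Return the configuration with the lowest predicted time for this data size."""
    rng = random.Random(seed)

    def fitness(config):
        return predicted_time(config, datasize_gb)

    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 4]  # keep the fastest quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical "default" configuration used as the baseline.
    default = {"executor_cores": 1, "executor_memory_gb": 1,
               "shuffle_partitions": 200, "memory_fraction_pct": 60}
    for size in (1, 20):
        best = genetic_search(size)
        speedup = predicted_time(default, size) / predicted_time(best, size)
        print(f"{size} GB: {best} (predicted speedup {speedup:.1f}x)")
```

Calling `genetic_search` with a different `datasize_gb` may return a different configuration, which mirrors the datasize sensitivity described above. Because each fitness evaluation is a model prediction rather than a cluster run, the search can afford thousands of evaluations.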

The paper, entitled "Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing", was published at the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018).

This work was supported by the National Key Research and Development Program of China, the National Natural Science Foundation of China, and the Outstanding Technical Talent Program of CAS. Additional support was provided by the Major Scientific and Technological Project of Guangdong Province, the Shenzhen Technology Research Project, and the Key Technique Research on Haiyun Data System of NICT.

Copyright © 2002 - Chinese Academy of Sciences