The CC-NUMA Project: Computational Chemistry on Non-Uniform Memory-access Architectures (Phase II)

Phase II will be funded by the ARC Linkage Grant LP0774896 over the years 2007-2009. Its focus will be on clusters of Opteron multicore SMP nodes; due to their HyperTransport-based architecture, these nodes have strong non-uniform memory access (NUMA) characteristics. Both the Solaris and Linux operating systems are of interest. An example of such a cluster is the 10,480 CPU cluster recently installed at the Tokyo Institute of Technology: the work of this project will have direct relevance to the TITech system and to future systems based on it. Sun Microsystems will provide ANU with a small-scale cluster for development purposes and with access to larger-scale clusters at APTSC.

The project supports one Postdoctoral Fellow (Dr Rui Yang) and one APAI_IT PhD scholar (Mr Danny Robson). Its activities will include visits to APTSC and Gaussian, visits by Dr See to ANU and an internship for the PhD scholar at APTSC.

Project Summary

Abstract: Cluster computers have emerged as the platform of choice for scientific computing. The next generation of these systems will have "fat" nodes that utilize the latest multi-core processor technology. This project will develop software tools and methods for these systems, with the aim of enabling more productive use of these architectures for scientific computation. Our focus is on electronic structure methods, particularly those methods where the cost of the computation scales linearly with the number of atoms in the system. Our goal is to develop scalable parallel implementations of these methods that can be used to perform computations on nanoscale systems, such as enzymes and molecular electronic devices.

Statement of Benefit: In recent years Australian academia has invested heavily in high performance computing systems. A significant fraction of these resources is devoted to computational chemistry studies, such as those used in drug design. This project links Australian researchers with the company responsible for a particularly widely used computational chemistry application package, and also with a major international computer company. Our aim is to substantially improve the performance of this code on cluster-based compute systems. This, together with our generic performance evaluation tools, would be of substantial benefit to the Australian research community. The project will also forge links with researchers in Singapore, Japan and the USA.

Keywords: Cluster Computers, Parallel Algorithms, Shared Memory Parallel Computing with OpenMP, Computer Performance Evaluation, Computer Simulation, Computational Chemistry, Linear Scaling Methods

Further Details of the Project

As in Phase I, the project's objectives fall under two themes: Electronic Structure Algorithms for NUMA Clusters, and Modeling Performance on Multi-core NUMA Platforms.

In brief, the algorithm development theme involves extending the Phase I work on NUMA Performance Analysis and Cache Modeling to gather data on thread and memory placement, using tools such as performance counter libraries and the Valgrind simulation tool. It also involves identifying the drivers for the different algorithmic transitions, and where these transitions should occur, when running the sequential code, the OpenMP parallel code within one node, and the Linda/OpenMP code (or an alternative hybrid programming model, such as MPI/OpenMP) over the entire cluster. This information will then be used in conjunction with the simulation techniques outlined below to develop performance models that can predict performance as a function of cluster characteristics. In this way, key electronic structure algorithms will be optimized for NUMA-based multicore clusters.
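
As a rough illustration of what "performance as a function of cluster characteristics" could mean, the sketch below combines a local/remote memory access split within a node with a simple communication term across nodes. It is a toy analytic model written for this page, not one of the project's models: the names NodeModel, memory_time_ns and predicted_time_ns, the even division of work over cores, and all parameter values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeModel:
    """Hypothetical per-node parameters (placeholders, not measurements)."""
    local_latency_ns: float   # cost of an access served by the local memory bank
    remote_latency_ns: float  # cost of an access served by another socket's bank
    cores: int                # cores available on one node


def memory_time_ns(accesses: int, remote_fraction: float, node: NodeModel) -> float:
    """Total memory time when a given fraction of accesses is remote."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie in [0, 1]")
    local = accesses * (1.0 - remote_fraction) * node.local_latency_ns
    remote = accesses * remote_fraction * node.remote_latency_ns
    return local + remote


def predicted_time_ns(compute_ns: float, accesses: int, remote_fraction: float,
                      node: NodeModel, nodes: int, comm_ns_per_node: float) -> float:
    """Assumed model: work splits evenly over all cores, and communication
    cost grows linearly with the number of additional nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    workers = node.cores * nodes
    work = compute_ns + memory_time_ns(accesses, remote_fraction, node)
    return work / workers + comm_ns_per_node * (nodes - 1)


if __name__ == "__main__":
    node = NodeModel(local_latency_ns=100.0, remote_latency_ns=200.0, cores=4)
    for frac in (0.0, 0.25, 0.5):
        t = predicted_time_ns(1_000_000.0, 1000, frac, node, nodes=2,
                              comm_ns_per_node=500.0)
        print(f"remote fraction {frac:.2f}: predicted {t:.1f} ns")
```

Even a model this crude shows why thread and memory placement data matter: the remote fraction is the only term that placement can change, so measuring it is the first step towards predicting the benefit of a better placement.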

In brief, the Performance Modelling theme has two parts. The first is to construct and validate simulator models for multicore/multiprocessor Opteron nodes. These will be based on the state-of-the-art Valgrind / Cachegrind simulator infrastructure (although AMD's SimNOW! infrastructure may also be considered). Drawing on the experience of simulator development in Phase I, this work will involve constructing detailed models of the Opteron HyperTransport-linked memory system, building an accurate "cycle counter" module for the x86-64 CPU, and modifying Valgrind's scheduling to support multiple threads/CPUs. The second part is to extend this infrastructure to the cluster level, which similarly involves constructing performance models of the communication network and of scheduling at the cluster level. The cluster-level extension offers the interesting prospect of parallelizing the simulation.
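
To give a feel for the kind of model a HyperTransport-linked memory system calls for, the sketch below treats the sockets of one node as a graph and charges a fixed penalty per link crossed on the way to the memory bank. This is a simplified illustration only and is unrelated to the Valgrind-based simulator: the four-socket square topology, the function names hop_counts and access_latency_ns, and the latency figures are all assumptions.

```python
from collections import deque

# Hypothetical four-socket node wired as a square: each socket has two links.
SQUARE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def hop_counts(links, source):
    """Breadth-first search giving the minimum number of links from
    `source` to every reachable socket."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        socket = queue.popleft()
        for neighbour in links.get(socket, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[socket] + 1
                queue.append(neighbour)
    return dist


def access_latency_ns(links, cpu_socket, mem_socket, local_ns, per_hop_ns):
    """Assumed cost model: local latency plus a fixed penalty per hop."""
    hops = hop_counts(links, cpu_socket)[mem_socket]
    return local_ns + hops * per_hop_ns


if __name__ == "__main__":
    # Placeholder latencies, chosen only to make the NUMA effect visible.
    for mem in sorted(SQUARE):
        latency = access_latency_ns(SQUARE, 0, mem, local_ns=80.0, per_hop_ns=40.0)
        print(f"CPU on socket 0, memory on socket {mem}: {latency:.0f} ns")
```

A real model would also have to account for link contention, cache coherence traffic and memory controller queueing, which a per-hop constant ignores.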

APAI_IT PhD Scholarship

Project Topic: Efficient and Accurate Simulation of Multi-core NUMA-based Cluster Systems (see description of the Performance Modelling theme above)

last modified: Peter Strazdins 09/2007