The CC-NUMA Project: Computational Chemistry on Non-Uniform Memory-access Architectures
(Phase II)
Phase II will be funded by the ARC Linkage Grant LP0774896 over the
years 2007-2009. Its title is:
Programming Paradigms, Tools and Algorithms for Electronic Structure
Calculations on Clusters of Non-Uniform Memory Parallel Processors
Its focus will be on clusters of Opteron multicore SMP
nodes. Due to their HyperTransport-based architecture, these nodes
have strong non-uniform memory access (NUMA) characteristics. Both
Solaris and Linux operating systems are of interest. An example of such
a cluster is
the 10,480 CPU cluster recently installed at the Tokyo Institute of
Technology: the work of this project will have direct relevance to
the TITech system and future systems based on this. Sun Microsystems
will provide to ANU small-scale clusters for development purposes and
access to larger-scale clusters at APTSC.
The project supports one Postdoctoral Fellow (Dr Rui Yang) and one
APAI_IT PhD scholar (Mr Danny Robson). Its activities will include visits to
APTSC and Gaussian, visits by Dr See to ANU and an internship for the
PhD scholar at APTSC.
Abstract:
Cluster computers have emerged as the platform of choice for scientific
computing. The next generation of these systems will have "fat" nodes
that utilize the latest multi-core processor technology. This project
will develop software tools and methods for these systems, with the aim
of enabling a more productive utilization of these architectures for
scientific computation. Our focus is on electronic structure methods,
particularly those methods where the cost of the computation scales
linearly with the number of atoms in the system. Our goal is to develop
scalable parallel implementations of these methods that can be used to
perform computations on nanoscale systems, such as enzymes and molecular
electronic devices.
Statement of Benefit:
In recent years Australian academia has invested heavily in high
performance computing systems. A significant fraction of these resources
is devoted to performing computational chemistry studies, such as those
used in drug design. This project links Australian researchers with the
company responsible for a particularly widely used computational
chemistry application package, and also with a major international
computer company. Our aim is to substantially improve the performance of
this code on cluster-based compute systems. This, as well as our generic
performance evaluation tools, would be of substantial benefit to the
Australian research community. The project will forge links with
researchers in Singapore, Japan and the USA.
Keywords:
Cluster Computers, Parallel Algorithms,
Shared memory parallel computing with OpenMP,
Computer Performance Evaluation, Computer Simulation,
Computational Chemistry, Linear Scaling Methods
The project's objectives include:
- Significant advances in the development of electronic structure
algorithms, particularly those that scale linearly, for cluster computer
systems where each node on the cluster is a multiprocessor NUMA system.
- The ability to predict the effect of architectural and operating system
changes on the performance of a complex application, and thereby give
guidance to the hardware and software groups within Sun as to how they
may want to change their products.
- The creation of new or extended simulation tools that can be used by
other research groups (including our industrial partner Sun
Microsystems) seeking to model the performance of complex applications
on multiprocessor Opteron clusters.
As for Phase I, the project consists of two themes: Electronic
Structure Algorithms for NUMA Clusters, and Modeling Performance on
Multi-core NUMA Platforms.
In brief, the algorithm development theme involves extending work from
Phase I in NUMA Performance Analysis and Cache
Modeling to gather data on thread and memory placement, using tools such
as performance counter libraries and the Valgrind simulation tool. It also
involves identifying the various drivers for different algorithmic
transitions, and where they should occur when running the sequential
code, the OpenMP parallel code
within one node, and the Linda/OpenMP (or alternative
hybrid programming model, such as MPI/OpenMP) code over the entire cluster.
This information will then be used in conjunction with the simulation
techniques outlined below to develop performance models that can be used
to predict performance as a function of cluster characteristics. In this
way, key electronic structure algorithms will be optimized for
NUMA-based multicore clusters.
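As a rough illustration of what predicting performance "as a function of
cluster characteristics" could look like, the sketch below combines compute,
memory and network terms into a single estimate. It is not the project's
model: the Cluster fields, the function names and the additive cost formula
are all assumptions made for this example, and the numbers in the usage lines
are invented.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical description of a cluster of NUMA nodes."""
    nodes: int
    cores_per_node: int
    local_latency_ns: float    # latency of an access to memory on the local socket
    remote_latency_ns: float   # latency of an access to memory on another socket
    remote_fraction: float     # share of accesses that go to remote memory (0..1)
    network_latency_us: float  # cost of one inter-node message


def average_memory_latency_ns(c: Cluster) -> float:
    """Weighted mean of local and remote access latency."""
    return ((1.0 - c.remote_fraction) * c.local_latency_ns
            + c.remote_fraction * c.remote_latency_ns)


def predicted_time_s(c: Cluster, serial_compute_s: float,
                     memory_accesses: float, messages_per_node: float) -> float:
    """Toy additive model: perfectly divided compute and memory work,
    plus a per-node communication cost that does not overlap with it."""
    workers = c.nodes * c.cores_per_node
    compute = serial_compute_s / workers
    memory = (memory_accesses / workers) * average_memory_latency_ns(c) * 1e-9
    network = messages_per_node * c.network_latency_us * 1e-6
    return compute + memory + network


# Usage with invented numbers: better thread and memory placement lowers
# remote_fraction, which lowers the predicted run time.
poor = Cluster(4, 8, 80.0, 140.0, 0.25, 2.0)
good = Cluster(4, 8, 80.0, 140.0, 0.0, 2.0)
t_poor = predicted_time_s(poor, 320.0, 3.2e9, 1e6)
t_good = predicted_time_s(good, 320.0, 3.2e9, 1e6)
```

A model of this shape makes the role of placement explicit: the only term that
thread and memory placement can change is the remote fraction, so measured
placement data feeds directly into the prediction.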
In brief, the Performance Modelling theme involves firstly constructing
and validating simulator models for multicore/multiprocessor Opteron
nodes. These will be based on the state-of-the-art Valgrind / Cachegrind simulator
infrastructure (although AMD's SimNOW! infrastructure may also be
considered). Drawing on the experiences from the simulator development from Phase
I, this work will involve firstly constructing detailed models of
the Opteron HyperTransport-linked memory system, an accurate `cycle
counter' module for the x86-64 CPU, and modifying Valgrind's scheduling
to support multiple threads/CPUs. Secondly, it will involve extending
this infrastructure to the cluster level, which similarly involves
constructing performance models of the communication network and
scheduling at the cluster level. The latter offers the interesting
prospect of parallelization.
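To make the idea of a HyperTransport-linked memory model more concrete, here
is a minimal sketch that charges a fixed cost per link traversed between the
socket issuing an access and the socket owning the memory. The four-socket
square topology, the latency values and the function names are assumptions
chosen for illustration; they are not taken from the project's simulator or
from Valgrind.

```python
from collections import deque

# Hypothetical 4-socket node: sockets linked in a square (0-1, 1-3, 3-2, 2-0).
LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def hops(src, dst, links=LINKS):
    """Minimum number of links between two sockets (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("socket %r is not reachable from %r" % (dst, src))


def access_latency_ns(cpu_socket, memory_socket, dram_ns=70.0, hop_ns=40.0):
    """Assumed cost model: DRAM access time plus a fixed cost per link hop."""
    return dram_ns + hop_ns * hops(cpu_socket, memory_socket)


# Usage: a thread on socket 0 reading memory owned by socket 3 crosses two
# links, while a purely local access crosses none.
remote = access_latency_ns(0, 3)
local = access_latency_ns(1, 1)
```

A detailed simulator would also need to account for contention on links and
memory controllers; this sketch only captures the distance-dependent part of
the NUMA effect.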
Project Topic: Efficient and Accurate Simulation of Multi-core
NUMA-based Cluster Systems (see description of the Performance Modelling
theme above)
See the links above, and also:
- The Beowulf Cluster Computer Project
- J. Antony, M.J. Frisch and A.P. Rendell,
Modeling the Performance of the Gaussian Computational Chemistry
Code on x86 Architectures,
International Conference on High Performance Scientific Computing, Hanoi,
Vietnam, March 2006.
- SimNow!: Fast Platform Simulation Purely In Software
- L. Dagum, R. Menon, OpenMP: An industry standard API for
shared-memory programming, IEEE Comput. Sci. and Eng., 5, 46
(1998).
- Cachegrind: See Nicholas Nethercote, Dynamic Binary Analysis and
Instrumentation, PhD thesis, University of Cambridge, Nov, 2004.
- Andrew Over, Peter Strazdins and Bill Clarke, Cycle Accurate Memory Modelling:
A Case-Study in Validation, Proceedings of the IEEE
International Symposium on Modeling, Analysis, and Simulation
(MASCOTS'05), pages 85-94, Atlanta, September 2005.
- Peter Strazdins, Bill Clarke and Andrew Over, Efficient
Cycle-Accurate Simulation of the UltraSPARC III CPU, CRPITS '07:
Proceedings of the Thirtieth Australasian Conference on Computer
Science, Ballarat, Australia, January 2007.
- Andrew Over, Bill Clarke and Peter Strazdins,
A Comparison of Two Approaches to Parallel
Simulation of Multiprocessors,
2007 IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS'07), San Jose, April 2007.
- D.A. Grove and P.D. Coddington, Analytical Models of Probability
Distributions for MPI Point-to-Point Communication Times on Distributed
Memory Parallel Computers, LNCS, 3719, 406-415 (2005).
- The Accurate Performance Modelling and Prediction of Cluster Computers
project, part of the Jabberwocky Project
last modified: Peter Strazdins 09/2007