The First Announcement of the 2012 Summer Course on Monte Carlo Methods and High-Performance Computing in Beijing

The 2012 Summer Course on Monte Carlo Methods and High-Performance Computing, organized by the Supercomputing Center of China Academy of Sciences (SCCAS), Stony Brook University of USA (SBU), and the Sino-American Joint Center for Computational Sciences (SAJCCS), will be held on May 14th-18th, 2012, in Beijing, China.

The history of high-performance computing (HPC) is one of continuous innovation in computer architecture, HPC software, and HPC algorithms. Yet one constant throughout that history is that memory bandwidth has always been costly and ultimately inadequate for communication-bound algorithms. This was as true for the Cray-1 as it is for the elite machines leading the latest Top500 list. Thus, a tremendous amount of work has gone into tuning the communication of algorithms to the architectures of the day, and into developing tools for doing this while exploiting the latest architectural features. These architectures have included (in roughly chronological order) vector processors, shared-memory MIMD machines, distributed-memory MIMD machines, massively parallel SIMD machines, and now the various hybrid architectures. The most recent group of hybrid machines includes those with multi-core processors, those with computational accelerators, and those with both.

The key to exploiting the parallelism on these architectures is to take the existing parallelism in the underlying algorithms and use it to map computational work onto the various architectures. The problem has never been a lack of parallelism; it has been how to handle the required data communication so that the algorithms map effectively. This has always required very sophisticated implementations from highly skilled programmers. However, regardless of the architectures or the algorithms, if the data distribution necessary for parallelism requires communication, one always runs into a lack of scalability, as follows. Given a fixed problem size, if one increases the number of processors, the running time will initially decrease until communication overhead swamps the architecture's bandwidth capabilities. In fact, the situation where a fixed problem size is given is quite common, as it corresponds to the desire to take an existing computation and efficiently use a larger and more powerful machine. Because of this, the question of scaling is often explored either with a very large instance of a problem, so large that the parallel communication will not overwhelm the machine's bandwidth, or by scaling the problem size as one moves onto more and more processors. Thus, the only solution to this issue in a practical sense is to increase the system's communication bandwidth. However, as mentioned above, this bandwidth is very costly, and it scales at a much slower rate than processing speed.

Thus, there is no general solution to doing parallel processing with algorithms that require communication to manage distributed data on a multiprocessor architecture. While this is indeed a gloomy situation, there is a very general class of algorithms that fundamentally avoids these issues: the naturally parallel algorithms. Some call these embarrassingly parallel algorithms, but their desirable properties are no reason for embarrassment. One family of methods that is generally naturally parallel is the Monte Carlo algorithms. Monte Carlo methods are numerical algorithms that use statistical sampling to compute various quantities of interest. They are fundamentally different from the deterministic numerical algorithms that are commonly used, but they can also be used to solve a wide variety of problems.

In general, Monte Carlo methods compute a quantity of interest by first expressing it as the expected value of some function (random variable) sampled via a stochastic process. Thus, the natural parallelism comes about through the statistically independent sampling. Each processor computes different samples of the quantity of interest, and at the end of a long computation need only report a sample mean and variance. Thus, this standard decomposition requires almost no communication, and only asynchronous communication at that. However, the key to being able to exploit many processors effectively with Monte Carlo is that the statistical samples computed must be statistically independent from one another. This is not a trivial requirement, and falls to the random number generators used on the different processors.
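The decomposition described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not course material: the integrand (whose integral over [0, 1] equals π), the function names `worker`, `combine`, and `estimate`, and the sample counts are all hypothetical choices. The "processors" are simulated sequentially, and giving each one a distinct seed is a naive stand-in for independent streams; a properly designed parallel random number generator would be needed in real work.

```python
import math
import random


def worker(seed, n_samples):
    """Simulate one processor: sample independently, return (count, mean, M2)."""
    # Assumption: a distinct seed per worker stands in for an independent
    # stream. This does NOT guarantee statistical independence in general.
    rng = random.Random(seed)
    count, mean, m2 = 0, 0.0, 0.0
    for _ in range(n_samples):
        x = rng.random()
        y = 4.0 / (1.0 + x * x)  # integrand; its integral over [0, 1] is pi
        # Running update of the mean and the sum of squared deviations (M2).
        count += 1
        delta = y - mean
        mean += delta / count
        m2 += delta * (y - mean)
    return count, mean, m2


def combine(a, b):
    """Merge two partial summaries; this is the only 'communication' step."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2


def estimate(n_workers=8, n_samples=50_000):
    """Run all workers, then reduce their summaries to one estimate."""
    total = worker(0, n_samples)
    for seed in range(1, n_workers):
        total = combine(total, worker(seed, n_samples))
    n, mean, m2 = total
    variance = m2 / (n - 1)              # sample variance of the integrand
    std_error = math.sqrt(variance / n)  # standard error of the mean
    return mean, variance, std_error


if __name__ == "__main__":
    mean, variance, std_error = estimate()
    print(f"estimate = {mean:.5f} +/- {std_error:.5f}")
```

Each worker returns only three numbers regardless of how many samples it draws, which is why the communication cost stays essentially constant as processors are added. The standard error shrinks in proportion to one over the square root of the total sample count, provided the samples really are independent.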

This Summer Course was developed to teach students to understand how modern computer architectures can take advantage of the wide class of Monte Carlo algorithms available for different numerical computations. To do this, we have divided the course into three pieces. The first is an overview of HPC hardware and currently popular software tools. The second is an overview of Monte Carlo methods with emphasis on algorithms for numerical integration, solving partial differential equations, and numerical linear algebra. The last part is a treatment of parallel random number generation.

Course Instructors:

*Prof. Michael Mascagni, a world-class expert in Monte Carlo methods, is from the Department of Computer Science at Florida State University.

*Prof. Yuefan Deng, a well-known expert in parallel computing and applications, is a professor of Applied Mathematics at Stony Brook University. He completed a Ph.D. in Theoretical Physics at Columbia University. He is interested in supercomputer design and applications in the life and physical sciences.

Deadline and how to apply

Please send your information (Name, Company Name, Title, Research Area, E-mail, Telephone/Mobile) before Friday, April 27th, 2012.


The course fee is RMB ¥600, including tuition, copies of teaching materials, practice costs on the DeepComp7000 100-teraflops cluster, and daily lunch.

For more information, please contact:

Dr. Zhongjin
Phone: +86.010.58812126

Ms. Qing Zhao
Phone: +86.010.58812128