GPU PERFORMANCE EVALUATION WITH MOLECULAR DYNAMICS (NAMD SOFTWARE)

By AJAY PASAGADA

GPU AND ITS EVOLUTION:

In the 1980s, hardware companies began producing dedicated accelerator hardware to handle the manipulation and display of 2D graphics, in order to offload computationally complex graphical calculations from the CPU. These "graphics cards" differed from multipurpose CPUs by focusing solely on the generation of computer graphics, sacrificing computational flexibility for higher graphical performance. As computing continued to advance, programmers began creating computer programs that were able to manipulate 3D models for display on a standard monitor.

Over time, the competition between different vendors led to large improvements in graphics cards that resulted in increased performance as well as decreased cost. Graphics cards began implementing more graphical functions in hardware and, with the introduction of the GeForce 256 in 1999, NVIDIA introduced the GPU to computing, defining a GPU as "a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second."

Computer Graphics Technology and Parallel Computations:

One of the aspects of computer graphics that greatly influenced the different design approaches taken by the developers of GPUs is the inherently parallel nature of displaying computer graphics. When creating a 3D image for display, a GPU performs a variety of computations on a set of graphics primitives that are highly data independent.

GPUs are optimized primarily for performing these types of computations.

Since a GPU will focus primarily on performing a relatively small set of operations on a specific set of data points, GPU makers focus on creating hardware that specializes in these tasks instead of a wide array of different tasks, as is the case with CPU designs.

This places a limitation on the flexibility of GPUs, but allows them to perform their specialized tasks much more efficiently. Therefore, GPUs focus on the utilization of a large number of parallel processors dedicated to performing graphical computations, whereas CPUs focus more on performing a much smaller number of generic calculations very quickly. Though CPUs have become more parallel in recent years, they still do not compare to the parallelism of current generation GPUs.
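The data independence described above can be illustrated with a small sketch. It assumes a hypothetical list of 3D vertices and a hypothetical `scale_and_translate` function, neither of which comes from the original text. Each vertex is transformed without reading the result of any other vertex, so the work can be split across many processors in any order. The thread pool only stands in for parallel hardware; it is not how a GPU is actually programmed.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_translate(vertex, scale=2.0, offset=(1.0, 0.0, 0.0)):
    """Transform one vertex; depends only on its own coordinates."""
    x, y, z = vertex
    ox, oy, oz = offset
    return (x * scale + ox, y * scale + oy, z * scale + oz)


# Hypothetical graphics primitives: four vertices of a small shape.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# One vertex after another, as a single processor would do it.
serial = [scale_and_translate(v) for v in vertices]

# The same work handed to several workers at once. Because no vertex
# depends on another, the results match the serial version exactly.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(scale_and_translate, vertices))
```

Because the two result lists are identical, the order of execution does not matter, which is the property GPU hardware is built to exploit.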

HOW DOES IT WORK (SIMPLY):

Like a motherboard, a graphics card is a printed circuit board that houses a processor and RAM. It also has a basic input/output system (BIOS) chip, which stores the card's settings and performs diagnostics on the memory, input and output at startup. A graphics card's processor, called a graphics processing unit (GPU), is similar to a computer's CPU. A GPU, however, is designed specifically for performing the complex mathematical and geometric calculations that are necessary for graphics rendering. Some of the fastest GPUs have more transistors than the average CPU. A GPU produces a lot of heat, so it is usually located under a heat sink or a fan.

In addition to its processing power, a GPU uses special programming to help it analyze and use data. ATI and NVIDIA produce the vast majority of GPUs on the market, and both companies have developed their own enhancements for GPU performance. Each company has also developed specific techniques to help the GPU apply colors, shading, textures and patterns.

As the GPU creates images, it needs somewhere to hold information and completed pictures. It uses the card's RAM for this purpose, storing data about each pixel, its color and its location on the screen. Part of the RAM can also act as a frame buffer, meaning that it holds completed images until it is time to display them. Typically, video RAM operates at very high speeds and is dual ported, meaning that the system can read from it and write to it at the same time.

GPU IN THE FIELD OF MOLECULAR DYNAMICS:

Molecular dynamics (MD) simulations are used to track the evolution of a system of particles based on the interactions between them. They are used in physics, biology, materials science, applied mathematics and chemistry, where systems of up to several million atoms are simulated for weeks or months prior to completion. Because of their exceptionally long compute times, MD simulations are a popular target for acceleration, both with traditional high performance computing techniques and with novel architectures.
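As a minimal sketch of what tracking the evolution of a system of particles means in practice, the following code advances a tiny 1D chain of beads joined by harmonic springs. The spring model, the parameter values and the velocity Verlet integrator are assumptions chosen for illustration; the text does not state which force field or integrator any particular package uses.

```python
def spring_forces(positions, k=1.0, rest_length=1.0):
    """Forces on a 1D chain of beads joined by harmonic springs (toy model)."""
    forces = [0.0] * len(positions)
    for i in range(len(positions) - 1):
        stretch = (positions[i + 1] - positions[i]) - rest_length
        forces[i] += k * stretch        # a stretched spring pulls bead i forward
        forces[i + 1] -= k * stretch    # and pulls bead i+1 back
    return forces


def velocity_verlet_step(positions, velocities, forces, dt, mass=1.0):
    """Advance the system by one timestep of length dt."""
    half_v = [v + 0.5 * dt * f / mass for v, f in zip(velocities, forces)]
    new_pos = [x + dt * hv for x, hv in zip(positions, half_v)]
    new_forces = spring_forces(new_pos)
    new_vel = [hv + 0.5 * dt * f / mass for hv, f in zip(half_v, new_forces)]
    return new_pos, new_vel, new_forces


def total_energy(positions, velocities, k=1.0, rest_length=1.0, mass=1.0):
    """Kinetic plus spring potential energy, used as a sanity check."""
    kinetic = sum(0.5 * mass * v * v for v in velocities)
    potential = sum(
        0.5 * k * ((positions[i + 1] - positions[i]) - rest_length) ** 2
        for i in range(len(positions) - 1)
    )
    return kinetic + potential


# Three beads, slightly displaced from equilibrium, initially at rest.
positions = [0.0, 1.2, 2.0]
velocities = [0.0, 0.0, 0.0]
forces = spring_forces(positions)
e_start = total_energy(positions, velocities)

for _ in range(1000):
    positions, velocities, forces = velocity_verlet_step(
        positions, velocities, forces, dt=0.01
    )

e_end = total_energy(positions, velocities)
```

A real simulation repeats this force-then-update loop millions of times over far more particles, which is where the long compute times come from. The force evaluation is usually the most expensive part of each step.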

Programmable graphics processors have shown considerable promise for use in compute-intensive simulations. Their many-core SIMD design is well suited to numerical and probabilistic simulations such as Monte Carlo and molecular dynamics. With their computational power far outpacing that of the typical CPU, and with the recent addition of double precision floating point hardware to GPUs, it is expected that they will become a standard tool for high performance computing and application acceleration.

Despite the computational power offered by modern graphics processors, they have traditionally been limited to the graphics domain, in large part due to their lack of programmability. Until recently, GPUs could only be programmed through a graphics API such as DirectX or OpenGL.

GPU-Optimized Molecular Dynamics Simulation Performances:

The use of CUDA and NVIDIA's GPUs in performing molecular dynamics simulations has led to a large degree of improvement over a strictly CPU-based approach. Though the entire process has many portions of code that must be run one after another, this does not prevent the performance of the GPU from being utilized in order to produce very efficient code. Each of the different sections of the molecular dynamics algorithm has different characteristics that lend themselves to varying degrees of parallelization.

OVERALL IMPROVEMENTS:

When all of the improvements described above are applied to the SOP model, the total performance of the application greatly improves when simulating molecules above a certain size. The figure above shows the execution time for a one million timestep simulation of five different biomolecules using CPU and GPU code with both double and single precision.

The run times of the simulations decrease dramatically when run on the GPU for all of the biomolecules except the tRNAphe unit, which consists of only 76 total beads. The low number of beads prevents the parallel nature of the GPU from being fully utilized, and any performance benefits that the GPU may give are cancelled out by the overhead introduced in launching and managing the CUDA kernels from the CPU.

However, with biomolecules over a certain size, the parallel nature of the GPU can be taken advantage of more effectively. The overhead of launching and managing kernels becomes negligible compared to the execution time of the kernel, which will be much faster than an equivalent CPU-based calculation. Starting with the 16s subunit, consisting of 1,530 beads, the GPU code runs much faster than the CPU code.
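The trade-off between launch overhead and parallel work can be sketched with a toy cost model. Every constant below (the overhead of 200 units, the width of 512 parallel lanes, the per-bead costs) is a made-up value in arbitrary units, not a measurement from the study, and the assumption that work grows linearly with bead count is a simplification. Only the bead counts 76, 1,530 and 10,219 come from the text.

```python
import math


def cpu_step_time(beads, cost_per_bead=1.0):
    """Toy model: the CPU handles beads one after another."""
    return beads * cost_per_bead


def gpu_step_time(beads, launch_overhead=200.0, lanes=512, cost_per_bead=4.0):
    """Toy model: a fixed launch cost plus work done `lanes` beads at a time.

    Each GPU lane is assumed slower than the CPU core (cost 4.0 vs 1.0),
    but many lanes run at once.
    """
    batches = math.ceil(beads / lanes)
    return launch_overhead + batches * cost_per_bead


# Bead counts quoted in the text for three of the biomolecules.
bead_counts = {"tRNAphe": 76, "16s": 1530, "70s": 10219}

# True where the modelled GPU step is faster than the modelled CPU step.
gpu_wins = {
    name: gpu_step_time(n) < cpu_step_time(n)
    for name, n in bead_counts.items()
}
```

With these illustrative constants the fixed overhead dominates for the 76-bead system and becomes negligible for the larger ones, which matches the qualitative behaviour described above. The actual crossover point depends on the hardware and the code.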

This behavior continues through the largest biomolecule studied for this thesis, the 70s ribosome, consisting of 10,219 beads.

To quantify the performance difference between the CPU and GPU code, we observe the speedup of the GPU code, defined as Time_CPU / Time_GPU, where Time_CPU and Time_GPU are the execution times of the CPU and GPU code, respectively. The execution times of both the GPU and CPU code were observed when using single precision floating point numbers and when using double precision floating point numbers. The CPU had the best performance when using double precision, so all GPU speedups are in comparison to the double precision CPU code.
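The speedup definition and the choice of baseline can be written out as a short sketch. The timing values below are hypothetical placeholders, not results from the study, and the names `speedup`, `cpu_times` and `gpu_times` are introduced here only for illustration.

```python
def speedup(time_cpu, time_gpu):
    """Speedup of the GPU code: CPU execution time divided by GPU execution time."""
    if time_cpu <= 0 or time_gpu <= 0:
        raise ValueError("execution times must be positive")
    return time_cpu / time_gpu


# Hypothetical wall-clock times in seconds (NOT measurements from the study).
cpu_times = {"single": 1200.0, "double": 1000.0}
gpu_times = {"single": 40.0, "double": 80.0}

# Use the fastest CPU run as the baseline, so the GPU is compared
# against the best the CPU can do.
baseline_precision = min(cpu_times, key=cpu_times.get)
baseline = cpu_times[baseline_precision]

speedups = {
    precision: speedup(baseline, t) for precision, t in gpu_times.items()
}
```

Comparing against the fastest CPU configuration keeps the reported speedups conservative. Using a slower CPU run as the baseline would inflate them.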

Nanoscale Molecular Dynamics program:

NAMD (NAnoscale Molecular Dynamics program) is a freeware molecular dynamics simulation package written using the Charm++ parallel programming model, noted for its parallel efficiency and often used to simulate large systems (millions of atoms). It has been developed by the joint collaboration of the Theoretical and Computational Biophysics Group (TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at Urbana-Champaign.

NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 200,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. Users can build NAMD themselves or download binaries for a wide variety of platforms. The NAMD tutorials show how to use NAMD and VMD for biomolecular modeling.

References:

1. http://en.wikipedia.org/wiki/Molecular_modeling_on_GPUs

2. http://www.ks.uiuc.edu/Research/namd/