from / Ioana Banicescu<>
to / "Sushil K. Prasad" <>
cc / Ioana Banicescu <>
date / Thu, Apr 15, 2010 at 7:34 PM
subject / Edits to the "Core Topics..." document [Re: Invitation to TCPP excom/adcom meeting at IPDPS-10]

Hi Sushil,
In my opinion, the following additions (edits), marked *in red*, are
necessary; see the attachment to this message. Feel free to
consider them or not. Here is an explanation:
Core Topics (Body of Knowledge):
------
- added "Optimality" since the students should be able to
theoretically evaluate their parallel algorithm (for instance
whether the algorithm is optimal in the "strong sense" or in
the "weak sense", when do we say that a parallel algorithm is optimal,
etc.
- added the three general algorithmic paradigms in addition to Divide and
Conquer: Greedy, Dynamic Programming, and Branch and Bound and Backtracking
(admittedly, the last ones are challenging to parallelize).
- "spelled out" the name of the programming paradigm for distributed
memory (with non-shared address space) as "message passing".
Minimum skill set for a CS/CE graduate:
------
- replaced "programming models" with "programming paradigms and
programming models"
I look forward to meeting with you at IPDPS.
Regards,
-- Ioana

TCPP/NSF Curriculum planning workshop on Curriculum Standards for Parallel and Distributed Computing
Washington DC, Feb 5-6, Hilton Arlington
Core Topics
(Body of Knowledge)
for Parallel and Distributed Computing for CS/CE Graduates
Chtchelkanova, Almadena (NSF), Das, Sajal (University of Texas at Arlington, NSF), Das, Chita (Penn State, NSF), Dehne, Frank (Carleton University, Canada), Gouda, Mohamed (University of Texas, Austin, NSF), Gupta, Anshul (IBM T.J. Watson Research Center), Jaja, Joseph (University of Maryland), Kant, Krishna (NSF, Intel), Lumsdaine, Andrew (Indiana University), Padua, David (University of Illinois at Urbana-Champaign), Parashar, Manish (Rutgers, NSF), Prasad, Sushil (Georgia State University), Prasanna, Viktor (University of Southern California), Robert, Yves (INRIA, France), Rosenberg, Arnold (Colorado State University), Sahni, Sartaj (University of Florida), Shirazi, Behrooz (Washington State University), Sussman, Alan (University of Maryland), and Wu, Jie (Temple University)
Contact: Sushil Prasad ()
3/29/2010
Abstract: This is a preliminary draft of the core topics that a student graduating with a Bachelor's degree in Computer Science or Computer Engineering is expected to have covered. Additional elective topics are expected. Also included here are the expected minimum skill set and a brief rationale for the topics in algorithms. This is a working document, and its current limited release is intended to engage the various stakeholders for their initial feedback and updates to this document (core vs. elective, additional topics, …?), to be taken up at the follow-up TCPP/NSF Curriculum Meeting on April 19th, 2010 in Atlanta (co-located with IPDPS-10).

Core Topics (Body of Knowledge)

Architectures

  • Classes
      • SIMD
      • MIMD
      • Shared memory (SMP, NUMA, CC-NUMA)
      • Distributed memory
      • Hybrid
      • Heterogeneous/Accelerator (e.g., GPU, FPGA)
      • Grid/cloud
  • Memory
      • Memory hierarchies
      • Caches
      • Cache coherence
  • Interconnects
      • Topologies (e.g., diameter vs. bandwidth)
      • Routing
      • Switches
  • Power
      • Power sinks
      • Methods to control power

Algorithms

  • Parallel and distributed models and complexity
      • Cost of computation: time, space, power, etc.; asymptotics
      • Cost reduction: speedup, space compression, etc.
      • Cost tradeoffs: time vs. space, power vs. time, etc.
      • Scalability in algorithms and architectures
      • Optimality
  • Model-based notions
      • Notions from complexity theory: the PRAM, simulation/emulation, P-completeness, #P-completeness
      • Notions from scheduling: dependencies, task graphs, work, (make)span
  • Algorithmic Paradigms
      • Divide & conquer, Greedy, Dynamic Programming, Branch and Bound, and Backtracking
      • Recursion
      • Scan (parallel prefix) and reduction (map-reduce); see the sketch after this list
      • Series-parallel composition
      • Stencil-based iteration
      • Graph embedding as an algorithmic tool
  • Algorithmic Problems
      • Communication: broadcast, multicast, scatter/gather, gossip
      • Sorting & selection
      • Leader election
      • Termination detection
      • Graph algorithms: search, path selection
      • Matrix computations: matrix product, linear systems, matrix arithmetic
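
The scan and reduction entry above points here. As a minimal, hedged sketch, not itself part of the draft's topic list, the following C program expresses a parallel sum with an OpenMP reduction; a full parallel prefix (scan) is commonly built in the same spirit from a per-block pass, a scan of the block sums, and a final local pass. The array contents are dummy data.

    /* Sketch: parallel reduction in C with OpenMP.
       reduction(+:sum) gives each thread a private partial sum and
       combines the partials when the loop ends. */
    #include <stdio.h>

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 1.0;   /* illustrative data */

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);                /* expect 1000000.0 */
        return 0;
    }

Compiled without OpenMP support, the pragma is simply ignored and the loop runs serially, which is one reason this style is attractive for teaching.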

Programming

  • Paradigms
      • Data parallel
      • Task parallel
      • Shared memory (threads, atomic operations, mutual exclusion, locks, barriers, race conditions); see the Pthreads sketch after this list
      • Distributed memory: message passing
      • Hybrid
  • Parallel programming frameworks
      • Libraries/APIs (examples: Pthreads, MPI, etc.)
      • Languages (examples: Cilk, UPC, etc.)
      • Language extensions (example: OpenMP)
      • Hybrid frameworks (examples: CUDA, OpenCL)
  • Program development
      • Task decomposition: locality, partitioning/tiling, over-decomposition
      • Load balancing
      • Scheduling
      • Mapping: static, dynamic
      • Synchronization: critical sections, asynchronous computation
      • Communication: latency hiding, aggregation of messages, false sharing
  • Tools
      • Debuggers
      • Performance monitoring
  • Program analysis
      • Dependences: task graph, span vs. work
      • Synchronization analysis: races, deadlocks, livelocks
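
To make the shared-memory paradigm concrete (see the pointer in the Paradigms list above), here is a minimal, hedged Pthreads sketch; the thread and iteration counts are illustrative. Several threads increment a shared counter, and the mutex is what prevents the race condition named in the list: remove the lock/unlock pair and the final count becomes nondeterministic.

    /* Sketch: shared-memory programming with Pthreads.
       A mutex serializes access to the shared counter. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITER    100000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        (void)arg;
        for (int i = 0; i < NITER; i++) {
            pthread_mutex_lock(&lock);    /* enter critical section */
            counter++;                    /* safe only while the lock is held */
            pthread_mutex_unlock(&lock);  /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITER);
        return 0;
    }

A message-passing counterpart of this kind of coordination appears in the leader-election sketch in the rationale section below.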

Minimum skill set for a CS/CE graduate

- An undergraduate student should be able to program parallel and distributed systems efficiently (productivity and performance)

  • Productivity - Languages, software engineering, programming methodologies, algorithms, parallel design patterns, tools, libraries
  • Performance - Execution time, power, memory, I/O, scalability, throughput

- To be aware of the interactions among different tools, algorithms, architectures, programming paradigms, and programming models

- The knowledge should remain relevant for the foreseeable future
  (e.g., multi-core, GPUs, web services, clusters)

Rationale: Algorithms Topics

Parallel/Distributed Models and Complexity: It is essential to provide students with a firm background in the conceptual underpinnings of parallel and distributed computing. Technology changes in this dynamic field at a hitherto unheard-of pace. Students whose education is tied too closely to current technology will be at a disadvantage when technology shifts render their knowledge obsolete. To mention just a few instances of such paradigm-changing shifts: (1) The VLSI revolution of the late 1970s and early 1980s enabled computers with unheard-of levels of parallelism. New problems related to interprocessor communication arose, exposing an aspect of parallel computing that could safely be ignored when levels of parallelism were very modest. (2) As clusters of modest processors (or workstations) joined multiprocessors-in-a-box as parallel computing platforms in the early 1990s, two sea changes occurred: (a) computing platforms became heterogeneous for the first time, and (b) communication delays became impossible to hide via clever latency-hiding strategies. (3) As computational grids became “parallel” computing platforms around the turn of the millennium, the distinction between parallel and distributed computing became fuzzier than ever before.

Fortunately, in spite of the “leaps and bounds” evolution of parallel computing technology, there exists a core of fundamental algorithmic principles. These principles are largely independent of the details of the underlying platform architecture and provide the basis for developing applications on current and future parallel platforms. Students should be taught how to identify and synthesize fundamental ideas and generally applicable algorithmic principles out of the mass of parallel algorithm expertise and practical implementations developed over the last decades.

What, however, should be taught under the rubric “conceptual underpinnings of parallel and distributed computing?” Our choices reflect a combination of “persistent truths” and “conceptual flexibility.” In the former camp, one strives to impart: (1) an understanding of how one reasons rigorously about the expenditure of computational resources; (2) an appreciation of fundamental computational limitations that transcend details of particular models. In the latter camp, one needs to expose the student to a variety of “situation-specific” models, with the hope of endowing the student with the ability to generate new models in response to new technology.
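
One classical instance of such rigorous reasoning, offered here purely as an illustration rather than as part of the topics list, is Amdahl's law: if a fraction f of a computation is inherently serial, the speedup on p processors is bounded by

    S(p) = 1 / (f + (1 - f)/p)

With f = 0.1, for example, even infinitely many processors yield S <= 1/f = 10, a concrete fundamental limitation that transcends the details of any particular model or machine.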

Algorithmic Paradigms: This section acknowledges the folk wisdom that contrasts giving a person a fish with teaching that person how to fish. Algorithmic paradigms lie in the latter camp. We have attempted to enumerate a variety of paradigms whose algorithmic utility has been demonstrated over the course of decades. In some sense, one can view these paradigms as algorithmic analogues of high-level programming languages. The paradigms in our list can be viewed as generating control structures that are readily ported onto a broad range of parallel and distributed computing platforms. Included here are: the well-known divide-and-conquer paradigm that is as useful in the world of sequential computing as in the world of parallel and distributed computing; the series-parallel paradigm that one encounters, e.g., in multi-threaded computing environments; stencil-based controllers of iterative processes; graph-embedding techniques that facilitate creating virtual computing platforms that real platforms can then emulate.
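
As a deliberately minimal illustration of the divide-and-conquer paradigm in a multi-threaded environment, the following hedged C sketch sums an array by splitting the index range, spawning an OpenMP task for one half, and recursing on the other; the data and the cutoff value are illustrative only.

    /* Sketch: divide-and-conquer with OpenMP tasks. */
    #include <stdio.h>

    #define CUTOFF 10000   /* below this size, sum serially */

    static double dc_sum(const double *a, long lo, long hi) {
        if (hi - lo <= CUTOFF) {                  /* base case */
            double s = 0.0;
            for (long i = lo; i < hi; i++) s += a[i];
            return s;
        }
        long mid = lo + (hi - lo) / 2;            /* divide */
        double left, right;
        #pragma omp task shared(left)             /* conquer one half in a task */
        left = dc_sum(a, lo, mid);
        right = dc_sum(a, mid, hi);               /* conquer the other in place */
        #pragma omp taskwait                      /* join before combining */
        return left + right;
    }

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        for (long i = 0; i < N; i++) a[i] = 1.0;  /* illustrative data */

        double total = 0.0;
        #pragma omp parallel
        #pragma omp single                        /* one thread starts the recursion */
        total = dc_sum(a, 0, N);

        printf("total = %f\n", total);
        return 0;
    }

The same control structure ports essentially unchanged to Cilk's spawn/sync, which is the sense in which such paradigms act as portable generating control structures.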

Algorithmic problems: Two genres of specific algorithmic problems play such important roles in such a variety of computational situations that we view them as essential to an education about parallel and distributed computing. (1) Several of our entries are auxiliary computations that are useful in a variety of settings. Collective communication primitives are fundamental in myriad applications to the distribution of data and the collection of results. Certain basic operations play important control functions, especially as parallel computing incorporates many of the characteristics of distributed computing. The process of leader election endows processors with computationally meaningful names that are useful in initiating and coordinating activities at possibly remote sites. The essential process of termination detection is easy when parallelism is modest and processors are physically proximate, but it is a significant challenge in more general environments. (2) Several of our entries are independent computations that are usually sub-procedures of a broad range of large-scale computations. Sorting and selection are always near the top of everyone’s list of basic computations.

Algorithms that operate on graphs and matrices can occur in almost every application area.
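
As one hedged, toy illustration of how collective communication and leader election interact in the message-passing setting, the sketch below elects the smallest MPI rank as leader with a single all-reduce; real election protocols for general networks are considerably more involved.

    /* Sketch: leader election in MPI via collective communication.
       Every process contributes its rank; MPI_MIN makes the smallest
       rank the agreed-upon leader at every process. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, leader;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One collective call: everyone learns the minimum rank,
           so everyone agrees on the same leader. */
        MPI_Allreduce(&rank, &leader, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

        if (rank == leader)
            printf("process %d: I am the leader\n", rank);

        MPI_Finalize();
        return 0;
    }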
