<?xml version="1.0" encoding="UTF-8" ?>
<document>
  <record>
    <time_stamp>2004-07-14_21:01:50</time_stamp>
    <status>active</status>
    <sub_id>pap263</sub_id>
    <event_type>Paper</event_type>
    <sess_id>10</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>4:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Emerging Architectures</sess_title>
    <sess_chair>Jose L Munoz (NSF)</sess_chair>
    <title>Analysis and Performance Results of a Molecular Modeling Application on Merrimac</title>
    <all_auth_inst>Mattan Erez (Stanford University), Jung Ho Ahn (Stanford University), Ankit Garg (Stanford University), William J. Dally (Stanford University), Eric Darve (Stanford University)</all_auth_inst>
    <abs>The Merrimac supercomputer uses stream processors and a high-radix network to achieve high performance at low cost and low power. The stream architecture matches the capabilities of modern semiconductor technology with compute-intensive parallel applications. We present a detailed case study of porting the GROMACS molecular-dynamics force calculation to Merrimac. The characteristics of the architecture that stress locality, parallelism, and decoupling of memory operations and computation allow for high performance of compiler-optimized code. The rich set of hardware memory operations and the ample computation bandwidth of the Merrimac processor present a wide range of algorithmic trade-offs and optimizations which may be generalized to several scientific computing domains. We use a cycle-accurate hardware simulator to analyze the performance bottlenecks of the various implementations and to measure application run-time. A comparison with the highly optimized GROMACS code, tuned for an Intel Pentium 4, confirms Merrimac&apos;s potential to deliver high performance.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap263.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_23:36:19</time_stamp>
    <status>active</status>
    <sub_id>pap178</sub_id>
    <event_type>Paper</event_type>
    <sess_id>49</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:00PM</begin_time>
    <end_time>2:30PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Compiler Technology</sess_title>
    <sess_chair>Mary Hall (USC/ISI)</sess_chair>
    <title>Rating Compiler Optimizations for Automatic Performance Tuning</title>
    <all_auth_inst>Zhelong Pan (Purdue University), Rudolf Eigenmann (Purdue University)</all_auth_inst>
    <abs>To achieve maximum performance gains through compiler optimization, most automatic performance tuning systems use a feedback-directed approach to rate the code versions generated under different optimization options and to search for the best one. They all face the problem that code versions are only comparable if they run under the same execution context. This paper proposes three accurate, fast and flexible rating approaches that address this problem. The three methods identify comparable execution contexts, model relationships between contexts, or force re-execution of the code under the same context, respectively. We apply these methods in an automatic offline tuning scenario. Our performance tuning system improves the program performance of a selection of SPEC CPU 2000 benchmarks by up to 178% (26% on average). Our techniques reduce program tuning time by up to 96% (80% on average), compared to the state-of-the-art tuning scenario that compares optimization techniques using whole-program execution.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap178.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_19:46:33</time_stamp>
    <status>active</status>
    <sub_id>pap267</sub_id>
    <event_type>Paper</event_type>
    <sess_id>48</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Performance Measurement &amp; Optimization</sess_title>
    <sess_chair>  </sess_chair>
    <title>Performance Tool Support for MPI-2 on Linux</title>
    <all_auth_inst>Kathryn Mohror (Portland State University), Karen L. Karavanic (Portland State University)</all_auth_inst>
    <abs>Programmers of message-passing codes for clusters of workstations face challenges in understanding performance bottlenecks of their applications.  This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data.  This paper reports on our recent efforts developing a performance tool for MPI applications on Linux clusters.  Our target MPI implementations were LAM/MPI and MPICH2, both of which support portions of the MPI-2 Standard.  We started with an existing performance tool and added support for non-shared filesystems, MPI-2 one-sided communications, dynamic process creation, and MPI Object naming.  We present results using the enhanced version of the tool to examine the performance of several applications.  We describe a new performance tool benchmark suite we have developed, PPerfMark, and present results for the benchmark using the enhanced tool. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap267.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_02:22:19</time_stamp>
    <status>active</status>
    <sub_id>pap154</sub_id>
    <event_type>Paper</event_type>
    <sess_id>8</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Distributed Data Management</sess_title>
    <sess_chair>Merritt E. Jones (The MITRE Corp.)</sess_chair>
    <title>Fastpath Optimizations for Cluster Recovery in Shared-Disk Systems</title>
    <all_auth_inst>Randal Burns (Johns Hopkins University)</all_auth_inst>
    <abs>We describe the design and implementation of a clustering service for a high-performance, shared-disk file system.    The service provides failure detection and recovery, reliable end-to-end messaging, and a centralized and recoverable management interface.  We implement novel optimizations in the voting protocol that resolves cluster membership.  Optimizations allow clusters to form as quickly as possible without introducing livelock or requiring timeout parameters to be tuned carefully.  Our treatment includes performance results that quantify the scalability of the system and measure recovery times.</abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap154.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap149</sub_id>
    <event_type>Paper</event_type>
    <sess_id>48</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Performance Measurement &amp; Optimization</sess_title>
    <sess_chair>  </sess_chair>
    <title>Using Information from Prior Runs to Improve Automated Tuning Systems</title>
    <all_auth_inst>I-Hsin Chung (Department of Computer Science, University of Maryland, College Park), Jeffrey K. Hollingsworth (Department of Computer Science, University of Maryland, College Park)</all_auth_inst>
    <abs>Active Harmony is an automated runtime performance tuning system. In this paper we describe a parameter prioritizing tool that helps focus on the parameters that are performance critical. Historical data is also used to further speed up the tuning process. We first verify our proposed approaches with synthetic data and then verify all the improvements on a real cluster-based web service system. Taken together, these changes allow the Active Harmony system to reduce the time spent tuning by 35% to 50% and, at the same time, reduce the variation in performance while tuning.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap149.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-17_22:55:01</time_stamp>
    <status>active</status>
    <sub_id>pap163</sub_id>
    <event_type>Paper</event_type>
    <sess_id>51</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Processor &amp; Communication Performance</sess_title>
    <sess_chair>Steven Gregory Parker (University of Utah)</sess_chair>
    <title>Data Centric Cache Measurement on the Intel Itanium 2 Processor</title>
    <all_auth_inst>Bryan R. Buck (Platform Logic), Jeffrey K. Hollingsworth (The University of Maryland, College Park)</all_auth_inst>
    <abs>Processor speed continues to increase faster than the speed of access to main memory, making effective use of memory caches more important. Information about an application’s interaction with the cache is therefore critical to performance tuning. To be most useful, tools that measure this information should relate it to the source code level data structures in an application. We describe how to gather such information by using hardware performance counters to sample cache miss addresses, and present a new tool named Cache Scope that does this using the Intel Itanium 2 performance monitors. We present experimental results concerning Cache Scope’s accuracy and perturbation of cache behavior. We also describe a case study of using Cache Scope to tune two applications, achieving 24% and 19% reductions in running time.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap163.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_12:57:14</time_stamp>
    <status>active</status>
    <sub_id>pap230</sub_id>
    <event_type>Paper</event_type>
    <sess_id>7</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>File Systems</sess_title>
    <sess_chair>John L Cole (US Army Research Laboratory)</sess_chair>
    <title>Kosha: A Peer-to-Peer Enhancement for the Network File System</title>
    <all_auth_inst>Ali Raza Butt (Purdue University), Troy A. Johnson (Purdue University), Yili Zheng (Purdue University), Y. Charlie Hu (Purdue University)</all_auth_inst>
    <abs>This paper presents Kosha, a peer-to-peer (p2p) enhancement for the widely used Network File System (NFS). Kosha harvests redundant storage space on cluster nodes and user desktops to provide a reliable, shared file system that acts as a large storage system with normal NFS semantics. P2p storage systems provide location transparency, mobility transparency, load balancing, and file replication, features that are not available in NFS. On the other hand, NFS provides hierarchical file organization, directory listings, and file permissions, which are missing from p2p storage systems. By blending the strengths of NFS and p2p storage systems, Kosha provides a low-overhead storage solution. Our experiments show that compared to unmodified NFS, Kosha introduces a 4.1% fixed overhead and 1.5% additional overhead as nodes are increased from one to eight. For larger numbers of nodes, the additional overhead increases slowly. Kosha achieves load balancing in distributed directories and guarantees 99.99% or better file availability.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap230.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_20:21:50</time_stamp>
    <status>active</status>
    <sub_id>pap204</sub_id>
    <event_type>Paper</event_type>
    <sess_id>11</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:00PM</begin_time>
    <end_time>2:30PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Architectural Paradigms</sess_title>
    <sess_chair>William Carlson (IDA Center for Computing Sciences)</sess_chair>
    <title>Early Experience with Aerospace CFD at JAXA on the Fujitsu PRIMEPOWER HPC2500</title>
    <all_auth_inst>Yuichi Matsuo (Japan Aerospace Exploration Agency), Masako Tsuchiya (Japan Aerospace Exploration Agency), Masaki Aoki (Fujitsu Limited), Naoki Sueyasu (Fujitsu Limited), Tomohide Inari (Fujitsu Limited), Katsumi Yazawa (Fujitsu Limited)</all_auth_inst>
    <abs>The Japan Aerospace Exploration Agency has introduced a new terascale SMP-cluster-type parallel supercomputer system consisting of Fujitsu PRIMEPOWER HPC2500 as the main compute engine of Numerical Simulator III for aerospace science and engineering purposes. It began full operation in October 2002. The system has a computing capability of 9.3Tflop/s peak performance and 3.6TB of user memory, with about 1,800 scalar processors for computation. It has mass storage consisting of 57TB of disk and a 620TB tape library, and a visualization system tightly integrated with the computation system. In this paper, after reviewing the history of the Numerical Simulator, we describe the system configuration in detail along with the requirements and design of the NS-III. Next, we present the performance evaluation results of the Fujitsu PRIMEPOWER HPC2500 system for both micro- and kernel-benchmarks and aerospace CFD applications used at JAXA, and finally we discuss how we use the system effectively.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap204.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-14_17:03:12</time_stamp>
    <status>active</status>
    <sub_id>pap258</sub_id>
    <event_type>Paper</event_type>
    <sess_id>39</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Performance Evaluation Algorithms</sess_title>
    <sess_chair>Henri Casanova (UCSD)</sess_chair>
    <title>Predicting and Evaluating Distributed Communication Performance</title>
    <all_auth_inst>Kirk W Cameron (Univ of S Carolina), Rong Ge (Univ of S Carolina)</all_auth_inst>
    <abs>Application of hardware-parameterized models to distributed systems can result in omission of key bottlenecks such as the full cost of inter- and intra-node communication in a cluster of SMPs. However, inclusion of message and middleware characteristics may result in impractical models. Nonetheless, the growing gap between memory and CPU performance combined with the trend toward large scale clustered shared memory platforms implies an increased need to consider the impact of middleware on distributed communication. We present a software-parameterized model of point-to-point communication for use in performance prediction and evaluation. We illustrate the utility of the model in two ways: 1) to derive a simple, useful, more accurate model of point-to-point communication in clusters of SMPs, 2) to predict and analyze point-to-point and broadcast communication costs in clusters of SMPs. We present our results on an IA-64-based cluster.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap258.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_14:09:16</time_stamp>
    <status>active</status>
    <sub_id>pap228</sub_id>
    <event_type>Paper</event_type>
    <sess_id>55</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Terascale Networking</sess_title>
    <sess_chair>Philip Michael Papadopoulos (San Diego Supercomputer Center/UC San Diego)</sess_chair>
    <title>Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation</title>
    <all_auth_inst>Jiuxing Liu (The Ohio State University), Abhinav Vishnu (The Ohio State University), Dhabaleswar K. Panda (The Ohio State University)</all_auth_inst>
    <abs>In this paper, we study how to overcome bandwidth bottlenecks by using multirail InfiniBand networks. We present different multirail configurations and propose a unified MPI design to support them. We also discuss various design issues and provide in-depth discussions of different policies (even and weighted) of using multirail networks, including an adaptive striping scheme that can dynamically adapt to current system conditions.

We implemented our design and evaluated it using both microbenchmarks and applications. Our results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we can achieve almost twice the bandwidth and half the latency for large messages compared with the original MPI. At the application level, the multirail MPI can significantly reduce communication time as well as running time. We have also shown that the adaptive striping scheme can achieve excellent performance without a priori knowledge of the bandwidth of each rail.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap228.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap106</sub_id>
    <event_type>Paper</event_type>
    <sess_id>12</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Advanced Hardware Features</sess_title>
    <sess_chair>Thomas Wingfield Page (NSA)</sess_chair>
    <title>Using Hardware Counters to Automatically Improve Memory Performance</title>
    <all_auth_inst>Mustafa M Tikir (University of Maryland), Jeffrey K Hollingsworth (University of Maryland)</all_auth_inst>
    <abs>In this paper, we introduce a profile-driven online page migration scheme and investigate its impact on the performance of multithreaded applications. We use lightweight, inexpensive plug-in hardware counters to profile the memory access behavior of an application, and then migrate pages to memory local to the most frequently accessing processor. Using the Dyninst runtime instrumentation combined with hardware counters, we were able to add page migration capabilities to the system without having to modify the operating system kernel or to re-compile application programs. This approach reduced the total number of non-local memory accesses of applications by up to 90%. Even on a system with small remote-to-local memory access latency ratios, this resulted in up to 16% improvement in execution time.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap106.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-14_21:02:47</time_stamp>
    <status>active</status>
    <sub_id>pap270</sub_id>
    <event_type>Paper</event_type>
    <sess_id>50</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Fault Tolerance</sess_title>
    <sess_chair>Alan Sussman (University of Maryland)</sess_chair>
    <title>Assessing Fault Sensitivity in MPI Applications</title>
    <all_auth_inst>CHARNG-DA LU (Department of Computer Science, University of Illinois at Urbana-Champaign), DANIEL A. REED (Renaissance Computing Institute, University of North Carolina at Chapel Hill)</all_auth_inst>
    <abs>Today, clusters built from commodity PCs dominate high-performance computing, with systems containing thousands of processors now being deployed. As node counts for multi-teraflop systems grow to thousands, and with proposed petaflop systems likely to contain tens of thousands of nodes, the standard assumption that system hardware and software are fully reliable becomes much less credible. Concomitantly, understanding application sensitivity to system failures is critical to establishing confidence in the outputs of large-scale applications.

Using software fault injection, we simulated single bit memory errors, register file upsets and MPI message payload corruption and measured the behavioral responses for a suite of MPI applications. These experiments showed that most applications are very sensitive to even single errors. Perhaps most worrisome, the errors were often undetected, yielding erroneous output with no user indicators. Encouragingly, even minimal internal application error checking and program assertions can detect many of the faults we injected.</abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap270.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-18_23:57:42</time_stamp>
    <status>active</status>
    <sub_id>pap189</sub_id>
    <event_type>Paper</event_type>
    <sess_id>12</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Advanced Hardware Features</sess_title>
    <sess_chair>Thomas Wingfield Page (NSA)</sess_chair>
    <title>GPU Cluster for High Performance Computing</title>
    <all_auth_inst>Zhe Fan (Stony Brook University), Feng Qiu (Stony Brook University), Arie Kaufman (Stony Brook University), Suzanne Yoakum-Stover (Stony Brook University)</all_auth_inst>
    <abs>Inspired by the attractive Flops/dollar ratio and the incredible growth in the speed of modern graphics processing units (GPUs), we propose to use a cluster of GPUs for high performance scientific computing. As an example application, we have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, our simulation can compute the 480x400x80 LBM in 0.31 second/step, a speed which is 4.6 times faster than that of our CPU cluster implementation. Besides the LBM, we also discuss other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap189.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_13:57:00</time_stamp>
    <status>active</status>
    <sub_id>pap256</sub_id>
    <event_type>Paper</event_type>
    <sess_id>8</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Distributed Data Management</sess_title>
    <sess_chair>Merritt E. Jones (The MITRE Corp.)</sess_chair>
    <title>Dynamic Metadata Management for Petabyte-scale File Systems</title>
    <all_auth_inst>Sage A. Weil (UC Santa Cruz), Kristal T. Pollack (UC Santa Cruz), Scott A. Brandt (UC Santa Cruz), Ethan L. Miller (UC Santa Cruz)</all_auth_inst>
    <abs>In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap256.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:18:40</time_stamp>
    <status>active</status>
    <sub_id>pap125</sub_id>
    <event_type>Paper</event_type>
    <sess_id>37</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Scheduling Algorithms</sess_title>
    <sess_chair>Yves Robert (ENS Lyon)</sess_chair>
    <title>Coscheduling in Clusters: Is It a Viable Alternative?</title>
    <all_auth_inst>Gyu Sang Choi (Penn State Univ), Jin-Ha Kim (Penn State Univ), Deniz Ersoz (Penn State Univ), Andy B. Yoo (Lawrence Livermore National Laboratory), Chita R. Das (Penn State Univ)</all_auth_inst>
    <abs>In this paper, we conduct an in-depth evaluation of a broad spectrum of scheduling alternatives for clusters. These include the widely used batch scheduling, local scheduling, gang scheduling, all prior communication-driven coscheduling algorithms (Dynamic Coscheduling (DCS), Spin Block (SB), Periodic Boost (PB), and Co-ordinated Coscheduling (CC)) and a newly proposed HYBRID coscheduling algorithm on a 16-node, Myrinet-connected Linux cluster.

Performance and energy measurements using several NAS, LLNL and ANL benchmarks on the Linux cluster provide several interesting conclusions. First, blocking-based coscheduling techniques such as SB, CC and HYBRID, as well as gang scheduling, can provide much better performance even in a dedicated cluster platform. Second, we observe that blocking-based schemes like SB and HYBRID can provide better performance than spin-based techniques like PB on a Linux platform. Third, the proposed HYBRID scheduling provides the best performance-energy behavior and can be implemented on any cluster with little effort.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap125.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap247</sub_id>
    <event_type>Paper</event_type>
    <sess_id>11</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Architectural Paradigms</sess_title>
    <sess_chair>William Carlson (IDA Center for Computing Sciences)</sess_chair>
    <title>Scientific Computations on Modern Parallel Vector Systems</title>
    <all_auth_inst>Leonid Oliker (Lawrence Berkeley National Laboratory), Andrew Canning (Lawrence Berkeley National Laboratory), Jonathan Carter (Lawrence Berkeley National Laboratory), John Shalf (Lawrence Berkeley National Laboratory), Stephane Ethier (Princeton Plasma Physics Laboratory)</all_auth_inst>
    <abs>Recently, two innovative parallel-vector architectures have become available to the supercomputing community: the Japanese Earth Simulator (ES) and the Cray X1. In order to quantify what these modern vector capabilities entail for the scientists that rely on modeling and simulation, it is critical to evaluate this architectural paradigm in the context of demanding computational algorithms. Our evaluation study examines four diverse scientific applications with the potential to run at ultrascale, from the areas of plasma physics, material science, astrophysics, and magnetic fusion. We compare the performance of the vector-based ES and X1 with that of leading superscalar-based platforms: the IBM Power3/4 and the SGI Altix. Our research team was the first international group to conduct a performance evaluation study at the Earth Simulator Center; remote ES access is not available. Results demonstrate that the vector systems achieve excellent performance on our application suite - the highest of any architecture tested to date.</abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap247.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap266</sub_id>
    <event_type>Paper</event_type>
    <sess_id>50</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:00PM</begin_time>
    <end_time>2:30PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Fault Tolerance</sess_title>
    <sess_chair>Alan Sussman (University of Maryland)</sess_chair>
    <title>Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI Programs</title>
    <all_auth_inst>Martin Schulz (School of Electrical and Computer Engineering, Cornell University), Greg Bronevetsky (Department of Computer Science, Cornell University), Rohit Fernandes (Department of Computer Science, Cornell University), Daniel Marques (Department of Computer Science, Cornell University), Keshav Pingali (Department of Computer Science, Cornell University), Paul Stodghill (Department of Computer Science, Cornell University)</all_auth_inst>
    <abs>Checkpoint-and-restart (CPR) is the most commonly used scheme for recovering from hardware failures. Most automatic CPR schemes in the literature can be classified as blocking, system-level checkpointing schemes because they take core-dump style snapshots of the computational state when all the processes are blocked at global barriers in the program. Unfortunately, such systems are neither general nor portable.

We are exploring an alternative approach called non-blocking application-level checkpointing. In our approach, programs are transformed by a pre-processor so that they become self-checkpointing and self-restartable on any platform. In this paper, we describe our implementation of non-blocking application-level checkpointing. We present experimental results on both a Windows cluster and the Lemieux system at the Pittsburgh Supercomputing Center, and argue that these results demonstrate both the platform-independence and the scalability of our approach.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap266.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap197</sub_id>
    <event_type>Paper</event_type>
    <sess_id>37</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:30PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Scheduling Algorithms</sess_title>
    <sess_chair>Yves Robert (ENS Lyon)</sess_chair>
    <title>A Geometric Programming Framework for Optimal Multi-level Tiling</title>
    <all_auth_inst>Lakshminarayanan Renganarayana (Computer Science Department, Colorado State University), Sanjay Rajopadhye (Computer Science Department, Colorado State University)</all_auth_inst>
    <abs>Determining the optimal tile size - one that minimizes the execution time subject to memory characteristics - is a classical problem in compilation of loops. Designing a model of the overall execution time of tiled loop nests is an important subproblem. We present a framework for determining optimal tile sizes for a fully permutable, perfectly nested, rectangular loop with uniform dependences. Our framework supports multiple levels of tiling and uses a BSP-style high-level model for estimating the overall execution time of a loop program. In our framework, the problem of determining the optimal tile sizes subject to memory hierarchy (viz., capacity and bandwidth) constraints is modeled as a geometric program and transformed into a convex optimization problem, which can be solved efficiently. Our execution time model is validated through experimental results obtained by running twenty loop programs for different levels of tiling and different program and tile parameters.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap197.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_11:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap182</sub_id>
    <event_type>Paper</event_type>
    <sess_id>48</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Performance Measurement &amp; Optimization</sess_title>
    <sess_chair>  </sess_chair>
    <title>Language and Compiler Support for Adaptive Applications</title>
    <all_auth_inst>Wei Du (Ohio State University), Gagan Agrawal (Ohio State University)</all_auth_inst>
    <abs>There exist many application classes for which users have significant flexibility in the quality of the output they desire. At the same time, there are other constraints, such as the need for real-time response or limits on the consumption of certain resources, which are more crucial. This paper provides a combined language/compiler and runtime solution for supporting adaptive execution of these applications. The key idea in our language extensions is to have programmers specify adaptation parameters, i.e., the parameters whose values can be varied within a certain range. A program analysis algorithm expresses the execution time of an application component as a function of the values of the adaptation parameters and other runtime constants. These constants are determined by initial runs of the application in the target environment. We integrate this work with our previous work on supporting coarse-grained pipelined parallelism.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap182.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-16_13:21:17</time_stamp>
    <status>active</status>
    <sub_id>pap183</sub_id>
    <event_type>Paper</event_type>
    <sess_id>51</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Processor &amp; Communication Performance</sess_title>
    <sess_chair>Steven Gregory Parker (University of Utah)</sess_chair>
    <title>Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications</title>
    <all_auth_inst>Jian Ke (Cornell University), Martin Burtscher (Cornell University), Evan Speight (IBM Austin Research Lab)</all_auth_inst>
    <abs>Communication-intensive parallel applications spend a significant amount of their total execution time exchanging data between processes, which leads to poor performance in many cases.  In this paper, we investigate message compression in the context of large-scale parallel message-passing systems to reduce the communication time of individual messages and to improve the bandwidth of the overall system.  We implement and evaluate the cMPI message-passing library, which quickly compresses messages on-the-fly with a low enough overhead that a net execution time reduction is obtained.  Our results on six large-scale benchmark applications show that their execution speed improves by up to 98% when message compression is enabled. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap183.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-16_13:24:29</time_stamp>
    <status>active</status>
    <sub_id>pap181</sub_id>
    <event_type>Paper</event_type>
    <sess_id>49</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:30PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Compiler Technology</sess_title>
    <sess_chair>Mary Hall (USC/ISI)</sess_chair>
    <title>Toward a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors</title>
    <all_auth_inst>David Parello (ALCHEMY Group, HP France), Olivier Temam (ALCHEMY Group, INRIA Futurs and LRI, Paris Sud University), Albert Cohen (ALCHEMY Group, INRIA Futurs), Jean-Marie Verdun (HP France)</all_auth_inst>
    <abs>Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers, and compiler efficiency tends to decrease. Currently, the trend is toward top-down approaches: static compilers are augmented with information from the architecture, as in profile-based, iterative or dynamic compilation techniques. However, only fairly elementary architectural information is used at present. In this article, we adopt a bottom-up approach to the architecture complexity issue: we assume we know everything about the behavior of the program on the architecture. We present a manual but systematic process for optimizing a program on a complex processor architecture using extensive dynamic analysis, and we identify a small set of run-time information sufficient to drive an efficient process. We have experimentally observed on an Alpha 21264 that this approach can yield significant performance improvement on SPEC benchmarks, beyond peak SPEC. We are currently using this approach for optimizing customer applications.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap181.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap199</sub_id>
    <event_type>Paper</event_type>
    <sess_id>8</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Distributed Data Management</sess_title>
    <sess_chair>Merritt E. Jones (The MITRE Corp.)</sess_chair>
    <title>Optimal File-Bundle Caching Algorithms for Data-Grids</title>
    <all_auth_inst>Ekow J. Otoo (LBNL), Doron Rotem (LBNL), Alexandru Romosan (LBNL)</all_auth_inst>
    <abs>The file-bundle caching problem arises frequently in scientific applications where jobs process several files concurrently. Consider a host system in a data-grid that maintains a disk cache for servicing jobs of file requests where a job is serviced only if all its requested files are present in the disk cache. Files must now be admitted into the cache and replaced in sets of file-bundles. We show that traditional caching algorithms based on file popularity measures do not perform well since they may hold in cache non-relevant combinations of files. We present and analyze a new caching algorithm for maximizing the throughput of jobs and minimizing data replacement costs at such data-grid hosts. We tested the new algorithm using a disk cache simulation model under a wide range of conditions of file request distributions, varying cache size, file size distribution, etc. The results show significant improvement over traditional caching algorithms.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap199.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_07:51:11</time_stamp>
    <status>active</status>
    <sub_id>pap110</sub_id>
    <event_type>Paper</event_type>
    <sess_id>50</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:30PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Fault Tolerance</sess_title>
    <sess_chair>Alan Sussman (University of Maryland)</sess_chair>
    <title>RPC-V: Toward Fault-Tolerant RPC for Grids with Volatile Nodes</title>
    <all_auth_inst>Samir Djilali (LRI), Thomas Herault (LRI), Franck Cappello (INRIA), Oleg Lodygensky (LAL), Tangui Morlier (INRIA), Gilles Fedak (INRIA)</all_auth_inst>
    <abs>RPC is one of the programming models envisioned for the Grid. In Internet-connected large-scale Grids such as Desktop Grids, node and network failures are not rare events. This paper provides several contributions, examining the feasibility and limits of fault-tolerant RPC on these platforms.

First, we characterize these Grids by their fundamental features and demonstrate that their application scope should be safely restricted to stateless services.

Second, we present a new fault-tolerant RPC protocol associating an original combination  of three-tier architecture, passive replication and message logging. We describe RPC-V,  an implementation of the proposed protocol within the XtremWeb Desktop Grid middleware. 

Third, we evaluate the performance of RPC-V and the impact of faults on the execution time, using a real-life application on a Desktop Grid testbed assembling nodes in France and the USA. We demonstrate that RPC-V allows applications to continue their execution while key system components fail.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap110.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_18:54:32</time_stamp>
    <status>active</status>
    <sub_id>pap220</sub_id>
    <event_type>Paper</event_type>
    <sess_id>51</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Processor &amp; Communication Performance</sess_title>
    <sess_chair>Steven Gregory Parker (University of Utah)</sess_chair>
    <title>Unlocking the Performance of the BlueGene/L Supercomputer</title>
    <all_auth_inst>George Almasi (IBM Thomas J. Watson Research Center), Leonardo Bachega (LARC - University of Sao Paulo), Sid Chatterjee (IBM Thomas J. Watson Research Center), Alan Gara (IBM Thomas J. Watson Research Center), John Gunnels (IBM Thomas J. Watson Research Center), Manish Gupta (IBM Thomas J. Watson Research Center), Amy Henning (IBM Thomas J. Watson Research Center), Jose Moreira (IBM Thomas J. Watson Research Center), Bob Walkup (IBM Thomas J. Watson Research Center), Alessandro Curioni (IBM Zurich Research Laboratory), Charles Archer (IBM Systems and Technology Group), Bor Chan (Lawrence Livermore National Laboratory), Bruce Curtis (Lawrence Livermore National Laboratory), Sharon Brunett (Caltech), Giri Chukkapalli (San Diego Supercomputer Center), Robert Robert (San Diego Supercomputer Center), Wayne Pfeiffer (San Diego Supercomputer Center)</all_auth_inst>
    <abs>The BlueGene/L supercomputer is expected to deliver new levels of application performance by providing a combination of good single-node computational performance and high scalability.  To achieve good single-node performance, the BlueGene/L design includes a special dual floating-point unit on each processor, and the ability to use two processors per node.  BlueGene/L also includes both a torus and a tree network to achieve high scalability. We demonstrate how benchmarks and applications can take advantage of these architectural features to get the most out of BlueGene/L.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap220.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_20:15:45</time_stamp>
    <status>active</status>
    <sub_id>pap195</sub_id>
    <event_type>Paper</event_type>
    <sess_id>4</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications II</sess_title>
    <sess_chair></sess_chair>
    <title>Big Wins with Small Application-Aware Caches</title>
    <all_auth_inst>Julio C. Lopez (Carnegie Mellon University), Tiankai Tu (Carnegie Mellon University), David O'Hallaron (Carnegie Mellon University)</all_auth_inst>
    <abs>Large datasets, on the order of GB and TB, are increasingly common as abundant computational resources allow practitioners to collect, produce and store data at higher rates.  As dataset sizes grow, it becomes more challenging to interactively manipulate and analyze these datasets due to the large amounts of data that need to be moved and processed.

Application-independent caches, such as operating system page caches and database buffer caches, are present throughout the memory hierarchy to reduce data access times and alleviate transfer overheads.  We claim that an application-aware cache with relatively modest memory requirements can effectively exploit dataset structure and application information to speed access to large datasets.  We demonstrate this idea in the context of a system named the tree cache, to reduce query latency to large octree datasets by an order of magnitude.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap195.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_04:59:12</time_stamp>
    <status>active</status>
    <sub_id>pap275</sub_id>
    <event_type>Paper</event_type>
    <sess_id>13</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:30PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Extreme Performance</sess_title>
    <sess_chair></sess_chair>
    <title>Optimal blade system design of a new concept VTOL vehicle using the Departmental Computing Grid system</title>
    <all_auth_inst>Jin Woo Park (post doctoral researcher), Si Hyoung Park (post doctoral researcher), In Seong Hwang (graduate research assistant), Ji Joong Moon (graduate research assistant), Youngha Yoon (graduate research assistant), Seung Jo Kim (Professor, School of Aerospace and Mechanical Engrg, Seoul National University)</all_auth_inst>
    <abs>The blade system of a new concept VTOL vehicle is designed utilizing high performance and Grid computing technologies. The VTOL vehicle, called the cyclocopter, employs a cycloidal propulsion system to generate the propulsion and lift for VTOL maneuver. The structural design and weight minimization of the composite blade system are critically related to the efficiency of the whole cyclocopter system. The structural design is carried out using a hybrid genetic algorithm-based optimization framework on the Departmental Computing Grid (DCG) system, an aggregation of cluster resources installed in the Aerospace department of Seoul National University. High-fidelity simulation is conducted using our parallel finite element code (IPSAP), which employs the domain-wise parallel multifrontal solver, a direct solution method characterized by solution robustness and predictable running time. The optimization results and computational aspects are presented, emphasizing the potential of high performance Grid computing technology for high-fidelity simulation-based aerospace system design.</abs>
    <awards>Gordon Bell Finalist</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap275.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap146</sub_id>
    <event_type>Paper</event_type>
    <sess_id>3</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications I</sess_title>
    <sess_chair></sess_chair>
    <title>Automatic Distribution of Rendering Workloads in a Grid Enabled Collaborative Visualization Environment</title>
    <all_auth_inst>Ian J Grimstead (School of Computer Science, Cardiff University), Nick J Avis (School of Computer Science, Cardiff University), David W Walker (School of Computer Science, Cardiff University)</all_auth_inst>
    <abs>This paper presents a distributed, collaborative grid enabled visualization environment that supports automated resource discovery across heterogeneous machines. Our Resource-Aware Visualization Environment (RAVE) runs as a background process using Grid/Web services, enabling us to share resources with other users rather than commandeering an entire machine. RAVE supports a wide range of machines, from hand-held PDAs to high-end servers with large-scale stereo, tracked displays. The local display device may render all, some or none of the data set remotely, depending on its available resources. This enables scientists and engineers to collaborate from their desks, in the field or in front of specialised immersive displays. We present initial results of our implementation, showing how we distribute complete datasets across multiple machines as required, using a central data service to distribute data updates from collaborating users. We will demonstrate RAVE at SC2004, utilising available heterogeneous resources.</abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap146.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-18_00:34:36</time_stamp>
    <status>active</status>
    <sub_id>pap160</sub_id>
    <event_type>Paper</event_type>
    <sess_id>5</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications III</sess_title>
    <sess_chair></sess_chair>
    <title>A Computational Database System for Generating Unstructured Hexahedral Meshes with Billions of Elements</title>
    <all_auth_inst>Tiankai Tu (Carnegie Mellon University), David R. O'Hallaron (Carnegie Mellon University)</all_auth_inst>
    <abs>For a large class of physical simulations with relatively simple geometries, unstructured octree-based hexahedral meshes provide a good compromise between adaptivity and simplicity. However, generating unstructured hexahedral meshes with over 1 billion elements remains a challenging task. We propose a database approach to solve this problem. Instead of merely storing generated meshes into conventional databases, we have developed a new kind of software system called a Computational Database System (CDS) to generate meshes directly on databases. Our basic idea is to extend existing database techniques to organize and index mesh data, and use database-aware algorithms to manipulate database structures and generate meshes. This paper presents the design, implementation, and evaluation of a prototype CDS named Weaver, which has been used successfully by the CMU Quake project to generate queryable high-resolution finite element meshes for earthquake simulations with up to 1.22B elements and 1.37B nodes.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap160.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_13:34:52</time_stamp>
    <status>active</status>
    <sub_id>pap126</sub_id>
    <event_type>Paper</event_type>
    <sess_id>6</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications IV</sess_title>
    <sess_chair></sess_chair>
    <title>VIRACOCHA: An Efficient Parallelization Framework for Large-Scale CFD Post-Processing in Virtual Environments</title>
    <all_auth_inst>Andreas Gerndt (Center for Computing and Communication, RWTH Aachen University), Bernd Hentschel (Center for Computing and Communication, RWTH Aachen University), Marc Wolter (Center for Computing and Communication, RWTH Aachen University), Torsten Kuhlen (Center for Computing and Communication, RWTH Aachen University), Christian Bischof (Center for Computing and Communication, RWTH Aachen University)</all_auth_inst>
    <abs>One recommended strategy for the analysis of CFD data is interactive exploration within virtual environments. Common visualization systems are unable to process large data sets while carrying out real-time interaction and visualization at the same time. The obvious idea is to decouple flow feature extraction from visualization. This paper covers the functionality of the parallel CFD post-processing toolkit Viracocha. Two aspects are discussed in more detail. The first aspect covers strategies to reduce the loading time: data caching and prefetching are employed to reduce access time. The second aspect concerns an approach, called streaming, that minimizes the time a user has to wait for first results: Viracocha sends coarse intermediate data back to the virtual environment before the final result is available. Different streaming and data handling strategies are described. To demonstrate the benefit of our implementation efforts, some strategies are applied to multi-block CFD data sets.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap126.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap285</sub_id>
    <event_type>Paper</event_type>
    <sess_id>3</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications I</sess_title>
    <sess_chair></sess_chair>
    <title>Modernizing Existing Software: A Case Study</title>
    <all_auth_inst>Kees Everaars (CWI), Farhad Arbab (CWI), Barry Koren (CWI)</all_auth_inst>
    <abs>In this paper, we discuss one of our experiments using the coordination language Manifold to restructure an existing sequential numerical application into a concurrent application.  The application was written in ANSI C and deals with a sparse-grid method for a transport problem. Our approach is simple and is in fact a cut-and-paste method. First, we try to identify and isolate components in the legacy source code (the cut). Second, we glue them together by writing coordinator modules (glue modules) with the help of a coordination language (the paste). We also give some performance results. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap285.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap284</sub_id>
    <event_type>Paper</event_type>
    <sess_id>6</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications IV</sess_title>
    <sess_chair></sess_chair>
    <title>A Parallel Visualization Pipeline for Terascale Earthquake Simulations</title>
    <all_auth_inst>Hongfeng Yu (University of California, Davis), Kwan-Liu Ma (University of California, Davis), Joel Welling (Pittsburgh Supercomputing Center)</all_auth_inst>
    <abs>This paper presents a parallel visualization pipeline for the largest earthquake simulation ever performed. The simulation employs up to 100 million hexahedral cells to model the 3D seismic-wave propagation of the 1994 Northridge earthquake. The time-varying dataset produced by the simulation requires terabytes of storage space. Our solution for visualizing such terascale simulations is based on a parallel cell-projection rendering algorithm coupled with a new parallel I/O strategy which effectively lowers interframe delay by utilizing a set of I/O processors. In addition, a 2D vector field visualization method and a 3D enhancement technique are incorporated into the parallel visualization framework to help scientists better understand the wave propagation. Our test results on an HP/Compaq AlphaServer show we can effectively remove the I/O bottlenecks commonly present in time-varying data visualization. This high-performance visualization solution allows scientists to explore their data in the temporal, spatial, and variable domains at high resolution.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap284.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_13:30:12</time_stamp>
    <status>active</status>
    <sub_id>pap166</sub_id>
    <event_type>Paper</event_type>
    <sess_id>3</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications I</sess_title>
    <sess_chair></sess_chair>
    <title>A High Performance Java Middleware with a Real Application</title>
    <all_auth_inst>Fabrice Huet (Vrije Universiteit), Denis Caromel (INRIA-I3S-CNRS), Henri E. Bal (Vrije Universiteit)</all_auth_inst>
    <abs>Previous experiments with high-performance Java were initially  disappointing. After several years of optimization, this paper  investigates the current suitability  of such object-oriented  middleware for High-Performance and Grid programming.

Using a middleware offering high level abstractions (ProActive), we have replaced the standard Java RMI layer with the optimized Ibis RMI interface. Ibis is a grid programming environment featuring  efficient communications. Using a 3D electromagnetic application (an object-oriented time domain finite volume solver for 3D Maxwell equations) we have first conducted benchmarks on single clusters, including comparisons with the same application in  Fortran MPI. Finally, Grid experiments have been conducted simultaneously on up to 5 different clusters. 

Overall, the paper reports extremely promising results: for instance, a speedup of 12 on 16 machines (vs. 13.8 for Fortran) and a speedup of 100 on 150 machines on a Grid.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap166.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap167</sub_id>
    <event_type>Paper</event_type>
    <sess_id>6</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications IV</sess_title>
    <sess_chair></sess_chair>
    <title>A Parallel Implementation of 4-Dimensional Haralick Texture Analysis for Disk-resident Image Datasets</title>
    <all_auth_inst>Brent Woods (Dept. of Electrical Engineering, Ohio State University), Bradley Clymer (Dept. of Electrical Engineering, Ohio State University), Joel Saltz (Biomedical Informatics Department, Ohio State University), Tahsin Kurc (Biomedical Informatics Department, Ohio State University)</all_auth_inst>
    <abs>Texture analysis is one possible method to detect features in biomedical images. During texture analysis, texture related information is found by examining local variations in image brightness. 4-dimensional (4D) Haralick texture analysis is a method that extracts local variations along space and time dimensions and represents them as a collection of fourteen statistical parameters. However, the application of the 4D Haralick method on large time-dependent 2D and 3D image datasets is hindered by computation and memory requirements. This paper presents a parallel implementation of 4D Haralick texture analysis on PC clusters.  We present a performance evaluation of our implementation on a cluster of PCs. Our results show that good performance can be achieved for this application via combined use of task- and data-parallelism. </abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap167.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_11:41:10</time_stamp>
    <status>active</status>
    <sub_id>pap177</sub_id>
    <event_type>Paper</event_type>
    <sess_id>52</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Resource Management</sess_title>
    <sess_chair>Satoshi Matsuoka (Tokyo Institute of Technology)</sess_chair>
    <title>A Peer-to-Peer Replica Location Service Based on A Distributed Hash Table</title>
    <all_auth_inst>Min Cai (USC/ISI), Ann Chervenak (USC/ISI), Martin Frank (USC/ISI)</all_auth_inst>
    <abs>In Grids, a scalable and reliable Replica Location Service (RLS) is important for data intensive applications. In earlier work, an RLS framework was proposed and its implementation is currently available in the Globus Toolkit 3.0. In this paper, we propose a Peer-to-Peer Replica Location Service (P-RLS) with properties of self-organization, fault-tolerance and improved scalability. The P-RLS uses the Chord algorithm to self-organize P-RLS servers and exploits the Chord overlay network to replicate P-RLS mappings adaptively. Our performance measurements demonstrate that update and query latencies increase at a logarithmic rate with the size of the P-RLS network, while the overhead of maintaining the P-RLS network is reasonable. The simulation results for adaptive replication demonstrate that as the number of replicas per mapping increases, the mappings are more evenly distributed without using Chord virtual nodes. Also, predecessor replication can reduce query hotspots of extremely popular mappings by distributing queries to different nodes.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap177.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_19:51:41</time_stamp>
    <status>active</status>
    <sub_id>pap206</sub_id>
    <event_type>Paper</event_type>
    <sess_id>4</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications II</sess_title>
    <sess_chair></sess_chair>
    <title>Scalable line dynamics in ParaDiS</title>
    <all_auth_inst>Vasily Bulatov (Lawrence Livermore National Laboratory), Wei Cai (Lawrence Livermore National Laboratory), Jeff Fier (IBM), Masato Hiratani (Lawrence Livermore National Laboratory), Gregg Hommes (Lawrence Livermore National Laboratory), Tim Pierce (Lawrence Livermore National Laboratory), Meijie Tang (Lawrence Livermore National Laboratory), Moono Rhee (Lawrence Livermore National Laboratory), Kim Yates (Lawrence Livermore National Laboratory), Tom Arsenlis (Lawrence Livermore National Laboratory)</all_auth_inst>
    <abs>We describe an innovative, highly parallel application program, ParaDiS, which computes the plastic strength of materials by tracing the evolution of dislocation lines over time. We discuss the issues of scaling the code to tens of thousands of processors, and present early scaling results of the code run on a prototype of the BlueGene/L supercomputer being developed by IBM in partnership with the US DOE’s ASC program.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap206.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_17:04:26</time_stamp>
    <status>active</status>
    <sub_id>pap173</sub_id>
    <event_type>Paper</event_type>
    <sess_id>4</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications II</sess_title>
    <sess_chair></sess_chair>
    <title>Performance Evaluation of Parallel Large-Scale Lattice Boltzmann Applications on Three Supercomputing Architectures</title>
    <all_auth_inst>Thomas Pohl (University of Erlangen), Nils Thuerey (University of Erlangen), Frank Deserno (University of Erlangen), Ulrich Ruede (University of Erlangen), Peter Lammers (High Performance Computing Center Stuttgart (HLRS)), Gerhard Wellein (Regionales Rechenzentrum Erlangen), Thomas Zeiser (Regionales Rechenzentrum Erlangen)</all_auth_inst>
    <abs>Computationally intensive programs with moderate communication requirements such as CFD codes suffer from the standard slow interconnects of commodity &quot;off the shelf&quot; (COTS) hardware. We will introduce different large-scale applications of the Lattice Boltzmann Method (LBM) in fluid dynamics, material science, and chemical engineering and present results of the parallel performance on different architectures. It will be shown that a high speed communication network in combination with an efficient CPU is mandatory in order to achieve the required performance. An estimation of the necessary CPU count to meet the performance of 1 TFlop/s will be given as well as a prediction as to which architecture is the most suitable for LBM. Finally, ratios of costs to application performance for tailored HPC systems and COTS architectures will be presented.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap173.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-18_16:09:27</time_stamp>
    <status>active</status>
    <sub_id>pap259</sub_id>
    <event_type>Paper</event_type>
    <sess_id>53</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Services</sess_title>
    <sess_chair>David Snelling (Fujitsu)</sess_chair>
    <title>Supporting Cluster-based Network Services on Functionally Symmetric Software Architecture</title>
    <all_auth_inst>Kai Shen (University of Rochester), Lingkun Chu (University of California at Santa Barbara &amp; Ask Jeeves Inc.), Tao Yang (University of California at Santa Barbara &amp; Ask Jeeves Inc.)</all_auth_inst>
    <abs>Server and storage clustering has become a popular platform for hosting large-scale online services.  Elements of the service clustering support are often constructed using centralized or hierarchical architectures, in order to meet performance and policy objectives desired by online applications.  Functionally symmetric software architecture can enhance the robustness of cluster-based network services due to its inherent absence of vulnerability points.  However, such a design must satisfy performance requirements and policy objectives desired by online services.  This paper argues for the improved robustness of functionally symmetric architectures and presents the designs of two specific clustering support elements: energy-conserving server consolidation and service availability management.  Our emulation and experimentation on a 117-server cluster show that the proposed designs do not significantly compromise the system performance and policy objectives compared with the centralized approaches. </abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap259.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_06:14:50</time_stamp>
    <status>active</status>
    <sub_id>pap234</sub_id>
    <event_type>Paper</event_type>
    <sess_id>13</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:00PM</begin_time>
    <end_time>2:30PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Extreme Performance</sess_title>
    <sess_chair></sess_chair>
    <title>A 15.2 TFlops Simulation of Geodynamo on the Earth Simulator</title>
    <all_auth_inst>Akira Kageyama (Earth Simulator Center, JAMSTEC), Masanori Kameyama (Earth Simulator Center, JAMSTEC), Satoru Fujihara (Earth Simulator Center, JAMSTEC), Masaki Yoshida (Earth Simulator Center, JAMSTEC), Mamoru Hyodo (Earth Simulator Center, JAMSTEC), Yoshinori Tsuda (Earth Simulator Center, JAMSTEC)</all_auth_inst>
    <abs>For realistic geodynamo simulations, one must solve the magnetohydrodynamic equations to follow the time development of the thermal convection motion of an electrically conducting fluid in a rotating spherical shell. We have developed a new geodynamo simulation code by combining the finite difference method with a recently proposed spherical overset grid called the Yin-Yang grid. We achieved a performance of 15.2 TFlops (46% of theoretical peak performance) on 4096 processors of the Earth Simulator.</abs>
    <awards>Gordon Bell Finalist</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap234.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_15:15:22</time_stamp>
    <status>active</status>
    <sub_id>pap213</sub_id>
    <event_type>Paper</event_type>
    <sess_id>5</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications III</sess_title>
    <sess_chair></sess_chair>
    <title>GYRO:  A 5-D Gyrokinetic-Maxwell Solver</title>
    <all_auth_inst>Mark Richard Fahey (ORNL), Jeff Candy (General Atomics)</all_auth_inst>
    <abs>GYRO solves the 5-dimensional gyrokinetic-Maxwell equations in shaped plasma geometry, using either a local (flux-tube) or global radial domain. It has been ported to a variety of modern MPP platforms including a number of commodity clusters, IBM SPs and the Cray X1. Using the Cray X1, we have been able to design and analyze new physics scenarios in record time: (i) transport barrier studies (Phys. Plasmas 11 (2004) 1879), (ii) the local limit of global simulations (Phys. Plasmas 11 (2004) L25), (iii) kinetic electron and finite-beta generalizations of a community-wide benchmark case, and (iv) impurity transport with application to fuel separation in burning D-T plasmas (to be submitted to Nuclear Fusion). We will report on recent physics progress and studies. Further, we will discuss GYRO performance across several architectures.</abs>
    <awards>Best Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap213.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_01:19:44</time_stamp>
    <status>active</status>
    <sub_id>pap165</sub_id>
    <event_type>Paper</event_type>
    <sess_id>52</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Resource Management</sess_title>
    <sess_chair>Satoshi Matsuoka (Tokyo Institute of Technology)</sess_chair>
    <title>Realistic Modeling and Synthesis of Resources for Computational Grids</title>
    <all_auth_inst>Yang-Suk Kee (Computer Science &amp; Engineering, U.C. San Diego), Henri Casanova (San Diego Supercomputing Center and Computer Science &amp; Engineering, U.C. San Diego), Andrew A. Chien (Computer Science &amp; Engineering and Center for Networked Systems, U.C. San Diego)</all_auth_inst>
    <abs>Understanding large Grid platform configurations and generating representative synthetic configurations is critical for Grid computing research. This paper presents an analysis of existing resource configurations and proposes a Grid platform generator that synthesizes realistic configurations of both computing and communication resources. Our key contributions include the development of statistical models for currently deployed resources and using these estimates for modeling the characteristics of future systems. Through the analysis of the configurations of 114 clusters and over 10,000 processors, we identify appropriate distributions for resource configuration parameters in many typical clusters. Using well-established statistical tests, we validate our models against a second resource collection of 191 clusters and over 10,000 processors, and show that our models effectively capture the resource characteristics found in real world resource infrastructures.  These models are realized in a resource generator, which can be easily recalibrated by running it on a training sample set.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap165.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_14:05:16</time_stamp>
    <status>active</status>
    <sub_id>pap241</sub_id>
    <event_type>Paper</event_type>
    <sess_id>55</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Terascale Networking</sess_title>
    <sess_chair>Philip Michael Papadopoulos (San Diego Supercomputer Center/UC San Diego)</sess_chair>
    <title>Realistic Large-Scale Online Network Simulation</title>
    <all_auth_inst>Xin Liu (UCSD), Andrew Chien (UCSD)</all_auth_inst>
    <abs>Large-scale network simulation is an important technique for studying the dynamic behavior of networks, network protocols, and emerging classes of distributed applications (e.g. Grid and peer-to-peer). Scale and realism are two critical requirements for network simulations in Grid application studies. Our work here extends previous efforts in three key ways. First, we study networks 100x larger than in our previous studies (20,000 routers). Second, at this scale, we study realistic network structures (100 AS’s, BGP4 and OSPF routing) versus flat OSPF routing. Finally, we describe and evaluate a new profile-based load-balancing approach called hierarchical profile-based load balance (HPROF). HPROF can reduce load imbalance by 40% and reduce the simulation time by about 50% in our 20,000-router simulations executed on 128-node clusters. In summary, these advances demonstrate that realistic large-scale network simulation for networks of 20,000 routers (comparable to large Tier-1 ISP networks like AT&amp;T) can be accomplished with our system.</abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap241.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_17:35:39</time_stamp>
    <status>active</status>
    <sub_id>pap155</sub_id>
    <event_type>Paper</event_type>
    <sess_id>54</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>High Through-put Grid Transport Protocols</sess_title>
    <sess_chair>Shinji Shimojo (Osaka Univ.)</sess_chair>
    <title>Experiences in the Design and Implementation of a High Performance Transport Protocol</title>
    <all_auth_inst>Yunhong Gu (University of Illinois at Chicago), Xinwei Hong (University of Illinois at Chicago), Robert L. Grossman (University of Illinois at Chicago)</all_auth_inst>
    <abs>This paper describes our experiences in the development of the UDP-based Data Transport (UDT) protocol, an application level transport protocol used in distributed data intensive applications. The new protocol is motivated by the emergence of wide area high-speed optical networks, in which TCP is often found to fail to utilize the abundant bandwidth.

UDT demonstrates good efficiency and fairness (including RTT fairness and TCP friendliness) characteristics in high performance computing applications where a small number of bulk sources share the abundant bandwidth. It combines both rate and window control and uses bandwidth estimation to determine the control parameters automatically. This paper presents the rationale behind UDT: how UDT integrates these schemes to support high performance data transfer, why these schemes are used, and what the main issues are in the design and implementation of this high performance transport protocol. </abs>
    <awards>Best Student Paper Nominee</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap155.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_00:25:48</time_stamp>
    <status>active</status>
    <sub_id>pap254</sub_id>
    <event_type>Paper</event_type>
    <sess_id>54</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>High Through-put Grid Transport Protocols</sess_title>
    <sess_chair>Shinji Shimojo (Osaka Univ.)</sess_chair>
    <title>Inter-layer coordination for parallel TCP streams on Long Fat Pipe Networks</title>
    <all_auth_inst>Hiroyuki Kamezawa (Fujitsu), Makoto Nakamura (University of Tokyo), Junji Tamatsukuri (University of Tokyo), Nao Aoshima (University of Tokyo), Mary Inaba (University of Tokyo), Kei Hiraki (University of Tokyo), Junichiro Shitami (Fujitsu Lab.), Akira Jinzaki (Fujitsu Lab.), Ryutaro Kurusu (Fujitsu Computer Technologies), Masakazu Sakamoto (Fujitsu Computer Technologies), Yukichi Ikuta (Fujitsu Computer Technologies)</all_auth_inst>
    <abs>It is well known that the performance of TCP is low when transferring huge amounts of data over a Long Fat pipe Network (LFN). Many algorithms have been proposed that speed up the growth of the window size. However, slow growth of the window size is not the only problem. Unstable dispersion of the performance of streams is another problem. To improve the current TCP performance, we propose 3 methods: (1) Comet-TCP, (2) “Transmission Rate Controlled TCP (TRC-TCP)”, cooperation of the datalink layer and the transport layer using software with an ordinary network card, and (3) “Dulling Edges of Cooperative Parallel streams (DECP)”, an external scheduler to balance the performance of parallel streams. We show the experimental results of file transfer at the Bandwidth Challenge in SC2003, from Japan to the U.S., over 15,000 miles. The Comet-TCP hardware solution attained a maximum of 7.56 Gbps using a pair of 16 IA servers, and the DECP software attained a maximum of 7.01 Gbps using a pair of 32 IA servers.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap254.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-21_10:28:18</time_stamp>
    <status>active</status>
    <sub_id>pap112</sub_id>
    <event_type>Paper</event_type>
    <sess_id>5</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Applications III</sess_title>
    <sess_chair></sess_chair>
    <title>Advanced Computational Fluid Dynamics Simulations of Projectiles with Flow Control</title>
    <all_auth_inst>Jubaraj Sahu (U.S. Army Research Laboratory), Karen R. Heavey (U.S. Army Research Laboratory)</all_auth_inst>
    <abs>This paper describes a computational study undertaken as part of a grand challenge project to determine the aerodynamic effect of flow control in the afterbody regions at subsonic and supersonic speeds using an advanced scalable unstructured flow solver. High parallel efficiency is achieved for both steady and time-accurate unsteady flow field simulations using advanced scalable Navier-Stokes computational techniques on an SGI Origin 3000, an IBM SP3, and a Linux cluster. Numerical simulations with the unsteady synthetic jet show that the jets substantially alter the flow field both near the jet and in the base region of the projectile, which in turn affects the forces and moments even at zero degree angle of attack. The results have shown the potential of computational fluid dynamics simulations on high performance parallel machines to provide insight into the jet interaction flow fields, leading to improved projectile designs and accurate prediction of flight trajectories.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap112.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap143</sub_id>
    <event_type>Paper</event_type>
    <sess_id>11</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:30PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Architectural Paradigms</sess_title>
    <sess_chair>William Carlson (IDA Center for Computing Sciences)</sess_chair>
    <title>Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs</title>
    <all_auth_inst>Thomas Sterling (Caltech), Jay Brockman (Notre Dame)</all_auth_inst>
    <abs>Processor in Memory or PIM architecture offers dramatic improvements in performance for computations that exhibit poor locality. PIM provides high memory bandwidth and low access latency on-chip. Future PIMs may incorporate more nodes, multithreading for local latency hiding, and lightweight message-driven computing to tolerate system-wide latencies. This paper describes a series of queuing simulation experiments and analytical studies using statistical steady-state parametric models to evaluate the design tradeoff space of these advanced concepts in PIM. The results show a range of improvements as a function of structural and operational parameters. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap143.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap283</sub_id>
    <event_type>Paper</event_type>
    <sess_id>7</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>File Systems</sess_title>
    <sess_chair>John L Cole (US Army Research Laboratory)</sess_chair>
    <title>A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications</title>
    <all_auth_inst>Hong Tang (UC, Santa Barbara), Aziz Gulbeden (UC, Santa Barbara), Jingyu Zhou (UC, Santa Barbara), William Strathearn (UC, Santa Barbara), Tao Yang (UC, Santa Barbara), Lingkun Chu (UC, Santa Barbara)</all_auth_inst>
    <abs>Cluster-based storage systems are popular for data-intensive applications, and it is desirable yet challenging to provide incremental expansion and high availability while achieving scalability and strong consistency. This paper presents the design and implementation of a self-organizing storage cluster called &quot;Sorrento&quot;, which targets data-intensive workloads with highly parallel requests and low write-sharing patterns. Sorrento automatically adapts to storage node joins and departures, and the system can be configured and maintained incrementally without interrupting its normal operation. Data location information is distributed across storage nodes using consistent hashing, and the location protocol differentiates small and large data objects for access efficiency. It adopts versioning to achieve single-file serializability and replication consistency. In this paper, we present experimental results to demonstrate features and performance of Sorrento using microbenchmarks, application benchmarks, and application trace replay.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap283.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap217</sub_id>
    <event_type>Paper</event_type>
    <sess_id>53</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Services</sess_title>
    <sess_chair>David Snelling (Fujitsu)</sess_chair>
    <title>Towards Flexible Messaging for SOAP Based Services</title>
    <all_auth_inst>Geoffrey Fox (Community Grids Lab, Indiana University), Shrideep Pallickara (Community Grids Lab, Indiana University), Savas Parastatidis (School of Computing Science, University of Newcastle)</all_auth_inst>
    <abs>NaradaBrokering has been developed as the messaging infrastructure for collaboration, peer-to-peer and Grid applications. It has undergone extensive functional testing in collaborative sessions and extensive performance measurements have been made in a variety of configurations including cross-continental applications. The value of NaradaBrokering in the context of Grid and Web services has been clear for some time. NaradaBrokering provides a messaging abstraction that allows the system to provide message-related capabilities in a transparent fashion. These capabilities include message-based security, time and causal ordering, compression, virtualization of transport protocol and addressing, and fault tolerance related functionalities. NaradaBrokering – combined with further extensions to its existing capabilities – can also take advantage of the maturing of Web Service specifications to build very powerful general mechanisms to deploy and integrate it with general Web services. Here we describe our approach to exploiting the SOAP processing stack to interface NaradaBrokering with Web services.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap217.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_13:06:56</time_stamp>
    <status>active</status>
    <sub_id>pap305</sub_id>
    <event_type>Paper</event_type>
    <sess_id>53</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Services</sess_title>
    <sess_chair>David Snelling (Fujitsu)</sess_chair>
    <title>VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing</title>
    <all_auth_inst>Ivan Victor Krsul (Advanced Computing and Information Systems (ACIS) Lab, University of Florida), Arijit Ganguly (Advanced Computing and Information Systems (ACIS) Lab, University of Florida), Jian Zhang (Advanced Computing and Information Systems (ACIS) Lab, University of Florida), Jose A. B. Fortes (Advanced Computing and Information Systems (ACIS) Lab, University of Florida), Renato J. Figueiredo (Advanced Computing and Information Systems (ACIS) Lab, University of Florida)</all_auth_inst>
    <abs>Virtual machines (VMs) provide flexible, powerful execution environments for Grid computing, offering isolation and security mechanisms complementary to operating systems, customization and encapsulation of entire application environments, and support for legacy applications. This paper describes a Grid service – VMPlant – that provides for automated configuration and creation of flexible VMs that, once configured to meet application needs, can be copied (“cloned”) and dynamically instantiated to provide homogeneous execution environments across distributed Grid resources. In combination with complementary middleware for user, data and resource management, the functionality enabled by VMPlant allows for problem-solving environments to deliver Grid applications to users with unprecedented flexibility. VMPlant supports a graph-based model for the definition of customized VM configuration actions, as well as partial graph matching, VM state storage and “cloning” for efficient creation. This paper presents the VMPlant architecture, describes a prototype implementation of the service, and presents an analysis of its performance.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap305.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_19:56:12</time_stamp>
    <status>active</status>
    <sub_id>pap287</sub_id>
    <event_type>Paper</event_type>
    <sess_id>54</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>High Through-put Grid Transport Protocols</sess_title>
    <sess_chair>Shinji Shimojo (Osaka Univ.)</sess_chair>
    <title>Improving Throughput for Grid Applications with Network Logistics</title>
    <all_auth_inst>Martin Swany (University of Delaware)</all_auth_inst>
    <abs>This work describes a technique for improving network performance in Grid environments that we refer to as &quot;logistics.&quot; We demonstrate that by using storage and cooperative forwarding &quot;in&quot; the network, we can improve end-to-end throughput in many cases. Our approach uses TCP connections in series and offers performance benefits for high-bandwidth, high-latency networks. First, we examine the underlying causes of the logistical effect. Next, we present a graph-based scheduling approach that can be solved quickly and, within our assumptions, optimally. Finally, we present a large-scale empirical evaluation of the system in order to validate our scheduling approach for taking advantage of network logistics. This study demonstrates performance improvement in many situations, and aggregate speedup results as well as specific instances are presented.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap287.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_00:25:01</time_stamp>
    <status>active</status>
    <sub_id>pap281</sub_id>
    <event_type>Paper</event_type>
    <sess_id>55</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:00AM</begin_time>
    <end_time>11:30AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Terascale Networking</sess_title>
    <sess_chair>Philip Michael Papadopoulos (San Diego Supercomputer Center/UC San Diego)</sess_chair>
    <title>Collaborative User-centric Lambda-Grid over Wavelength-Routed Network</title>
    <all_auth_inst>Oliver T. Yu (University of Illinois at Chicago), Thomas A. DeFanti (University of Illinois at Chicago)</all_auth_inst>
    <abs>Emerging lambda-Grid systems employ wavelength-routed networks with optical switches to enable dynamic on-demand lightpaths with multi-gigabit rate bandwidth to interconnect shared computing clusters of user domains. User-centric lambda-Grid systems enable user domains to act as distributed connectivity providers of shared wavelength resources, and as distributed network control service providers for brokering shared wavelength resources and provisioning scheduled lightpaths. To support user-centric lambda-Grid systems, the proposed Dynamic User-centric Switched Optical Network (DUSON) architecture employs an Intelligent Control Service Proxy (ICSP) at each user domain to support wavelength connectivity brokerage and lightpath provisioning/restoration middleware for user applications to control on-demand lightpaths. Furthermore, ICSP employs the proposed Robust Fast Optical Reservation Protocol (RFORP) during lightpath provisioning/restoration to minimize wavelength reservation blocking and delay via localized rerouting and parallel concurrent wavelength reservation processing, respectively.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap281.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_21:21:59</time_stamp>
    <status>active</status>
    <sub_id>pap198</sub_id>
    <event_type>Paper</event_type>
    <sess_id>52</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Resource Management</sess_title>
    <sess_chair>Satoshi Matsuoka (Tokyo Institute of Technology)</sess_chair>
    <title>The Inca Test Harness and Reporting Framework</title>
    <all_auth_inst>Shava Smallen (San Diego Supercomputer Center), Catherine Olschanowsky (San Diego Supercomputer Center), Kate Ericson (San Diego Supercomputer Center), Pete Beckman (Argonne National Laboratory), Jennifer Schopf (Argonne National Laboratory)</all_auth_inst>
    <abs>Virtual organizations (VOs), communities that enable coordinated resource sharing among multiple sites, are becoming more prevalent in the high-performance computing community.  In order to promote cross-site resource usability, most VOs prepare service agreements that include a minimum set of common resource functionality, starting with a common software stack and evolving into more complicated service and interoperability agreements.  VO service agreements are often difficult to verify and maintain, however, because the sites are dynamic and autonomous. Automated verification of service agreements is critical: manual and user tests are not practical on a large scale.

The Inca test harness and reporting framework is a generic system for the automated testing, data collection, verification, and monitoring of service agreements.  This paper describes Inca's architecture, system impact, and performance. Inca is being used by the TeraGrid project to verify software installations, monitor service availability, and collect performance data.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap198.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-16_08:59:44</time_stamp>
    <status>active</status>
    <sub_id>pap290</sub_id>
    <event_type>Paper</event_type>
    <sess_id>39</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Performance Evaluation Algorithms</sess_title>
    <sess_chair>Henri Casanova (UCSD)</sess_chair>
    <title>Performance evaluation of task pools based on hardware synchronization</title>
    <all_auth_inst>Ralf Hoffmann (University of Bayreuth, Germany), Matthias Korch (University of Bayreuth, Germany), Thomas Rauber (University of Bayreuth, Germany)</all_auth_inst>
    <abs>Task-based execution provides a universal approach to dynamic load balancing for irregular applications.  Tasks are arbitrary units of work that are created dynamically at runtime and that are stored in a parallel data structure, the task pool, until they are scheduled onto a processor for execution. In this paper, we evaluate the performance of different task pool implementations for shared-memory computer systems using several realistic applications. We consider task pools with different data structures, different load balancing strategies, and specialized memory management.  In particular, we use synchronization operations based on hardware support that is available on many modern microprocessors.  We show that the resulting task pool implementations lead to much better performance than implementations using Pthreads library calls for synchronization.  The applications considered are parallel quicksort, volume rendering, ray tracing, and hierarchical radiosity. The target machines are an IBM p690 server and a SunFire 6800.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap290.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_14:37:16</time_stamp>
    <status>active</status>
    <sub_id>pap293</sub_id>
    <event_type>Paper</event_type>
    <sess_id>49</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Compiler Technology</sess_title>
    <sess_chair>Mary Hall (USC/ISI)</sess_chair>
    <title>The Limit of Computation Reordering for Improving Locality</title>
    <all_auth_inst>Chen Ding (University of Rochester), Maksim Orlovich (University of Rochester)</all_auth_inst>
    <abs>Maximizing program locality, which becomes increasingly important on modern computer systems, requires reordering program computations. While there exist many effective techniques such as loop interchange, tiling, and fusion, it is unclear how much room exists for further improvement, especially in programs with complex data and control structures.  This paper presents a limit study on the temporal locality of sequential programs.  First, it shows that maximizing the locality is different from maximizing the parallelism or maximizing the cache utilization.  Then it describes a tool that measures the potential of computation reordering.  Compared to those used in the past limit studies, the new tool is unique because it measures the exact control dependences, applies complete memory renaming, and uses constrained computation regrouping.  The tool is used to measure the potential of locality improvement in a set of classic numerical and integer problems.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap293.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_16:28:23</time_stamp>
    <status>active</status>
    <sub_id>pap249</sub_id>
    <event_type>Paper</event_type>
    <sess_id>37</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:00PM</begin_time>
    <end_time>2:30PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Scheduling Algorithms</sess_title>
    <sess_chair>Yves Robert (ENS Lyon)</sess_chair>
    <title>Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids</title>
    <all_auth_inst>Derrick Kondo (UCSD), Andrew A. Chien (UCSD), Henri Casanova (SDSC, UCSD)</all_auth_inst>
    <abs>Desktop grids are popular platforms for high throughput applications, but due to their inherent resource volatility it is difficult to exploit them for applications that require rapid turnaround.  Efficient desktop grid execution of short-lived applications is an attractive proposition and we claim that it is achievable via intelligent resource selection.  We propose three general techniques for resource selection: resource prioritization, resource exclusion, and task duplication. We use these techniques to instantiate several scheduling heuristics.  We evaluate these heuristics through trace-driven simulations of four representative desktop grid configurations.  We find that ranking desktop resources according to their clock rates, without taking into account their availability history, is surprisingly effective in practice. Our main result is that a heuristic that uses the appropriate combination of resource prioritization, resource exclusion, and task replication achieves performance within a factor of 1.7 of optimal.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap249.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_23:13:49</time_stamp>
    <status>active</status>
    <sub_id>pap231</sub_id>
    <event_type>Paper</event_type>
    <sess_id>10</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Emerging Architectures</sess_title>
    <sess_chair>Jose L Munoz (NSF)</sess_chair>
    <title>QCDOC:  A 10 Teraflops Computer for Tightly-coupled Calculations</title>
    <all_auth_inst>P.A. Boyle (University of Edinburgh and Columbia University), Dong Chen (IBM T.J. Watson Research Laboratory), Norman H. Christ (Columbia University), Mike Clark (University of Edinburgh and Columbia University), Saul Cohen (Columbia University), Zhihua Dong (Columbia University), Alan Gara (IBM T.J. Watson Research Laboratory), Balint Joo (University of Edinburgh), Chulwoo Jung (Brookhaven National Lab and Columbia University), Changhoan Kim (Columbia University), Ludmila Levkova (Columbia University), Xiaodong Liao (Columbia University), Guofeng Liu (Columbia University), Robert D. Mawhinney (Columbia University), Shigemi Ohta (KEK Japan and the RBRC, Brookhaven National Lab), Konstantin Petrov (Brookhaven National Lab and Columbia Univ.), Tilo Wettig (University of Regensburg), Azusa Yamaguchi (Columbia University), Calin Cristian (Columbia University)</all_auth_inst>
    <abs>Numerical simulations of the strong nuclear force, known as quantum chromodynamics or QCD, have proven to be a demanding, forefront problem in high-performance computing.  In this report, we describe a new computer, QCDOC (QCD On a Chip), designed for optimal price/performance in the study of QCD.  QCDOC uses a six-dimensional, low-latency mesh network to connect processing nodes, each of which includes a single custom ASIC, designed by our collaboration and built by IBM, plus DDR SDRAM.  Each node has a peak speed of 1 Gigaflops, and two 12,000-node, 10+ Teraflops machines are to be completed in the fall of 2004.  Currently, a 512-node machine is running, delivering efficiencies as high as 45% of peak on the conjugate gradient solvers that dominate our calculations, and a 4096-node machine with a cost of $1.6M is under construction. This should give us a price/performance of less than $1 per sustained Megaflops.</abs>
    <awards>Gordon Bell Finalist</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap231.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_17:47:49</time_stamp>
    <status>active</status>
    <sub_id>pap207</sub_id>
    <event_type>Paper</event_type>
    <sess_id>7</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>File Systems</sess_title>
    <sess_chair>John L Cole (US Army Research Laboratory)</sess_chair>
    <title>The Panasas Storage Cluster - Delivering Scalable High Bandwidth Storage</title>
    <all_auth_inst>David Nagle (Panasas), Denis Serenyi (Panasas), Abbie Matthews (Panasas)</all_auth_inst>
    <abs>Fundamental advances in high-level storage architectures and low-level storage-device interfaces greatly improve the performance and scalability of storage systems. Specifically, the decoupling of storage control (i.e., file system policy) from datapath operations (i.e., read, write) allows client applications to leverage the readily available bandwidth of storage devices while continuing to rely on the rich semantics of today’s file systems.  Further, the evolution of storage interfaces from block-based devices with no protection to object-based devices with per-command access control enables storage to become secure, first-class IP-based network citizens. This paper examines how the Panasas Storage Cluster leverages distributed storage and object-based devices to achieve linear scalability of storage bandwidth.  Specifically, we focus on implementation issues with our Object-based Storage Device, aggregation algorithms across the collection of OSDs, and the close coupling of networking and storage to achieve scalability.  </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap207.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_23:54:37</time_stamp>
    <status>active</status>
    <sub_id>pap302</sub_id>
    <event_type>Paper</event_type>
    <sess_id>10</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>4:30PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Emerging Architectures</sess_title>
    <sess_chair>Jose L Munoz (NSF)</sess_chair>
    <title>A Performance and Scalability Analysis of the BlueGene/L Architecture </title>
    <all_auth_inst>Kei Davis (LANL), Adolfy Hoisie (LANL), Greg Johnson (LANL), Darren J. Kerbyson (LANL), Mike Lang (LANL), Scott Pakin (LANL), Fabrizio Petrini (LANL)</all_auth_inst>
    <abs>Based on a set of measurements done on the 512-node 500MHz prototype and early results on a 2048-node 700MHz BlueGene/L machine at IBM Watson, we present a performance and scalability analysis of the architecture from low-level characteristics to large-scale applications.

In addition, we present performance predictions using our models for the performance of two representative applications from the ASC workload on the full BlueGene/L configuration of 64K nodes. We have compared the measured values for several of the benchmarks in our suite against the predicted numbers from our performance models. In general, the error bars were relatively low. A comparison between the performance of BlueGene/L and the ASCI Q, the largest machine in the US, is presented, also based on our predictive performance models.

</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap302.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_12:00:02</time_stamp>
    <status>active</status>
    <sub_id>pap111</sub_id>
    <event_type>Paper</event_type>
    <sess_id>13</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Extreme Performance</sess_title>
    <sess_chair></sess_chair>
    <title>Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom</title>
    <all_auth_inst>Mark Francis Adams (Sandia National Laboratories), Harun H. Bayraktar (Abaqus Corporation), Tony M. Keaveny (U. C. Berkeley), Panayiotis Papadopoulos (U. C. Berkeley)</all_auth_inst>
    <abs>The solution of elliptic diffusion operators is the computational bottleneck in many simulations in a wide range of engineering and scientific disciplines.  We present a truly scalable (ultrascalable) linear solver for the diffusion operator in unstructured elasticity problems.  Scalability is demonstrated with speedup studies of non-linear analyses of a vertebral body with over half a billion degrees of freedom on up to 4088 processors on the ASCI White machine.  This work is significant because, in the domain of unstructured implicit finite element analysis in solid mechanics with complex geometry, this is the first demonstration of a highly parallel and efficient application of a mathematically optimal linear solution method on a common large scale computing platform, the IBM SP Power3.</abs>
    <awards>Gordon Bell Finalist</awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap111.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pap113</sub_id>
    <event_type>Paper</event_type>
    <sess_id>12</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:00AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Advanced Hardware Features</sess_title>
    <sess_chair>Thomas Wingfield Page (NSA)</sess_chair>
    <title>Will Moore's Law be Sufficient?</title>
    <all_auth_inst>Erik P. DeBenedictis (Sandia National Laboratories)</all_auth_inst>
    <abs>It seems widely believed that Moore's Law will indefinitely power supercomputer simulation as an enabler for scientific discoveries, weapons, and other activities of value to society. This paper seeks to add detail to these arguments, revealing them to be generally correct but not a smooth and effortless progression.

This paper begins by reviewing some key problems that can be solved with supercomputer simulation, including their FLOPs count and general nature. The paper will then review work by others showing that the theoretical maximum supercomputer power is very high indeed, but will explain how a straightforward extrapolation of Moore’s Law will lead to technological maturity in a few decades. Finally, the paper shows how to evaluate the performance of supercomputers of both current and future designs in order to determine sufficiency for a particular application.

</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name>pap113.pdf</file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:29</time_stamp>
    <status>active</status>
    <sub_id>tut133</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>56</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S8:  State of InfiniBand in Designing HPC Clusters, Storage/File Systems, and Datacenters</sess_title>
    <sess_chair>Dhabaleswar K. Panda (The Ohio State University)</sess_chair>
    <title>State of InfiniBand in Designing  HPC Clusters, Storage/File Systems, and Datacenters</title>
    <all_auth_inst>Dhabaleswar K. (DK) Panda (The Ohio State University)</all_auth_inst>
    <abs>InfiniBand Architecture (IBA) is generating a lot of excitement toward building next-generation HPC clusters, Servers, Storage/File systems, and Datacenters in a radically different manner. This is leading to the following questions among many scientists, engineers, managers, developers, and users of high-end systems: 1) What is IBA?  2) How is it different from other interfaces and interconnects (PCI-X, PCI-Express, 10.0 GigE, Myrinet, Quadrics, etc.)?  3) What IBA hardware/software solutions are available, and what are their trends?  4) How can one take advantage of IBA features to design next-generation high-end systems?

This tutorial will provide answers to the above questions in an in-depth manner.  We will start with the motivation behind IBA and a brief overview of its architectural aspects.  An in-depth comparison will be done with other interfaces and interconnects.  IBA hardware/software solutions and the market trends will be highlighted. Finally, case studies outlining the challenges and experiences in designing HPC clusters (MPI-1 and MPI-2), Distributed Shared Memory systems (TreadMarks, HLRC, ARMCI), File systems (DAFS, PVFS, Lustre), Storage protocols (SRP, iSCSI), Database systems (Oracle 9i RAC), and Multi-tier Datacenters with the novel features of IBA will be presented. These case studies will highlight the associated performance numbers/comparisons and open research issues.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>40</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:26:56</time_stamp>
    <status>active</status>
    <sub_id>tut130</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>16</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S1:  Advanced MPI - A Hands-on Tutorial</sess_title>
    <sess_chair>Rob Ross (MCS/ANL)</sess_chair>
    <title>Advanced MPI: I/O and One-Sided Communication</title>
    <all_auth_inst>Robert Ross (ANL), Ewing Lusk (ANL), William Gropp (ANL), Rajeev Thakur (ANL)</all_auth_inst>
    <abs>This tutorial is about advanced use of MPI, in particular the MPI-IO and one-sided communication features that were added to MPI (Message-Passing Interface) by the second MPI Forum called MPI-2.  Implementations of MPI-2 (or significant subsets thereof) are now available both from vendors and from open-source projects. For example, the one-sided communication functions of MPI-2 are being used successfully in applications running on the Earth Simulator. In other words, MPI-2 can now really be used in practice.

During the day we will cover two major components of MPI-2: parallel I/O and one-sided communication.  The tutorial will be heavily example driven, using one projector for tutorial material and a second projector to display code and demonstration runs of examples.  Attendees will have the opportunity to see codes using these advanced concepts built, run, and modified during the tutorial.

Attendees will leave the tutorial with both an understanding of these advanced concepts and a collection of working example codes that they are familiar with and have seen run and modified.  This will prepare them for applying these concepts in their own applications. </abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:29:29</time_stamp>
    <status>active</status>
    <sub_id>tut120</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>44</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M10: Methods for Performance Engineering of Scientific Applications</sess_title>
    <sess_chair>Allan Edward Snavely (San Diego Supercomputer Center)</sess_chair>
    <title>Methods for Performance Engineering of Scientific Applications</title>
    <all_auth_inst>Allan Edward Snavely (SDSC), Celso Mendes (UIUC), Bronis de Supinski (LLNL), Paul Hovland (Argonne), David H. (Bailey)</all_auth_inst>
    <abs>This tutorial presents methodology to improve the performance of scientific applications. Attendees will learn to model and understand performance to reveal opportunities for improving application performance by tuning and/or choosing the target machine. The process consists of assessing computational demands, capability and complexity of the application code, as well as understanding the efficiency of the mapping of the application to architectures. We walk through a series of common performance-related issues such as instruction mix, memory and file system access patterns, nature and type of communications, and data and work distribution, for which performance models can guide optimizations. Tools from PERC, the Department of Energy Office of Science’s Performance Evaluation Research Center, are used to explore these issues and guide performance improvements. We draw exercises from real-life scientific applications in use by the community to exemplify these issues and to demonstrate optimization guidance. We target an audience primarily of application developers who need to quantify and improve the performance of their codes. Our tools are also of interest to system designers, administrators and integrators looking to monitor and maximize time-to-solution. Attendees should be familiar with at least one scientific application, parallel programming environment and HPC platform.</abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:33:09</time_stamp>
    <status>active</status>
    <sub_id>tut140</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>23</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M7: Cybersecurity at Open Scientific Facilities</sess_title>
    <sess_chair>Vern Paxson (ICSI / LBNL)</sess_chair>
    <title>Cybersecurity at Open Scientific Facilities</title>
    <all_auth_inst>Vern Paxson (ICSI / LBNL), James Rothfuss (LBNL), Stephen Lau (LBNL), William Kramer (LBNL)</all_auth_inst>
    <abs>The ability of scientists to collaborate unfettered over networks is critical for high performance computational (HPC) environments. This need, however, is tempered by the realities of today's interconnected computational environments, where protection from unauthorized access and usage is a necessity. How does one find an effective balance between the needs of an open scientific research facility and the protection of the site from attacks? What challenges lie ahead in high performance security?

This tutorial addresses these questions by exploring various topics of computer security as they relate to open, high-performance computer facilities. Some of the topics we will address are:

1) The unique nature and demands within an HPC environment 2) Addressing the needs of computer protection in an HPC environment 3) An overview of current trends in attacks and incidents 4) Intrusion detection in an HPC environment 5) The future of high performance computing protection

SciNet, SC 2004’s network, itself resembles networks at open scientific facilities. Some of the tools deployed at open scientific facilities are also deployed at SC for computer protection. We will show real network attack statistics collected at SC04 and explicate how the techniques described in the tutorial are in use at SC.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:01</time_stamp>
    <status>active</status>
    <sub_id>tut122</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>47</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M13: Constructing Advanced Storage Networks</sess_title>
    <sess_chair>Joseph Pelissier (McDATA Corporation)</sess_chair>
    <title>Constructing Advanced Storage Networks: Integrating Virtual Fabrics, SAN Routing, and IP Based Storage</title>
    <all_auth_inst>Joseph Edward Pelissier (McDATA Corporation)</all_auth_inst>
    <abs>Storage area networks are continuing to experience phenomenal growth.  To accommodate this growth, a variety of new technologies are being introduced to address the various manageability, scalability, and cost requirements of future SAN infrastructures.  Among these technologies are partitionable and multiprotocol switches, virtual fabrics and networks, IP storage technologies, and a large variety of gateway capabilities.  This tutorial will provide participants with a working knowledge of these technologies, their capabilities and limitations, sufficient to begin to plan and design future advanced storage networks.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>60</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:57</time_stamp>
    <status>active</status>
    <sub_id>tut134</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>21</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M6: High Performance Data Transfer</sess_title>
    <sess_chair>Phillip Dykstra (WareOnEarth Communications)</sess_chair>
    <title>High Performance Data Transfer</title>
    <all_auth_inst>Phillip Dykstra (WareOnEarth Communications)</all_auth_inst>
    <abs>High Performance Data Transfer is a core requirement of many Supercomputing applications.  From basic FTP file transfers to P2P or Grid applications, moving data across LANs and WANs at high speed is critically important.  This tutorial will cover a wide range of approaches used to achieve this, many of which are complementary, including tuning end systems and networks, new or improved protocols, parallel transfers, and abstract data storage.

This tutorial covers the whys and hows of high performance data transfer.  The first half focuses on advanced networking technology and low-level performance issues such as delay, loss, switching/routing, and TCP and UDP dynamics. The second half looks at higher level approaches to improving performance, from improved protocols and parallel transfers to peer-to-peer and grid techniques and abstract storage services.

The attendee should come away with a detailed understanding of data transfer over wide area networks, and exposure to a great number of tools and utilities to tune, debug, and improve their ability to move data at high speed. </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:48</time_stamp>
    <status>active</status>
    <sub_id>tut139</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>20</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S5: Practical Application Performance Analysis on Linux Systems</sess_title>
    <sess_chair>John Mellor-Crummey (Rice University)</sess_chair>
    <title>Practical Application Performance Analysis on Linux Systems</title>
    <all_auth_inst>John M. Mellor-Crummey (Rice University), Robert J. Fowler (Rice University), Nathan Tallent (Rice University)</all_auth_inst>
    <abs>This tutorial will teach attendees how to analyze the node performance of both serial and parallel applications on computer systems running the Linux operating system. The tutorial is intended for a broad audience including both computer and computational scientists who are interested in tuning the performance of middleware and applications. The morning session will begin by reviewing aspects of computer system organization relevant to application performance and describing architectural support for monitoring machine performance that is available in today’s computer systems. Most of the morning session will focus on how to use HPCToolkit – a collection of multi-platform performance tools developed at Rice University – to analyze application performance using data collected from hardware performance counters. HPCToolkit enables users to collect a variety of performance metrics for unmodified application binaries, correlate measurements with applications at multiple levels, synthesize performance databases containing multiple metrics, and browse the performance databases in a top-down fashion to pinpoint performance bottlenecks.  In the afternoon, attendees will use their laptops to access a remote system to run HPCToolkit on provided program samples or their own codes. The tutorial will conclude with a presentation of how attendees can download and build HPCToolkit on their own computer systems.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>60</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:08</time_stamp>
    <status>active</status>
    <sub_id>tut138</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>42</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M8: UPC: Unified Parallel C</sess_title>
    <sess_chair>Tarek El-Ghazawi (George Washington University)</sess_chair>
    <title>UPC: Unified Parallel C</title>
    <all_auth_inst>William Carlson (IDA Center for Computing Sciences), Tarek El-Ghazawi (High Performance Computing Laboratory - ECE Dept - The George Washington University)</all_auth_inst>
    <abs>UPC, or Unified Parallel C, is a parallel extension of ISO C.  UPC follows the Partitioned Global Address Space (PGAS) programming model, which is aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation of data locality.  To this end, UPC incorporates constructs that allow placing data near the threads that manipulate them to minimize remote accesses.  UPC also has many advanced synchronization features, including mechanisms for overlapping synchronization with local processing and constructs for defining memory consistency. UPC is the effort of a consortium of universities, government and industry.   It has been receiving rising attention from programmers and vendors and is now a product available on the Cray X1 and HP parallel computers.  Open Source compilers are available for most other platforms.  The TotalView (tm) debugger is also available.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>40</inter_level>
    <adv_level>50</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:24:55</time_stamp>
    <status>active</status>
    <sub_id>tut128</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>25</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S10: High Performance Computing in Python</sess_title>
    <sess_chair>Åsmund Ødegård (it manager)</sess_chair>
    <title>High Performance Computing in Python</title>
    <all_auth_inst>Hans Petter Langtangen (Professor), Are Magnus Bruaset (Simula Research Laboratory), Kent-Andre Mardal (Simula Research Laboratory), Patrick Miller (Lawrence Livermore National Laboratory), Halvard Moe (Simula Research Laboratory), Ola Skavhaug (Simula Research Laboratory), Åsmund Ødegård (Simula Research Laboratory)</all_auth_inst>
    <abs>The virtue of Python is flexibility, making it an ideal tool for the scientist who wants to experiment with different models, do rapid prototyping, and even do large scale simulation. This tutorial presents concepts and software that will give the scientist adaptable and powerful tools.

The tutorial begins with the basic concepts of using Python for scientific and high performance computing.  We cover the use of the Python Numeric module, threads, and pyMPI for SPMD style distributed parallelism.  We also discuss performance optimization of Python code, and Python tools for visualization. The latter includes how to make simple movies as well as interfaces to complex visualization programs. The first part ends with an in-depth introduction to C and FORTRAN code wrapping, allowing applications originally written in these languages to be run from Python. We also cover how to increase computational performance by extending Python with C and FORTRAN modules.

The last part is a hands-on session. We equip a FORTRAN application with a Python interface, replace the main loop with a Python program, and visualize results. We also cover more details on C and FORTRAN extensions. 

The audience work on their own laptops. Software/info both at http://www.simula.no/projects/sc2004 and on CD.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:06</time_stamp>
    <status>active</status>
    <sub_id>tut129</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>17</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S2: Parallel Computing 101</sess_title>
    <sess_chair>Quentin Fielden Stout (University of Michigan)</sess_chair>
    <title>Parallel Computing 101</title>
    <all_auth_inst>Quentin F. Stout (University of Michigan), Christiane Jablonowski (NCAR)</all_auth_inst>
    <abs>This tutorial provides a comprehensive overview of parallel computing, emphasizing those aspects most relevant to the user. It is suitable for new or prospective users, managers, students and anyone seeking a general overview of parallel computing.  It discusses software and hardware, with an emphasis on standards, portability, and systems that are commercially or freely available.  Systems examined include clusters, the Grid, and tightly integrated supercomputers.

The tutorial surveys basic parallel computing concepts and terminology, and uses examples selected from large-scale engineering, scientific, and data intensive applications.  These real-world examples are targeted at distributed memory systems using MPI, Grid systems using Globus, shared memory systems using OpenMP, and hybrid systems that combine the MPI and OpenMP programming paradigms.  The tutorial shows basic parallelization approaches and discusses some of the software engineering aspects of the parallelization process, including the use of state-of-the-art tools.  The tools introduced range from parallel debugging tools to performance analysis and tuning packages.

The tutorial helps attendees make intelligent decisions by covering the primary options that are available, explaining how they are used and what they are most suitable for. Extensive pointers to the literature and web-based resources are provided to facilitate follow-up studies. </abs>
    <awards></awards>
    <intro_level>75</intro_level>
    <inter_level>25</inter_level>
    <adv_level>0</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:45</time_stamp>
    <status>active</status>
    <sub_id>tut155</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>40</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M5: A Practical Approach to Performance Analysis and Modeling of Large-Scale Systems</sess_title>
    <sess_chair>Darren J. Kerbyson (Los Alamos National Laboratory)</sess_chair>
    <title>A Practical Approach to Performance Analysis and Modeling of Large-Scale Systems</title>
    <all_auth_inst>Darren J. Kerbyson (Los Alamos National Laboratory), Adolfy Hoisie (Los Alamos National Laboratory)</all_auth_inst>
    <abs>This tutorial presents a practical approach to the performance modeling of large-scale, scientific applications on high performance systems. The defining characteristic of our tutorial involves the description of a proven modeling approach, developed at Los Alamos, of full-blown scientific codes, ranging from a few thousand to over 100,000 lines, that has been validated on systems containing thousands of processors. The goal is to impart a detailed understanding of factors contributing to the resulting performance of an application when mapped onto a given HPC platform. Performance modeling is the only technique that can quantitatively elucidate this understanding. We show how models are constructed and demonstrate how they are used to predict, explain, diagnose, and engineer application performance in existing or future codes and/or systems. Notably, our approach does not require the use of specific tools but rather is applicable across commonly used environments. Moreover, since our performance models are parametric in terms of machine and application characteristics, they imbue the user with the ability to “experiment ahead” with different system configurations or algorithms/coding strategies. Both will be demonstrated in studies emphasizing the application of these modeling techniques including: verifying system performance, comparison of large-scale systems, and examination of possible future systems.</abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>50</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:24</time_stamp>
    <status>active</status>
    <sub_id>tut157</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>38</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M4: Hot Chips and Hot Interconnects for High End Computing Systems</sess_title>
    <sess_chair>Subhash Saini (NASA Ames Research Center)</sess_chair>
    <title>Hot Chips and Hot Interconnects for High End Computing Systems</title>
    <all_auth_inst>Subhash Saini (NASA Ames Research Center)</all_auth_inst>
    <abs>I will discuss several processors: i. the Cray proprietary processor, which is used in the Cray X1; ii. the IBM Power 3 and Power 4, which are used in the IBM SP 3 and IBM SP 4 systems; iii. the Intel Itanium and Xeon, which are used in the SGI Altix systems and clusters, respectively; iv. the IBM System-on-a-Chip, which is used in the IBM BlueGene/L; v. the HP Alpha EV68 processor, which is used in the DOE ASCI Q cluster; vi. the SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; vii. an NEC proprietary processor, which is used in the NEC SX-6/7; viii. the Power 4+ processor, which is used in the Hitachi SR11000; ix. an NEC proprietary processor, which is used in the Earth Simulator. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).</abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:40</time_stamp>
    <status>active</status>
    <sub_id>tut145</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>35</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M2: Application Supercomputing on Scalable Architectures</sess_title>
    <sess_chair>Alice Koniges (LLNL)</sess_chair>
    <title>Application Supercomputing on Scalable Architectures</title>
    <all_auth_inst>Alice Koniges (Lawrence Livermore National Laboratory), Mark Seager (Lawrence Livermore National Laboratory), David Eder (Lawrence Livermore National Laboratory), Rolf Rabenseifner (High Performance Computing Center Stuttgart), Michael Resch (High Performance Computing Center Stuttgart)</all_auth_inst>
    <abs>Teraflop performance is no longer a thing of the future as complex integrated 3D simulations drive supercomputer development. Today, most HPC systems are clusters of SMP nodes ranging from dual-CPU-PC clusters to the largest systems at the world's major computing centers. What are the major issues facing application code developers today? How do the challenges vary from cluster computing to the complex hybrid architectures with super scalar and vector processors? What skills and tools are required, both of the application developer and the system itself? Finally, what are the paths both architecturally and algorithmically to petaflop performance? In this tutorial, we address these questions and give tips, tricks, and tools of the trade for large-scale application development. In the introduction, we provide an overview of terminology, hardware and performance. Advanced topics are mixed-mode (combined MPI/OpenMP) programming, vector tips, and cluster environments. We describe the latest issues in implementing scalable parallel programming. We draw from a series of large application suites and discuss specific challenges and problems encountered in parallelizing these applications. Finally we discuss upcoming architectures such as BlueGene/L and the latest vector systems. See also http://www.hlrs.de/people/rabenseifner/publ/SC2004-tutorial.html </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:28</time_stamp>
    <status>active</status>
    <sub_id>tut164</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>43</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M9: Virtual Data Management for Grid Computing</sess_title>
    <sess_chair>Michael Wilde (Argonne)</sess_chair>
    <title>Virtual Data Management for Grid Computing</title>
    <all_auth_inst>Michael Wilde (Argonne), Ewa Deelman (ISI)</all_auth_inst>
    <abs>Virtual data is a paradigm for expressing and managing the relationships between datasets and the computational procedures that produce them. It provides abstractions to describe data that has not yet been computed, and is embodied in a toolkit which automates workflow generation and data provenance tracking for problems ranging from desktop analysis to massive-scale computations on a Grid.

In a virtual data system, data, procedures, and computations are all first class entities, and can be published, discovered, and manipulated. Virtual data enables us to trace the provenance of derived data; plan and track the computational workflows required to derive a particular data product; determine whether a requested computation has been performed previously and whether it is cheaper to rerun it or to retrieve previously generated data; and discover computational procedures with desired characteristics.

This tutorial describes the foundations of the virtual data concept, presents a practical, “how-to-focused” introduction to the Grid-based virtual data tools created by GriPhyN, the Grid Physics Network project, explores related work in the fields of provenance tracking and workflow management, and presents case studies of virtual data on computing problems in high-energy physics, biology, medical research, astronomy and astrophysics. </abs>
    <awards></awards>
    <intro_level>15</intro_level>
    <inter_level>60</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:29:49</time_stamp>
    <status>active</status>
    <sub_id>tut153</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>46</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M12: The Grid &quot;Ecosystem&quot; - Developing Your Grid Strategy</sess_title>
    <sess_chair>Lee Liming (Argonne National Laboratory)</sess_chair>
    <title>Beyond Globus: Lessons Learned from the Grid</title>
    <all_auth_inst>Lee Liming (Argonne National Laboratory)</all_auth_inst>
    <abs>The Globus Alliance aims to provide solutions to the most persistent and vexing problems that come up in Grid projects and applications. Our solutions to date are collected in the Globus Toolkit and these solutions are used in many Grid applications and systems.

While the Globus Toolkit makes it easier to conduct Grid-based projects, the challenges are still far from easy and the Globus Toolkit does not provide a “turnkey” solution. Success in a Grid project depends on a clear vision of the problem(s) to be solved, awareness of relevant tools (both within and beyond the Globus Toolkit), and a strategy for applying the technology.

This half-day tutorial provides answers to critical questions for Grid project planners and product developers, including:

What types of problems is the Grid intended to address? How far does the Globus Toolkit go toward solving these problems? What do you need besides the Globus Toolkit to have a useful solution to your problem?

The Globus Toolkit will be put into context, and examples and roadmaps for the most common uses of the Globus Toolkit will be provided.</abs>
    <awards></awards>
    <intro_level>60</intro_level>
    <inter_level>30</inter_level>
    <adv_level>10</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:34</time_stamp>
    <status>active</status>
    <sub_id>tut141</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>19</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S4: Reconfigurable Supercomputing</sess_title>
    <sess_chair>Tarek El-Ghazawi (George Washington University)</sess_chair>
    <title>Reconfigurable Supercomputing</title>
    <all_auth_inst>Tarek El-Ghazawi (The George Washington University), Duncan Buell (University of South Carolina), Maya Gokhale (Los Alamos National Lab), Kris Gaj (George Mason University)</all_auth_inst>
    <abs>The synergistic advances in high-performance computing and reconfigurable computing, based on field programmable gate arrays (FPGAs), form the basis for a new paradigm in supercomputing, namely reconfigurable supercomputing.  This can be achieved through hybrid systems of microprocessors and FPGA modules that can leverage the system level concepts from high-performance computing and extend them to accommodate reconfigurations.  Such systems inherently support both fine-grain and coarse-grain parallelism, and can dynamically tune their architecture to fit the applications.  Many researchers have recognized this and advances are proceeding at three system levels.  At the networked computing level, researchers have extended job management systems to recognize networked reconfigurable resources and exploit their power, in a grid computing fashion.  Progress has also been made in programming and managing computer clusters with reconfigurable co-processors.  Finally, steps have been taken towards the development of massively parallel systems of conventional microprocessors and reconfigurable computing capabilities.  Programming such systems can be quite challenging as programming FPGA devices can essentially involve hardware design.  However, there have been very significant developments in compiler technologies and programming tools for some of these systems.  This tutorial will introduce the field of reconfigurable supercomputing and its advances in systems, programming, applications, and compiler technology.  </abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>40</inter_level>
    <adv_level>30</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:42</time_stamp>
    <status>active</status>
    <sub_id>tut110</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>45</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M11: Performance Scaling on Constellation Systems</sess_title>
    <sess_chair>Lorna Alice Smith (EPCC, The University of Edinburgh)</sess_chair>
    <title>Performance Scaling on Constellation Systems</title>
    <all_auth_inst>Lorna Alice Smith (EPCC, The University of Edinburgh), Mark Bull (EPCC, The University of Edinburgh)</all_auth_inst>
    <abs>Constellation systems, or clustered symmetric multiprocessing (SMP) systems, have gradually become more prominent in the HPC market, with many of the top supercomputing systems now being based on this type of architecture. For example, of the top three systems in the world, two are based on clustered SMP architectures (The Earth Simulator and ASCI Q, see www.top500.org). Hence, as these systems have become more prominent, it has become essential for applications to scale effectively on this type of architecture.

This tutorial will focus on the tools and techniques required to achieve optimal performance and scaling on a range of constellation systems. We will cover techniques for optimizing inter- and intra- node communication, such as overlapping communication, cluster aware message passing, mixed mode programming and processor mapping. Tools for profiling communication patterns will also be covered, as will a range of additional topics, such as effective IO and memory usage.

The aim is to equip participants with an in-depth knowledge of a range of performance optimization techniques for these systems, to provide participants with enough detail to utilise these techniques on their own constellation systems. </abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:01</time_stamp>
    <status>active</status>
    <sub_id>tut115</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>36</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M3: Component Software for High-Performance Computing</sess_title>
    <sess_chair>David Edward Bernholdt (Oak Ridge National Laboratory)</sess_chair>
    <title>Component Software for High-Performance Computing: Using the Common Component Architecture</title>
    <all_auth_inst>David E Bernholdt (Oak Ridge National Laboratory), Robert C Armstrong (Sandia National Laboratories), Lori Freitag Diachin (Lawrence Livermore National Laboratory), Wael R Elwasif (Oak Ridge National Laboratory), Madhusudhan Govindaraju (Binghamton University, State University of New York), Ragib Hasan (University of Illinois at Urbana-Champaign), Daniel S Katz (Jet Propulsion Laboratory, California Institute of Technology), James A Kohl (Oak Ridge National Laboratory), Gary Kumfert (Lawrence Livermore National Laboratory), Lois Curfman McInnes (Argonne National Laboratory), Boyana Norris (Argonne National Laboratory), Craig E Rasmussen (Los Alamos National Laboratory), Jaideep Ray (Sandia National Laboratories), Sameer Shende (University of Oregon), Shujia Zhou (Northrop Grumman/TASC)</all_auth_inst>
    <abs>This full-day tutorial will introduce participants to the Common Component Architecture (CCA) at both conceptual and practical levels. Component-based approaches to software development increase software developer productivity by helping to manage the complexity of large-scale software applications and facilitating the reuse and interoperability of code. The CCA was designed specifically with the needs of high-performance scientific computing in mind. It takes a minimalist approach to support language-neutral component-based application development for both parallel and distributed computing without penalizing the underlying performance, and with a minimal cost to incorporate existing code into the component environment. The CCA environment is also well suited to the creation of domain-specific application frameworks, whereas traditional domain-specific frameworks lack the generality and extensibility of the component approach. We will cover the concepts of components and the CCA in particular, the tools provided by the CCA environment, the creation of CCA-compatible components, and their use in scientific applications.  We will use a combination of traditional presentation and hands-on experience (computer with network access and ssh client required; X11 client desirable) during the tutorial. Those interested in the CCA are also encouraged to attend tutorial S3 on the Babel language interoperability tool. </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:48</time_stamp>
    <status>active</status>
    <sub_id>tut114</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>24</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S9: Clustermatic: An Innovative Approach to Cluster Computing</sess_title>
    <sess_chair>Greg Watson (Los Alamos National Laboratory)</sess_chair>
    <title>Clustermatic: An Innovative Approach to Cluster Computing</title>
    <all_auth_inst>Gregory Watson (LANL), Ronald Minnich (LANL), Erik Hendriks (LANL), Matthew Sottile (LANL)</all_auth_inst>
    <abs>Clustermatic is an award-winning, innovative software architecture that redefines cluster computing at all levels: from the BIOS to the parallel environment. The Clustermatic design maximizes performance and availability by achieving significant improvements in system booting and application startup times, minimizing points of failure and vastly simplifying management and administration activities. It is suitable for use on a wide range of architectures, and has been successfully deployed on tiny clusters containing only 2 diskless nodes all the way up to a 1408 node (2816 processor), 11 Tflop cluster. Key components of Clustermatic include LinuxBIOS, BProc, BJS, LA-MPI and Linux. 

This tutorial aims to introduce participants to the Clustermatic architecture, while providing hands-on experience in installing, managing and using a real cluster. The tutorial will combine detailed technical information about the design and operation of Clustermatic software with practical examples of how to deploy Clustermatic on a typical cluster system. Our tutorial format is designed to maximize the hands-on time for participants by giving each attendee the ability to undertake the activities using a real cluster system. 

The Clustermatic system was awarded the Excellence in Cluster Technology Award for Open Source Cluster Solutions at the 2004 ClusterWorld Conference &amp; Expo.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:17</time_stamp>
    <status>active</status>
    <sub_id>tut116</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>18</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S3: Bridging Programming Languages with Babel</sess_title>
    <sess_chair>Gary Kumfert (Lawrence Livermore National Laboratory)</sess_chair>
    <title>Bridging Programming Languages with Babel, Parts I and II</title>
    <all_auth_inst>Gary Kumfert (Lawrence Livermore National Laboratory), Thomas G. W. Epperly (Lawrence Livermore National Laboratory), Tamara Dahlgren (Lawrence Livermore National Laboratory)</all_auth_inst>
    <abs>Babel exists to bridge communities; specifically the scientific C, C++, Fortran77, Fortran90, Java, and Python communities. We also connect library developers (looking for ways to maximize customers) with computational scientists (wanting the best software, without concern for what community it came from).

Part I is a half-day introduction and tutorial. Babel enables arbitrary mixing of all supported languages (see list above) at maximum performance.  This means languages are mixed in the call stack of a single executable: no messaging, no data copying, and no interpreted middleware.  Far from a lowest-common-denominator (LCD) solution, Babel actually adds features like polymorphism, exception handling, and efficient multi-dimensional arrays to languages that don't support them natively.  Our Scientific Interface Definition Language (SIDL) defines the object model that Babel supports uniformly across languages.

Part II will be a hands-on activity covering the installation and use of Babel on attendees' UNIX-based environments.  Activities increase in sophistication from simply using example Babel objects in language of choice, through reimplementing objects in new languages, adding new capabilities to objects, to finally designing and implementing new Babel objects from scratch.  Multiple instructors will circulate through the audience for one-on-one help when needed.</abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>40</inter_level>
    <adv_level>30</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:01</time_stamp>
    <status>active</status>
    <sub_id>tut106</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>41</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S6: Open Source Tools for Computational Biology</sess_title>
    <sess_chair>Craig Stewart (Indiana University)</sess_chair>
    <title>Open Source Tools for Computational Biology</title>
    <all_auth_inst>Craig Stewart (Indiana University), Richard Repasky (Indiana University), Andrew Arenson (Indiana University)</all_auth_inst>
    <abs>Open source software tools are a driving force in the revolution in life sciences. Computational biology, bioinformatics, genomics, systems biology and related areas should revolutionize our understanding of biological processes and our ability to treat medical problems. The purpose of this tutorial is to provide an introduction to several important open source software applications for the life sciences. Emphasis will be on those of most widespread applicability as well as open source tools for parallel and grid computing in the life sciences.

Topics to be covered in depth include: sequence alignment and pattern matching; protein structure prediction; phylogenetics; systems biology; grid computing applications; and thoughts about the future of computational biology. This tutorial is intended for people who are interested in a rapid and useful introduction to computational biology and high performance computing. Attendees can expect to gain significant exposure to the critical applications as a result of hands-on exercises. Hands-on exercises are planned to take approximately one hour of the day-long tutorial. We plan for one computer for every two participants. Participants will obtain a broad overview of open source tools for the life sciences and will walk away prepared to download applications, use them, and improve them. </abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>50</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:25</time_stamp>
    <status>active</status>
    <sub_id>tut158</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>26</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M1: Taking Your MPI Application to the Next Level</sess_title>
    <sess_chair>Jeffrey Michael Squyres (Indiana University)</sess_chair>
    <title>Taking Your MPI Application to the Next Level: Threading, Dynamic Processes, and Multi-Network Utilization</title>
    <all_auth_inst>Jeffrey Michael Squyres (Indiana University), Richard L Graham (Los Alamos National Laboratory), Graham E. Fagg (University of Tennessee), George Bosilca (University of Tennessee)</all_auth_inst>
    <abs>Although the MPI-2 specification was finished in 1996, important features of the specification and run-time environments have not become mature in MPI implementations until recently. Including a balance of presentation and hands-on examples, this tutorial focuses on those areas: threading, dynamic processes, heterogeneous networking, and run-time tuning of MPI applications.

Truly multi-threaded MPI programs (beyond traditional OpenMP+MPI models), with multiple application threads simultaneously executing MPI functions, can be exploited for useful control and computational features.  Both MPI-2 dynamic process models -- spawning and connecting to already-running MPI processes -- can be used for practical applications such as dynamically reporting on the status of long-running parallel codes.  A relatively new feature, heterogeneous networking -- using multiple networks to communicate between processes -- is becoming increasingly relevant, not only as organizations find that they accumulate different types of networks in LAN environments, but also in Grid / WAN environments.  Finally, run-time tuning of the MPI implementation itself allows performance tweaking on both a cluster-wide and application-specific basis without changing any application code.  Emphasis will be placed on how the concepts discussed apply not only to the everyday MPI developer and user, but also to the cluster/network administrator.</abs>
    <awards></awards>
    <intro_level>0</intro_level>
    <inter_level>50</inter_level>
    <adv_level>50</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:15</time_stamp>
    <status>active</status>
    <sub_id>tut151</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>22</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S7: TeraGrid: Learn Once, Run Anywhere</sess_title>
    <sess_chair>Nancy Wilkins-Diehr </sess_chair>
    <title>TeraGrid: Learn Once, Run Anywhere</title>
    <all_auth_inst>Nancy Wilkins-Diehr (SDSC), John Towns (NCSA)</all_auth_inst>
    <abs>The TeraGrid is the foundation of the NSF’s national cyberinfrastructure program and is positioned to ignite the imaginations of new grid communities while delivering the next level of innovation in grid computing. It will connect scientific instruments, data collections and other unique resources as well as offer significant amounts of compute power. TeraGrid includes over 25 teraflops of computing power, 1 petabyte of data storage, high-resolution visualization environments, and grid services. The TeraGrid is anchored with Intel-based Linux clusters at ANL, Caltech, NCSA and the SDSC and an Alpha-based cluster at PSC that are connected by a 40 Gbps network. TeraGrid is in the process of expanding to include resources at Indiana University, Purdue University, ORNL and TACC (U Texas).

This tutorial includes an overview of the TeraGrid environment and configuration and descriptions of available services. The programming techniques learned in this tutorial will be applicable in many grid communities. Attendees can expect to learn to manage a grid identity and work through several usage scenarios by building and launching sample jobs. Several working applications will be used as examples to illustrate these capabilities. Attendees are expected to be familiar with Fortran or C programming, MPI and basic Unix environments.</abs>
    <awards></awards>
    <intro_level>5</intro_level>
    <inter_level>55</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:29</time_stamp>
    <status>active</status>
    <sub_id>tut133</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>56</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S8: State of InfiniBand in Designing HPC Clusters, Storage/File Systems, and Datacenters</sess_title>
    <sess_chair>Dhabaleswar K. Panda (The Ohio State University)</sess_chair>
    <title>State of InfiniBand in Designing HPC Clusters, Storage/File Systems, and Datacenters</title>
    <all_auth_inst>Dhabaleswar K. (DK) Panda (The Ohio State University)</all_auth_inst>
    <abs>InfiniBand Architecture (IBA) is generating a lot of excitement toward building next generation HPC clusters, Servers, Storage/File systems, and Datacenters in a radically different manner. This is leading to the following questions among many scientists, engineers, managers, developers, and users of high-end systems: 1) What is IBA?  2) How is it different from other interfaces and interconnects (PCI-X, PCI-Express, 10.0 GigE, Myrinet, Quadrics, etc.)?  3) What IBA hardware/software solutions are available, and what are their trends?  4) How can one take advantage of IBA features to design next generation high-end systems?

This tutorial will provide answers to the above questions in an in-depth manner.  We will start with the motivation behind IBA and a brief overview of its architectural aspects.  An in-depth comparison will be done with other interfaces and interconnects.  IBA hardware/software solutions and the market trends will be highlighted. Finally, case studies outlining the challenges and experiences in designing HPC clusters (MPI-1 and MPI-2), Distributed Shared Memory systems (TreadMarks, HLRC, ARMCI), File systems (DAFS, PVFS, Lustre), Storage protocols (SRP, iSCSI), Database systems (Oracle 9i RAC), and Multi-tier Datacenters with the novel features of IBA will be presented. These case studies will highlight the associated performance numbers/comparisons and open research issues.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>40</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:26:56</time_stamp>
    <status>active</status>
    <sub_id>tut130</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>16</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S1:  Advanced MPI - A Hands-on Tutorial</sess_title>
    <sess_chair>Rob Ross (MCS/ANL)</sess_chair>
    <title>Advanced MPI: I/O and One-Sided Communication</title>
    <all_auth_inst>Robert Ross (ANL), Ewing Lusk (ANL), William Gropp (ANL), Rajeev Thakur (ANL)</all_auth_inst>
    <abs>This tutorial is about advanced use of MPI, in particular the MPI-IO and one-sided communication features that were added to MPI (Message-Passing Interface) by the second MPI Forum called MPI-2.  Implementations of MPI-2 (or significant subsets thereof) are now available both from vendors and from open-source projects. For example, the one-sided communication functions of MPI-2 are being used successfully in applications running on the Earth Simulator. In other words, MPI-2 can now really be used in practice.

During the day we will cover two major components of MPI-2: parallel I/O and one-sided communication.  The tutorial will be heavily example driven, using one projector for tutorial material and a second projector to display code and demonstration runs of examples.  Attendees will have the opportunity to see codes using these advanced concepts built, run, and modified during the tutorial.

Attendees will leave the tutorial with both an understanding of these advanced concepts and a collection of working example codes that they are familiar with and have seen run and modified.  This will prepare them for applying these concepts in their own applications. </abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:29:29</time_stamp>
    <status>active</status>
    <sub_id>tut120</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>44</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M10: Methods for Performance Engineering of Scientific Applications</sess_title>
    <sess_chair>Allan Edward Snavely (San Diego Supercomputer Center)</sess_chair>
    <title>Methods for Performance Engineering of Scientific Applications</title>
    <all_auth_inst>Allan Edward Snavely (SDSC), Celso Mendes (UIUC), Bronis de Supinski (LLNL), Paul Hovland (Argonne), David H. Bailey</all_auth_inst>
    <abs>This tutorial presents a methodology to improve the performance of scientific applications. Attendees will learn to model and understand performance to reveal opportunities for improving application performance by tuning and/or choosing the target machine. The process consists of assessing computational demands, capability and complexity of the application code, as well as understanding the efficiency of the mapping of the application to architectures. We walk through a series of common performance-related issues such as instruction mix, memory and file system access patterns, nature and type of communications, and data and work distribution, for which performance models can guide optimizations. Tools from PERC, the Department of Energy Office of Science’s Performance Evaluation Research Center, are used to explore these issues and guide performance improvements. We draw exercises from real-life scientific applications in use by the community to exemplify these issues and to demonstrate optimization guidance. We target an audience primarily of application developers who need to quantify and improve the performance of their codes. Our tools are also of interest to system designers, administrators and integrators looking to monitor and maximize time-to-solution. Attendees should be familiar with at least one scientific application, parallel programming environment and HPC platform.</abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:33:09</time_stamp>
    <status>active</status>
    <sub_id>tut140</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>23</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M7: Cybersecurity at Open Scientific Facilities</sess_title>
    <sess_chair>Vern Paxson (ICSI / LBNL)</sess_chair>
    <title>Cybersecurity at Open Scientific Facilities</title>
    <all_auth_inst>Vern Paxson (ICSI / LBNL), James Rothfuss (LBNL), Stephen Lau (LBNL), William Kramer (LBNL)</all_auth_inst>
    <abs>The ability of scientists to collaborate unfettered over networks is critical for high performance computational (HPC) environments. This need, however, is tempered by the realities of today's interconnected computational environments, where protection from unauthorized access and usage is a necessity. How does one find an effective balance between the needs of an open scientific research facility and simultaneously protecting the site from attacks? What challenges lie ahead in high performance security?

This tutorial addresses these questions by exploring various topics of computer security as they relate to open, high-performance computer facilities. Some of the topics we will address are:

1) The unique nature and demands within an HPC environment 2) Addressing the needs of computer protection in an HPC environment 3) An overview of current trends in attacks and incidents 4) Intrusion detection in an HPC environment 5) The future of high performance computing protection

SciNet, SC 2004’s network, itself resembles networks at open scientific facilities. Some of the tools deployed at open scientific facilities are also deployed at SC for computer protection. We will show real network attack statistics collected at SC04 and explicate how the techniques described in the tutorial are in use at SC.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:01</time_stamp>
    <status>active</status>
    <sub_id>tut122</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>47</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M13: Constructing Advanced Storage Networks</sess_title>
    <sess_chair>Joseph Pelissier (McDATA Corporation)</sess_chair>
    <title>Constructing Advanced Storage Networks: Integrating Virtual Fabrics, SAN Routing, and IP Based Storage</title>
    <all_auth_inst>Joseph Edward Pelissier (McDATA Corporation)</all_auth_inst>
    <abs>Storage area networks are continuing to experience phenomenal growth.  To accommodate this growth, a variety of new technologies are being introduced to address the various manageability, scalability, and cost requirements of future SAN infrastructures.  Among these technologies are partitionable and multiprotocol switches, virtual fabrics and networks, IP storage technologies, and a large variety of gateway capabilities.  This tutorial will provide participants with a working knowledge of these technologies, their capabilities and limitations, sufficient to begin to plan and design future advanced storage networks.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>60</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:57</time_stamp>
    <status>active</status>
    <sub_id>tut134</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>21</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M6: High Performance Data Transfer</sess_title>
    <sess_chair>Phillip Dykstra (WareOnEarth Communications)</sess_chair>
    <title>High Performance Data Transfer</title>
    <all_auth_inst>Phillip Dykstra (WareOnEarth Communications)</all_auth_inst>
    <abs>High Performance Data Transfer is a core requirement of many Supercomputing applications.  From basic FTP file transfers to P2P or Grid applications, moving data across LANs and WANs at high speed is critically important.  This tutorial will cover a wide range of approaches used to achieve this, many of which are complementary, including tuning end systems and networks, new or improved protocols, parallel transfers, and abstract data storage.

This tutorial covers the whys and hows of high performance data transfer.  The first half focuses on advanced networking technology and low-level performance issues such as delay, loss, switching/routing, and TCP and UDP dynamics. The second half looks at higher level approaches to improving performance, including improved protocols, parallel transfers, peer-to-peer and grid techniques, and abstract storage services.

The attendee should come away with a detailed understanding of data transfer over wide area networks, and exposure to a great number of tools and utilities to tune, debug, and improve their ability to move data at high speed. </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:48</time_stamp>
    <status>active</status>
    <sub_id>tut139</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>20</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S5: Practical Application Performance Analysis on Linux Systems</sess_title>
    <sess_chair>John Mellor-Crummey (Rice University)</sess_chair>
    <title>Practical Application Performance Analysis on Linux Systems</title>
    <all_auth_inst>John M. Mellor-Crummey (Rice University), Robert J. Fowler (Rice University), Nathan Tallent (Rice University)</all_auth_inst>
    <abs>This tutorial will teach attendees how to analyze the node performance of both serial and parallel applications on computer systems running the Linux operating system. The tutorial is intended for a broad audience including both computer and computational scientists who are interested in tuning the performance of middleware and applications. The morning session will begin by reviewing aspects of computer system organization relevant to application performance and describing architectural support for monitoring machine performance that is available in today’s computer systems. Most of the morning session will focus on how to use HPCToolkit – a collection of multi-platform performance tools developed at Rice University – to analyze application performance using data collected from hardware performance counters. HPCToolkit enables users to collect a variety of performance metrics for unmodified application binaries, correlate measurements with applications at multiple levels, synthesize performance databases containing multiple metrics, and browse the performance databases in a top-down fashion to pinpoint performance bottlenecks.  In the afternoon, attendees will use their laptops to access a remote system to run HPCToolkit on provided program samples or their own codes. The tutorial will conclude with a presentation of how attendees can download and build HPCToolkit on their own computer systems.</abs>
    <awards></awards>
    <intro_level>20</intro_level>
    <inter_level>60</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:08</time_stamp>
    <status>active</status>
    <sub_id>tut138</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>42</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M8: UPC: Unified Parallel C</sess_title>
    <sess_chair>Tarek El-Ghazawi (George Washington University)</sess_chair>
    <title>UPC: Unified Parallel C</title>
    <all_auth_inst>William Carlson (IDA Center for Computing Sciences), Tarek El-Ghazawi (High Performance Computing Laboratory - ECE Dept - The George Washington University)</all_auth_inst>
    <abs>UPC, or Unified Parallel C, is a parallel extension of ISO C.  UPC follows the Partitioned Global Address Space (PGAS) programming model, which is aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation of data locality.  To this end, UPC incorporates constructs that allow placing data near the threads that manipulate them to minimize remote accesses.  UPC also has many advanced synchronization features including mechanisms for overlapping synchronization with local processing and constructs for defining memory consistency. UPC is the effort of a consortium of universities, government and industry.  It has been receiving rising attention from programmers and vendors and is now a product available on the Cray X1 and HP parallel computers.  Open Source compilers are available for most other platforms.  The TotalView (tm) debugger is also available.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>40</inter_level>
    <adv_level>50</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:24:55</time_stamp>
    <status>active</status>
    <sub_id>tut128</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>25</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S10: High Performance Computing in Python</sess_title>
    <sess_chair>Åsmund Ødegård (it manager)</sess_chair>
    <title>High Performance Computing in Python</title>
    <all_auth_inst>Hans Petter Langtangen (Professor), Are Magnus Bruaset (Simula Research Laboratory), Kent-Andre Mardal (Simula Research Laboratory), Patrick Miller (Lawrence Livermore National Laboratory), Halvard Moe (Simula Research Laboratory), Ola Skavhaug (Simula Research Laboratory), Åsmund Ødegård (Simula Research Laboratory)</all_auth_inst>
    <abs>The virtue of Python is flexibility, making it an ideal tool for the scientist who wants to experiment with different models, do rapid prototyping, and even do large scale simulation. This tutorial presents concepts and software that will give the scientist adaptable and powerful tools.

The tutorial begins with the basic concepts of using Python for scientific and high performance computing.  We cover the use of the Python Numeric module, threads, and pyMPI for SPMD style distributed parallelism.  We also discuss performance optimization of Python code, and Python tools for visualization. The latter includes how to make simple movies as well as interfaces to complex visualization programs. The first part ends with an in-depth introduction to C and FORTRAN code wrapping, allowing applications originally written in these languages to be run from Python. We also cover how to increase computational performance by extending Python with C and FORTRAN modules.

The last part is a hands-on session. We equip a FORTRAN application with a Python interface, replace the main loop with a Python program, and visualize results. We also cover more details on C and FORTRAN extensions. 

Attendees work on their own laptops. Software and information are available both at http://www.simula.no/projects/sc2004 and on CD.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:06</time_stamp>
    <status>active</status>
    <sub_id>tut129</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>17</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S2: Parallel Computing 101</sess_title>
    <sess_chair>Quentin Fielden Stout (University of Michigan)</sess_chair>
    <title>Parallel Computing 101</title>
    <all_auth_inst>Quentin F. Stout (University of Michigan), Christiane Jablonowski (NCAR)</all_auth_inst>
    <abs>This tutorial provides a comprehensive overview of parallel computing, emphasizing those aspects most relevant to the user. It is suitable for new or prospective users, managers, students and anyone seeking a general overview of parallel computing.  It discusses software and hardware, with an emphasis on standards, portability, and systems that are commercially or freely available.  Systems examined include clusters, the Grid, and tightly integrated supercomputers.

The tutorial surveys basic parallel computing concepts and terminology, and uses examples selected from large-scale engineering, scientific, and data intensive applications.  These real-world examples are targeted at distributed memory systems using MPI, Grid systems using Globus, shared memory systems using OpenMP, and hybrid systems that combine the MPI and OpenMP programming paradigms.  The tutorial shows basic parallelization approaches and discusses some of the software engineering aspects of the parallelization process, including the use of state-of-the-art tools.  The tools introduced range from parallel debugging tools to performance analysis and tuning packages.

The tutorial helps attendees make intelligent decisions by covering the primary options that are available, explaining how they are used and what they are most suitable for. Extensive pointers to the literature and web-based resources are provided to facilitate follow-up studies. </abs>
    <awards></awards>
    <intro_level>75</intro_level>
    <inter_level>25</inter_level>
    <adv_level>0</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:45</time_stamp>
    <status>active</status>
    <sub_id>tut155</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>40</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M5: A Practical Approach to Performance Analysis and Modeling of Large-Scale Systems</sess_title>
    <sess_chair>Darren J. Kerbyson (Los Alamos National Laboratory)</sess_chair>
    <title>A Practical Approach to Performance Analysis and Modeling of Large-Scale Systems</title>
    <all_auth_inst>Darren J. Kerbyson (Los Alamos National Laboratory), Adolfy Hoisie (Los Alamos National Laboratory)</all_auth_inst>
    <abs>This tutorial presents a practical approach to the performance modeling of large-scale, scientific applications on high performance systems. The defining characteristic of our tutorial is the description of a proven modeling approach, developed at Los Alamos, of full-blown scientific codes, ranging from a few thousand to over 100,000 lines, that has been validated on systems containing thousands of processors. The goal is to impart a detailed understanding of factors contributing to the resulting performance of an application when mapped onto a given HPC platform. Performance modeling is the only technique that can quantitatively elucidate this understanding. We show how models are constructed and demonstrate how they are used to predict, explain, diagnose, and engineer application performance in existing or future codes and/or systems. Notably, our approach does not require the use of specific tools but rather is applicable across commonly used environments. Moreover, since our performance models are parametric in terms of machine and application characteristics, they imbue the user with the ability to “experiment ahead” with different system configurations or algorithms/coding strategies. Both will be demonstrated in studies emphasizing the application of these modeling techniques including: verifying system performance, comparison of large-scale systems, and examination of possible future systems.</abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>50</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:24</time_stamp>
    <status>active</status>
    <sub_id>tut157</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>38</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M4: Hot Chips and Hot Interconnects for High End Computing Systems</sess_title>
    <sess_chair>Subhash  Saini (NASA Ames Research Center)</sess_chair>
    <title>Hot Chips and Hot Interconnects for High End Computing Systems</title>
    <all_auth_inst>Subhash Saini (NASA Ames Research Center)</all_auth_inst>
    <abs>I will discuss several processors: i. the Cray proprietary processor, which is used in the Cray X1; ii. the IBM Power 3 and Power 4, which are used in the IBM SP 3 and IBM SP 4 systems; iii. the Intel Itanium and Xeon, which are used in the SGI Altix systems and clusters respectively; iv. the IBM System-on-a-Chip, which is used in IBM BlueGene/L; v. the HP Alpha EV68 processor, which is used in the DOE ASCI Q cluster; vi. the SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; vii. an NEC proprietary processor, which is used in the NEC SX-6/7; viii. the Power 4+ processor, which is used in the Hitachi SR11000; ix. the NEC proprietary processor, which is used in the Earth Simulator.  The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF and hybrid (MPI + OpenMP)). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks). </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:40</time_stamp>
    <status>active</status>
    <sub_id>tut145</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>35</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M2: Application Supercomputing on Scalable Architectures</sess_title>
    <sess_chair>Alice  Koniges (LLNL)</sess_chair>
    <title>Application Supercomputing on Scalable Architectures</title>
    <all_auth_inst>Alice Koniges (Lawrence Livermore National Laboratory), Mark Seager (Lawrence Livermore National Laboratory), David Eder (Lawrence Livermore National Laboratory), Rolf Rabenseifner (High Performance Computing Center Stuttgart), Michael Resch (High Performance Computing Center Stuttgart)</all_auth_inst>
    <abs>Teraflop performance is no longer a thing of the future as complex integrated 3D simulations drive supercomputer development. Today, most HPC systems are clusters of SMP nodes ranging from dual-CPU-PC clusters to the largest systems at the world's major computing centers. What are the major issues facing application code developers today? How do the challenges vary from cluster computing to the complex hybrid architectures with superscalar and vector processors? What skills and tools are required, both of the application developer and the system itself? Finally, what are the paths both architecturally and algorithmically to petaflop performance? In this tutorial, we address these questions and give tips, tricks, and tools of the trade for large-scale application development. In the introduction, we provide an overview of terminology, hardware and performance. Advanced topics are mixed-mode (combined MPI/OpenMP) programming, vector tips, and cluster environments. We describe the latest issues in implementing scalable parallel programming. We draw from a series of large application suites and discuss specific challenges and problems encountered in parallelizing these applications. Finally, we discuss upcoming architectures such as BlueGene/L and the latest vector systems. See also http://www.hlrs.de/people/rabenseifner/publ/SC2004-tutorial.html </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:28</time_stamp>
    <status>active</status>
    <sub_id>tut164</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>43</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M9: Virtual Data Management for Grid Computing</sess_title>
    <sess_chair>Michael Wilde (Argonne)</sess_chair>
    <title>Virtual Data Management for Grid Computing</title>
    <all_auth_inst>Michael Wilde (Argonne), Ewa Deelman (ISI)</all_auth_inst>
    <abs>Virtual data is a paradigm for expressing and managing the relationships between datasets and the computational procedures that produce them. It provides abstractions to describe data that has not yet been computed, and is embodied in a toolkit which automates workflow generation and data provenance tracking for problems ranging from desktop analysis to massive-scale computations on a Grid.

In a virtual data system, data, procedures, and computations are all first class entities, and can be published, discovered, and manipulated. Virtual data enables us to trace the provenance of derived data; plan and track the computational workflows required to derive a particular data product; determine whether a requested computation has been performed previously and whether it is cheaper to rerun it or to retrieve previously generated data; and discover computational procedures with desired characteristics.

This tutorial describes the foundations of the virtual data concept, presents a practical, “how-to-focused” introduction to the Grid-based virtual data tools created by GriPhyN, the Grid Physics Network project, explores related work in the fields of provenance tracking and workflow management, and presents case studies of virtual data on computing problems in high-energy physics, biology, medical research, astronomy and astrophysics. </abs>
    <awards></awards>
    <intro_level>15</intro_level>
    <inter_level>60</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:29:49</time_stamp>
    <status>active</status>
    <sub_id>tut153</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>46</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M12: The Grid &quot;Ecosystem&quot; - Developing Your Grid Strategy</sess_title>
    <sess_chair>Lee Liming (Argonne National Laboratory)</sess_chair>
    <title>Beyond Globus: Lessons Learned from the Grid</title>
    <all_auth_inst>Lee Liming (Argonne National Laboratory)</all_auth_inst>
    <abs>The Globus Alliance aims to provide solutions to the most persistent and vexing problems that come up in Grid projects and applications. Our solutions to date are collected in the Globus Toolkit and these solutions are used in many Grid applications and systems.

While the Globus Toolkit makes it easier to conduct Grid-based projects, the challenges are still far from easy and the Globus Toolkit does not provide a “turnkey” solution. Success in a Grid project depends on a clear vision of the problem(s) to be solved, awareness of relevant tools (both within and beyond the Globus Toolkit), and a strategy for applying the technology.

This half-day tutorial provides answers to critical questions for Grid project planners and product developers, including:

What types of problems is the Grid intended to address? How far does the Globus Toolkit go toward solving these problems? What do you need besides the Globus Toolkit to have a useful solution to your problem?

The Globus Toolkit will be put into context, and examples and roadmaps for the most common uses of the Globus Toolkit will be provided.</abs>
    <awards></awards>
    <intro_level>60</intro_level>
    <inter_level>30</inter_level>
    <adv_level>10</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:34</time_stamp>
    <status>active</status>
    <sub_id>tut141</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>19</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S4: Reconfigurable Supercomputing</sess_title>
    <sess_chair>Tarek El-Ghazawi (George Washington University)</sess_chair>
    <title>Reconfigurable Supercomputing</title>
    <all_auth_inst>Tarek El-Ghazawi (The George Washington University), Duncan Buell (University of South Carolina), Maya Gokhale (Los Alamos National Lab), Kris Gaj (George Mason University)</all_auth_inst>
    <abs>The synergistic advances in high-performance computing and reconfigurable computing, based on field programmable gate arrays (FPGAs), form the basis for a new paradigm in supercomputing, namely reconfigurable supercomputing.  This can be achieved through hybrid systems of microprocessors and FPGA modules that can leverage the system level concepts from high-performance computing and extend them to accommodate reconfigurations.  Such systems inherently support both fine-grain and coarse-grain parallelism, and can dynamically tune their architecture to fit the applications.  Many researchers have recognized this and advances are proceeding at three system levels.  At the networked computing level, researchers have extended job management systems to recognize networked reconfigurable resources and exploit their power, in a grid computing fashion.  Progress has also been made in programming and managing computer clusters with reconfigurable co-processors.  Finally, steps have been taken towards the development of massively parallel systems of conventional microprocessors and reconfigurable computing capabilities.  Programming such systems can be quite challenging as programming FPGA devices can essentially involve hardware design.  However, there have been very significant developments in compiler technologies and programming tools for some of these systems.  This tutorial will introduce the field of reconfigurable supercomputing and its advances in systems, programming, applications, and compiler technology.  </abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>40</inter_level>
    <adv_level>30</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:30:42</time_stamp>
    <status>active</status>
    <sub_id>tut110</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>45</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M11: Performance Scaling on Constellation Systems</sess_title>
    <sess_chair>Lorna Alice Smith (EPCC, The University of Edinburgh)</sess_chair>
    <title>Performance Scaling on Constellation Systems</title>
    <all_auth_inst>Lorna Alice Smith (EPCC, The University of Edinburgh), Mark Bull (EPCC, The University of Edinburgh)</all_auth_inst>
    <abs>Constellation systems, or clustered symmetric multiprocessing (SMP) systems, have gradually become more prominent in the HPC market, with many of the top supercomputing systems now being based on this type of architecture. For example, of the top three systems in the world, two are based on clustered SMP architectures (The Earth Simulator and ASCI Q, see www.top500.org). Hence, as these systems have become more prominent, it has become essential for applications to scale effectively on this type of architecture.

This tutorial will focus on the tools and techniques required to achieve optimal performance and scaling on a range of constellation systems. We will cover techniques for optimizing inter- and intra-node communication, such as overlapping communication, cluster-aware message passing, mixed-mode programming and processor mapping. Tools for profiling communication patterns will also be covered, as will a range of additional topics, such as effective IO and memory usage.

The aim is to equip participants with an in-depth knowledge of a range of performance optimization techniques for these systems, to provide participants with enough detail to utilise these techniques on their own constellation systems. </abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:32:01</time_stamp>
    <status>active</status>
    <sub_id>tut115</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>36</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M3: Component Software for High-Performance Computing</sess_title>
    <sess_chair>David Edward Bernholdt (Oak Ridge National Laboratory)</sess_chair>
    <title>Component Software for High-Performance Computing: Using the Common Component Architecture</title>
    <all_auth_inst>David E Bernholdt (Oak Ridge National Laboratory), Robert C Armstrong (Sandia National Laboratories), Lori Freitag Diachin (Lawrence Livermore National Laboratory), Wael R Elwasif (Oak Ridge National Laboratory), Madhusudhan Govindaraju (Binghamton University, State University of New York), Ragib Hasan (University of Illinois at Urbana-Champaign), Daniel S Katz (Jet Propulsion Laboratory, California Institute of Technology), James A Kohl (Oak Ridge National Laboratory), Gary Kumfert (Lawrence Livermore National Laboratory), Lois Curfman McInnes (Argonne National Laboratory), Boyana Norris (Argonne National Laboratory), Craig E Rasmussen (Los Alamos National Laboratory), Jaideep Ray (Sandia National Laboratories), Sameer Shende (University of Oregon), Shujia Zhou (Northrop Grumman/TASC)</all_auth_inst>
    <abs>This full-day tutorial will introduce participants to the Common Component Architecture (CCA) at both conceptual and practical levels. Component-based approaches to software development increase software developer productivity by helping to manage the complexity of large-scale software applications and facilitating the reuse and interoperability of code. The CCA was designed specifically with the needs of high-performance scientific computing in mind. It takes a minimalist approach to support language-neutral component-based application development for both parallel and distributed computing without penalizing the underlying performance, and with a minimal cost to incorporate existing code into the component environment. The CCA environment is also well suited to the creation of domain-specific application frameworks, whereas traditional domain-specific frameworks lack the generality and extensibility of the component approach. We will cover the concepts of components and the CCA in particular, the tools provided by the CCA environment, the creation of CCA-compatible components, and their use in scientific applications.  We will use a combination of traditional presentation and hands-on experience (computer with network access and ssh client required; X11 client desirable) during the tutorial. Those interested in the CCA are also encouraged to attend tutorial S3 on the Babel language interoperability tool. </abs>
    <awards></awards>
    <intro_level>25</intro_level>
    <inter_level>50</inter_level>
    <adv_level>25</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:48</time_stamp>
    <status>active</status>
    <sub_id>tut114</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>24</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S9: Clustermatic: An Innovative Approach to Cluster Computing</sess_title>
    <sess_chair>Greg Watson (Los Alamos National Laboratory)</sess_chair>
    <title>Clustermatic: An Innovative Approach to Cluster Computing</title>
    <all_auth_inst>Gregory Watson (LANL), Ronald Minnich (LANL), Erik Hendriks (LANL), Matthew Sottile (LANL)</all_auth_inst>
    <abs>Clustermatic is an award-winning, innovative software architecture that redefines cluster computing at all levels: from the BIOS to the parallel environment. The Clustermatic design maximizes performance and availability by achieving significant improvements in system booting and application startup times, minimizing points of failure and vastly simplifying management and administration activities. It is suitable for use on a wide range of architectures, and has been successfully deployed on tiny clusters containing only 2 diskless nodes all the way up to a 1408 node (2816 processor), 11 Tflop cluster. Key components of Clustermatic include LinuxBIOS, BProc, BJS, LA-MPI and Linux.

This tutorial aims to introduce participants to the Clustermatic architecture, while providing hands-on experience in installing, managing and using a real cluster. The tutorial will combine detailed technical information about the design and operation of Clustermatic software with practical examples of how to deploy Clustermatic on a typical cluster system. Our tutorial format is designed to maximize the hands-on time for participants by giving each attendee the ability to undertake the activities using a real cluster system. 

The Clustermatic system was awarded the Excellence in Cluster Technology Award for Open Source Cluster Solutions at the 2004 ClusterWorld Conference &amp; Expo.</abs>
    <awards></awards>
    <intro_level>10</intro_level>
    <inter_level>50</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:27:17</time_stamp>
    <status>active</status>
    <sub_id>tut116</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>18</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S3: Bridging Programming Languages with Babel</sess_title>
    <sess_chair>Gary Kumfert (Lawrence Livermore National Laboratory)</sess_chair>
    <title>Bridging Programming Languages with Babel, Parts I and II</title>
    <all_auth_inst>Gary Kumfert (Lawrence Livermore National Laboratory), Thomas G. W. Epperly (Lawrence Livermore National Laboratory), Tamara Dahlgren (Lawrence Livermore National Laboratory)</all_auth_inst>
    <abs>Babel exists to bridge communities; specifically the scientific C, C++, Fortran77, Fortran90, Java, and Python communities. We also connect library developers (looking for ways to maximize customers) with computational scientists (wanting the best software, without concern for what community it came from).

Part I is a half-day introduction and tutorial. Babel enables arbitrary mixing of all supported languages (see list above) at maximum performance.  This means languages are mixed in the call stack of a single executable: no messaging, no data copying, and no interpreted middleware.  Far from a LCD solution, Babel actually adds features like polymorphism, exception handling, and efficient multi-dimensional arrays to languages that don't support them natively.  Our Scientific Interface Definition Language (SIDL) defines the object model that Babel supports uniformly across languages.

Part II will be a hands-on activity covering the installation and use of Babel on attendees' UNIX-based environments.  Activities increase in sophistication from simply using example Babel objects in the language of choice, through reimplementing objects in new languages and adding new capabilities to objects, to finally designing and implementing new Babel objects from scratch.  Multiple instructors will circulate through the audience for one-on-one help when needed.</abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>40</inter_level>
    <adv_level>30</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:01</time_stamp>
    <status>active</status>
    <sub_id>tut106</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>41</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S6: Open Source Tools for Computational Biology</sess_title>
    <sess_chair>Craig Stewart (Indiana University)</sess_chair>
    <title>Open Source Tools for Computational Biology</title>
    <all_auth_inst>Craig Stewart (Indiana University), Richard Repasky (Indiana University), Andrew Arenson (Indiana University)</all_auth_inst>
    <abs>Open source software tools are a driving force in the revolution in life sciences. Computational biology, bioinformatics, genomics, systems biology and related areas should revolutionize our understanding of biological processes and our ability to treat medical problems. The purpose of this tutorial is to provide an introduction to several important open source software applications for the life sciences. Emphasis will be on those of most widespread applicability as well as open source tools for parallel and grid computing in the life sciences.

Topics to be covered in depth include: sequence alignment and pattern matching; protein structure prediction; phylogenetics; systems biology; grid computing applications; and thoughts about the future of computational biology. This tutorial is intended for people who are interested in a rapid and useful introduction to computational biology and high performance computing. Attendees can expect to gain significant exposure to the critical applications as a result of hands-on exercises. Hands-on exercises are planned to take approximately one hour of the day-long tutorial. We plan for one computer for every two participants. Participants will obtain a broad overview of open source tools for the life sciences and will walk away prepared to download applications, use them, and improve them. </abs>
    <awards></awards>
    <intro_level>30</intro_level>
    <inter_level>50</inter_level>
    <adv_level>20</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:31:25</time_stamp>
    <status>active</status>
    <sub_id>tut158</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>26</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>M1: Taking Your MPI Application to the Next Level</sess_title>
    <sess_chair>Jeffrey Michael Squyres (Indiana University)</sess_chair>
    <title>Taking Your MPI Application to the Next Level: Threading, Dynamic Processes, and Multi-Network Utilization</title>
    <all_auth_inst>Jeffrey Michael Squyres (Indiana University), Richard L Graham (Los Alamos National Laboratory), Graham E. Fagg (University of Tennessee), George Bosilca (University of Tennessee)</all_auth_inst>
    <abs>Although the MPI-2 specification was finished in 1996, important features of the specification and run-time environments have not become mature in MPI implementations until recently. Including a balance of presentation and hands-on examples, this tutorial focuses on those areas: threading, dynamic processes, heterogeneous networking, and run-time tuning of MPI applications.

Truly multi-threaded MPI programs (beyond traditional OpenMP+MPI models), with multiple application threads simultaneously executing MPI functions, can be exploited for useful control and computational features.  Both MPI-2 dynamic process models -- spawning and connecting to already-running MPI processes -- can be used for practical applications such as dynamically reporting on the status of long-running parallel codes.  A relatively new feature, heterogeneous networking -- using multiple networks to communicate between processes -- is becoming increasingly relevant, not only as organizations find that they accumulate different types of networks in LAN environments, but also in Grid / WAN environments.  Finally, run-time tuning of the MPI implementation itself allows performance tweaking on both a cluster-wide and application-specific basis without changing any application code.  Emphasis will be placed on how the concepts discussed apply not only to the everyday MPI developer and user, but also to the cluster/network administrator.</abs>
    <awards></awards>
    <intro_level>0</intro_level>
    <inter_level>50</inter_level>
    <adv_level>50</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_08:28:15</time_stamp>
    <status>active</status>
    <sub_id>tut151</sub_id>
    <event_type>Tutorial</event_type>
    <sess_id>22</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>S7: TeraGrid: Learn Once, Run Anywhere</sess_title>
    <sess_chair>Nancy Wilkins-Diehr </sess_chair>
    <title>TeraGrid: Learn Once, Run Anywhere</title>
    <all_auth_inst>Nancy Wilkins-Diehr (SDSC), John Towns (NCSA)</all_auth_inst>
    <abs>The TeraGrid is the foundation of the NSF’s national cyberinfrastructure program and is positioned to ignite the imaginations of new grid communities while delivering the next level of innovation in grid computing. It will connect scientific instruments, data collections and other unique resources as well as offer significant amounts of compute power. TeraGrid includes over 25 teraflops of computing power, 1 petabyte of data storage, high-resolution visualization environments, and grid services. The TeraGrid is anchored with Intel-based Linux clusters at ANL, Caltech, NCSA and the SDSC and an Alpha-based cluster at PSC that are connected by a 40 Gbps network. TeraGrid is in the process of expanding to include resources at Indiana University, Purdue University, ORNL and TACC (U Texas).

This tutorial includes an overview of the TeraGrid environment and configuration and descriptions of available services. The programming techniques learned in this tutorial will be applicable in many grid communities. Attendees can expect to learn to manage a grid identity and work through several usage scenarios by building and launching sample jobs. Several working applications will be used as examples to illustrate these capabilities. Attendees are expected to be familiar with Fortran or C programming, MPI and basic Unix environments.</abs>
    <awards></awards>
    <intro_level>5</intro_level>
    <inter_level>55</inter_level>
    <adv_level>40</adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>pan120</sub_id>
    <event_type>Panel</event_type>
    <sess_id>65</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>Global Leadership</sess_title>
    <sess_chair>David E. Shaw </sess_chair>
    <title>Leadership in a Global Economy:  &quot;Out-compete&quot; means &quot;Out-compute&quot;   </title>
    <all_auth_inst>see narrative see narrative (various)</all_auth_inst>
    <abs>With technology, talent and capital available globally, the U.S. is facing unprecedented competitiveness challenges from abroad.  Given the diffusion of innovation capacity and the emergence of lower-wage but highly skilled workforces globally, the country that wants to “out-compete” must be able to “out-compute.”

1. Panelists will be asked to describe industry “grand challenge” opportunities that could make a significant contribution to industrial and national competitiveness if more computational capability could be made available to solve them, and to explain the ROI that the industry and the country could expect to receive if these challenges can be successfully addressed.

2. Panelists will be asked to discuss the obstacles they are facing in acquiring/applying HPC to solve these challenges, particularly the “business barriers”, such as the difficulty of finding qualified talent; management viewing HPC as a cost instead of an investment; lack of acceptable commercial software; cost of commercial software, etc.  </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:08:30</time_stamp>
    <status>active</status>
    <sub_id>pan118</sub_id>
    <event_type>Panel</event_type>
    <sess_id>67</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>HPCchallenge Benchmarks</sess_title>
    <sess_chair>Jack Dongarra (UTK/ORNL)</sess_chair>
    <title>HPCchallenge Benchmarks: An Expanded View of High End Computers (HEC)</title>
    <all_auth_inst>Luszczek Luszczek (UTK), David Koester (MITRE), John McCalpin (IBM), Jeff Vetter (ORNL), Allan Snavely (SDSC), David Nelson (ITRD)</all_auth_inst>
    <abs>This panel will (1) describe the HPCchallenge Benchmark framework, (2) examine the architecture stresses of the benchmarks, (3) examine the relationships between these benchmarks and real application performance, and (4) examine the political and business motivations to develop and publicize the HPCchallenge benchmarks that look beyond High Performance LINPACK (HPL) and the Top500 List.

The Flop/s metric from HPL has been the de facto standard for comparing High Performance Computers for many years.  The Top500 List provides an opportunity for bragging rights for both HPC vendors and HPC facilities. HPL performs well on many architectures -- including cache-based, distributed memory multiprocessors -- and the measured performance may not be representative of a wide range of real user applications like adaptive multi-physics simulations used in weapons and vehicle designs,  weather and climate models, and defense applications.  HPL is more compute friendly than these applications because it has more extensive memory reuse -- spatial and temporal locality -- in the Level 3 BLAS-based calculations. HPL is highly scalable with respect to both the amount of computation and the communication volume.

The HPCchallenge Benchmarks examine the performance of HPC architectures using kernels with more challenging memory access patterns than just the HPL benchmark used in the Top500 list.  The HPCchallenge Benchmarks build on the HPL framework and augment the Top500 list by providing benchmarks that bound the performance of many real applications as a function of memory access locality characteristics. The HPCchallenge benchmarks are scalable with the size of data sets being a function of the largest HPL matrix for a system.  The HPCchallenge benchmarks include HPL as a reference, and the additional benchmarks take approximately the same time as HPL to run.

The real utility of the HPCchallenge benchmarks is that architectures can be described with a wider range of metrics than just Flop/s from HPL.  When looking only at HPL performance and the Top500 List, inexpensive build-your-own clusters appear to be much more cost effective than more sophisticated HPC architectures.  Even a small percentage of random memory accesses in real applications can significantly affect the overall performance of that application on architectures not designed to minimize or hide memory latency. HPCchallenge benchmarks provide users with additional information to justify policy and purchasing decisions.

Additional information on the HPCchallenge Benchmark can be found at http://icl.cs.utk.edu/hpcc/ </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:09:15</time_stamp>
    <status>active</status>
    <sub_id>pan121</sub_id>
    <event_type>Panel</event_type>
    <sess_id>71</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>HPC Survivor</sess_title>
    <sess_chair>Cherri M. Pancake (NACSE/Oregon State U)</sess_chair>
    <title>HPC Survivor -- Outwit, Outlast, Outcompute</title>
    <all_auth_inst>Thomas Sterling (CalTech), Burton Smith (Cray Inc.), Kenichi Miura (Fujitsu Laboratories)</all_auth_inst>
    <abs>HPC manufacturers will compete for the honor of being named &quot;HPC Survivor 2004&quot;.

The contest will be structured as a series of &quot;rounds,&quot; each posing a specific question about system design, philosophy, implementation, or use.  Contestants will be given 2 minutes to answer the question, and at the end of each round the audience will vote (via applause, boos, etc.) to eliminate a contestant.  The last contestant left wins the title. 

The moderator will be ably assisted by the attractive and knowledgeable Rusty Lusk, who will conduct &quot;exit interviews&quot; as candidates are removed from the competition, giving them an opportunity to explain why the audience is &quot;wrong&quot; in eliminating them. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:08:20</time_stamp>
    <status>active</status>
    <sub_id>pan116</sub_id>
    <event_type>Panel</event_type>
    <sess_id>66</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>Future of Supercomputing</sess_title>
    <sess_chair>Marc Snir </sess_chair>
    <title>The Future of Supercomputing</title>
    <all_auth_inst>Susan Graham (University of California at Berkeley), Charles Koelbel (Rice University), Jack Dongarra (University of Tennessee at Knoxville)</all_auth_inst>
    <abs>Since early 2003, the US National Academy of Science's Computer Science and Telecommunications Board (CSTB) has run a study of the Future of Supercomputing.  The panel consists of 18 experts in computer and computational science from across industry, academia, and national laboratories, and is chaired by Susan Graham (Berkeley) and Marc Snir (Illinois).  This panel will report its findings on

* What is the current state of high-performance computing, both in the US and internationally?   - What innovations are &quot;in the pipeline&quot; to improve that state?   - Are there foreseeable obstacles to improvement?

* What are the advantages and disadvantages of various approaches to supercomputing, such as custom architectures and commodity clusters?   - Can we identify important areas that need particular classes of machine?

* What are the driving applications that require supercomputing today?   - Are there absolute requirements on computational performance?  If so, what are they?   - Within those applications, what is the mix of &quot;capability&quot; vs. &quot;capacity&quot; computing?

* What are the requirements for supercomputing software on current and future machines?

* How can we invest to ensure future leadership in high-performance computing?   - What is the proper balance of hardware and software?   - What role should government have? What role should private industry have?

The Future of Supercomputing study will issue a final report to the Department of Energy (both Office of Science and ASCI) and the US Senate in late 2004.  Other branches of the government have also expressed interest in our findings.  This panel will be one of the first public presentations of that material. Additional information about the study can be found on the committee's website at http://www.cstb.org/project_supercomputing.html .

It should be noted that there is a chance that the report will not be delivered on time, in which case the panelists would be limited to expressing their personal views on the topic. However, due to contractual obligations we are under great pressure to produce a report on time. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:08:40</time_stamp>
    <status>active</status>
    <sub_id>pan114</sub_id>
    <event_type>Panel</event_type>
    <sess_id>68</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>319-320</sess_room>
    <sess_title>Grid Computing</sess_title>
    <sess_chair>Jennifer M Schopf (Argonne National Laboratory)</sess_chair>
    <title>Grid Computing: Still a Solution Looking for a Problem?</title>
    <all_auth_inst>Ian Foster (University of Chicago and Argonne National Laboratory), Bill Gropp (Argonne National Laboratory), Pete Beckman (NSF TeraGrid Project), Miron Livny (University of Wisconsin)</all_auth_inst>
    <abs>Grid computing allows the use of many resources to solve a single problem in a coordinated way across multiple administrative domains. This approach has been viewed as a promising trend for three reasons: (1) it can make more cost-effective use of a given amount of resources, (2) it offers a way to solve problems that can't be approached without an enormous amount of computing power or storage, and (3) it suggests that the resources of many computers can be cooperatively harnessed as a collaboration toward a common objective.

However, some people believe that the delivered capabilities of Grid computing have not yet lived up to its potential, and users have been left with many open questions as to the functionality, feasibility, and usability of many Grid strategies. Questions that this panel will address include: What can be done to address Grid hype? What will the Grid really look like? What’s the most important problem still to solve to make it successful? And, how do Grids differ from what’s being done already in distributed computing, clusters, and OS design?

We propose a panel to address some of the open questions in Grid computing by assuring the participation of the leaders of the field. Ian Foster will argue that Grids are solving problems, as seen by the many projects currently using the Globus Toolkit.  Bill Gropp will address many of the ease-of-use and accessibility issues by contrasting attempted Grid approaches with tried-and-true MPI-based solutions. Pete Beckman will address some of the issues seen in the TeraGrid project’s large-scale deployment. Miron Livny will bring supporting evidence from his work with Condor, and with another large-scale deployment, Grid3. In addition, we hope to recruit a security expert to address the security needs of Grid computing, since it is evident that until basic security has been addressed, Grid computing will remain infeasible.  </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:08:50</time_stamp>
    <status>active</status>
    <sub_id>pan113</sub_id>
    <event_type>Panel</event_type>
    <sess_id>69</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>315-316</sess_room>
    <sess_title>GLIF Infrastructure</sess_title>
    <sess_chair>Maxine Brown (UIC)</sess_chair>
    <title>GLIF Infrastructure -- Why Do We Need 10Gbps Networks?</title>
    <all_auth_inst>Peter Clarke (University College London, UK), Thomas A. DeFanti (University of Illinois at Chicago), Jun Murai (Keio University, Japan), Kees Neggers (SURFnet bv, The Netherlands), Bill St. Arnaud (CANARIE, Canada)</all_auth_inst>
    <abs>The fourth annual Lambda Workshop will be held September 3, 2004 in Nottingham, United Kingdom (UK). These international, invitation-only, high-level Lambda Workshops were created as a forum for discussing global optical networking strategies (both short-term and long-term), global optical networking testbeds, and global optical networking operational issues (with respect to the purchase, connection and management of Lambdas). The workshops resulted in the formation of GLIF, the Global Lambda Integrated Facility; for more information, see &lt;www.glif.is&gt;. These panelists, all participants in GLIF, have been involved with the procurement, management, interoperability and use of internationally connected 10 Gbps lambdas. 

* Why are Canada, Japan, the Netherlands, the United Kingdom and the United States involved in GLIF?

* What can be accomplished on lambdas that cannot be accomplished on &quot;best-effort&quot; networks?

* Where does your infrastructure connect?

* When will we implement a totally functional LambdaGrid? 

* What do you see as the future of networking?  </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:09:01</time_stamp>
    <status>active</status>
    <sub_id>pan111</sub_id>
    <event_type>Panel</event_type>
    <sess_id>70</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>317-318</sess_room>
    <sess_title>European Grid/HPCC</sess_title>
    <sess_chair>Richard S. Hirsh (Science Foundation Ireland)</sess_chair>
    <title>Grid/HPCC Activities in Europe</title>
    <all_auth_inst>Fabrizio Gagliardi (CERN), Victor Alessandrini (IDRIS (Institut du Développement et des Ressources en Informatique Scientifique)), Tony Hey (EPSRC), Dany Vandromme (RENATER)</all_auth_inst>
    <abs>The similarities and differences between the U.S. and European approaches to Grids, Networking, High-End Computing, the tools required to make them work more efficiently, and the applications to first take advantage of these technologies will be highlighted in a series of talks by leaders of the European scientific community.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:05:08</time_stamp>
    <status>active</status>
    <sub_id>scgw108</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>82</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>321</sess_room>
    <sess_title>Nanoscience Technology &amp; Simulation</sess_title>
    <sess_chair></sess_chair>
    <title>Nano-Science and Technology Simulations through High-End Computing</title>
    <all_auth_inst>David Kahaner </all_auth_inst>
    <abs>WORKSHOP ON LARGE-SCALE NANO SIMULATION

“Predictive computational chemistry at the nanoscale”
Robert J. Harrison, William A. Shelton, Bobby G. Sumpter and Vincent Meunier (ORNL)

&quot;Scaling First Principles Nanoscience and Materials Science Codes to Thousands of Processors&quot;
Andrew Canning (speaker) and Lin-Wang Wang (LBL/NERSC)

“Quantum Mechanical Simulations for Nanomaterials”
Yoshiyuki Miyamoto, NEC Fundamental &amp; Environmental Res. Labs. (NEC Japan)

“Impact of the Cray-X1 on nano-science: decisive changes in our understanding of strongly correlated electron systems”
Thomas Schulthess (ORNL)

“Large scale simulation for the generating mechanism of continuous terahertz waves through the nano-scale high-temperature superconductor device.”
Mikio Iizuka, Hisashi Nakamura (Research Organization for Information Science &amp; Technology, Japan) and Masashi Tachiki (National Institute of Material Science, Japan)

&quot;HPC simulations of quantum dots consisting of multi-million atoms using NEMO 3-D&quot;
Gerhard Klimeck (Purdue &amp; JPL), Sebastien Goasguen (probable speaker) (Purdue), Haiying Xu (Purdue), Faisal Saied (Purdue), Mohammed Sayeed (Purdue), Hook Hua (JPL), Seungwon Lee (JPL), Fabiano Oyafuso (JPL), Olga Lazarenkova (JPL), Paul von Allmen (JPL)
</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:04:20</time_stamp>
    <status>active</status>
    <sub_id>scgw102</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>76</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>406</sess_room>
    <sess_title>High-Performance Cluster Storage</sess_title>
    <sess_chair></sess_chair>
    <title>High Performance Storage for Cluster Computing:  What Do We Do Next?</title>
    <all_auth_inst>Garth Gibson </all_auth_inst>
    <abs></abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:04:32</time_stamp>
    <status>active</status>
    <sub_id>scgw103</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>77</sess_id>
    <sess_date>Sunday, November 7</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>406</sess_room>
    <sess_title>Open MPI Workshop</sess_title>
    <sess_chair></sess_chair>
    <title>Open MPI</title>
    <all_auth_inst>George Bosilca (University of Tennessee Knoxville)</all_auth_inst>
    <abs></abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Dongarra, Capello, Fagg, Bosilca</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:04:10</time_stamp>
    <status>active</status>
    <sub_id>scgw101</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>75</sess_id>
    <sess_date>Saturday, November 6</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>China/HPC Workshop</sess_title>
    <sess_chair></sess_chair>
    <title>China/HPC Workshop</title>
    <all_auth_inst>Laurence Godret (Asian Technology Information Program)</all_auth_inst>
    <abs>The Asian Technology Information Program (ATIP) will conduct its second annual China HPC workshop in conjunction with SC04, on Saturday, November 6, 2004.  The workshop will consist of a one-day program of presentations highlighting key Chinese trends &amp; developments, as well as research programs and associated applications in HPC.  The workshop will again feature a delegation of Chinese HPC researchers and, in addition to a full day of presentations and panels, will include opportunities for attendees to develop linkages with their Chinese counterparts.  Additional information about ATIP and its second annual China HPC workshop can soon be found at http://www.atip.org.

 

For questions or to request updates, contact Laurence Godret at lgodret@atip.org or at 1-505-842-9020.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:05:22</time_stamp>
    <status>active</status>
    <sub_id>scgw106</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>80</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>302</sess_room>
    <sess_title>APART Workshop</sess_title>
    <sess_chair></sess_chair>
    <title>6th International APART (Automatic Performance Analysis Tools &amp; Performance Tools for the Grid) Workshop </title>
    <all_auth_inst>Michael Gerndt (Technische Universität München), Peter Kacsuk (MTA SZTAKI)</all_auth_inst>
    <abs>http://wwwbode.cs.tum.edu/~gerndt/home/Research/APARTsc04.htm

This workshop is the 6th in a series organized by the European Working Group on Automatic Performance Analysis: Real Tools (www.fz-juelich.de/apart). The workshop was held with the SC conference in 1999, 2001, and 2003. In 2000 and 2002 it was held in conjunction with EuroPar. APART has 11 partners, three from the US (Bart Miller, Allen Malony and Daniel Reed) and eight from Europe. This workshop will be co-organized by the Technical Working Group on Performance Analysis of the European GRIDSTART working group (www.gridstart.org). The goal of the workshop is to bring together people working on performance analysis tools for parallel systems and grids. Most of the attendees come regularly to SC and are willing to spend an additional day for the workshop. The workshop is a one-day event that has always been held on the last day of the SC conference. The speakers are invited by the organizers. No proceedings are published. Attendance is free for all Supercomputing participants. Attendees have to register for SC. In recent years we have had about 40 participants. The previous SC conferences provided us with a room and projector free of charge. This year's program will be split into two parts: the first part will concentrate on Automatic Performance Analysis, while the second will be devoted to performance tools for the grid. We will invite speakers from APART and GridStart as well as from related projects all over the world. The programs of the previous workshops can be found at the APART web site (www.fz-juelich.de/apart).</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Michael Gerndt graduated in 1985 with a Diplom in Computer Science from the University of Bonn. Starting in 1986, he worked on the topic of automatic parallelization for distributed memory machines in the framework of the SUPRENUM project at the University of Bonn and received a Ph.D. in Computer Science in 1989. For two years, in 1990 and 1991, he held a postdoc position at the University of Vienna, and he joined the Research Centre Jülich in 1992, where he concentrated on programming and implementation issues of shared virtual memory systems. This research led to his Habilitation in 1998 at the TU München. Since July 2000 he has been professor of architecture of parallel and distributed systems at Technische Universität München. He is the coordinator of the Esprit working group APART on automatic performance analysis. He also leads the Peridot project funded by KONWIHR, which is developing an automatic performance analysis environment for the German Teraflop Computer, the Hitachi SR8000 at Leibniz Computer Center in Munich. He is also a partner in the EP-Cache project, which focuses on efficient programming of cache architectures and is funded by the Bundesministerium für Bildung und Forschung (BMBF).</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:05:34</time_stamp>
    <status>active</status>
    <sub_id>scgw107</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>81</sess_id>
    <sess_date>Friday, November 12</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>406</sess_room>
    <sess_title>Advancing Research &amp; Education</sess_title>
    <sess_chair></sess_chair>
    <title>Advancing Research and Education in the New Landscape of Increasingly Complex Science Drivers, Tools, &amp; Technologies </title>
    <all_auth_inst>Kelvin Droegemeier (University of Oklahoma)</all_auth_inst>
    <abs></abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:04:55</time_stamp>
    <status>active</status>
    <sub_id>scgw104</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>78</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room></sess_room>
    <sess_title>Grid Workshop</sess_title>
    <sess_chair></sess_chair>
    <title>Grid Workshop</title>
    <all_auth_inst>Craig Lee (Aerospace Corporation)</all_auth_inst>
    <abs></abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:04:43</time_stamp>
    <status>active</status>
    <sub_id>scgw105</sub_id>
    <event_type>Workshop</event_type>
    <sess_id>79</sess_id>
    <sess_date>Monday, November 8</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>406</sess_room>
    <sess_title>HPCS Workshop</sess_title>
    <sess_chair></sess_chair>
    <title>High Productivity Computing Systems</title>
    <all_auth_inst>Jeremy Kepner (MIT)</all_auth_inst>
    <abs> The DARPA High Productivity Computing Systems (HPCS) program is focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial user community. HPCS researchers have initiated a fundamental reassessment of how we define and measure performance, programmability, portability, robustness and ultimately, productivity in the HPC domain. The value of a High Performance Computing (HPC) system to a user includes many factors, such as execution time on a particular problem, software development time, direct hardware costs and indirect administrative and maintenance costs. The HPCS program is developing systems that deliver increased value to users at a rate commensurate with the rate of improvement in the underlying technologies. This workshop will provide an opportunity for the broader HPC community to see the latest results from HPCS Vendors and the HPCS Productivity team and to provide feedback to the HPCS researchers.

Agenda
01:30-01:35 Opening Remarks (Bob Graybill/DARPA and Fred Johnson/DOE SC)
01:35-02:35 HPCS Vendor Presentations (Mootaz Elnozahy/IBM, Burton Smith/Cray, Jim Mitchell/Sun) [20 min/each]
02:35-02:45 High Productivity Languages (Hans Zima/JPL) [10 min]
02:45-03:00 Discussion and Feedback
03:00-03:30 BREAK
03:30-03:50 Productivity Team Overview (Jeremy Kepner/Lincoln) [20 min]
03:50-04:10 Development Time Experiments (Victor Basili/UMD)
04:10-04:30 Benchmarks/Execution Time Modeling (David Koester/MITRE, Bob Lucas/ISI)
04:30-04:50 Existing Codes Analysis (Doug Post/LANL)
04:50-05:00 Discussion and Feedback </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm108</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>27</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:15PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>HPC Acquisition</sess_title>
    <sess_chair>Jeffrey K. Hollingsworth (University of Maryland)</sess_chair>
    <title>CART - A Strategic Acquisition Model for HPC</title>
    <all_auth_inst>Suresh Shukla (Boeing Company)</all_auth_inst>
    <abs>Defining a Strategic Acquisition Model for HPC is of value to an organization. It helps an industrial outfit, a University, or a Government Lab to ensure that all aspects of acquisition are considered in its decision-making. CART helps define such a model. CART stands for the following components: Cost, Applications, Requirements, and Technology. These components are further divided into subcomponents. Depending on the emphasis that an organization assigns to each subcomponent, an improved acquisition decision can be reached. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Suresh Shukla has been HPC Service Manager at the Boeing Company for the last 14 years. He has been involved in acquiring HPC systems for Boeing during that time. Boeing has performed TCO analyses for these acquisitions under Suresh's direction. Suresh put together a strategic acquisition model, called &quot;CART&quot;, which he introduced to the HPC community through HPC User Forums in the USA and Europe last year. He was interviewed in HPCwire on it. Suresh is on the HPCC committee of the US Council on Competitiveness. He also serves on the Executive and Technical committees of the HPC User Forum. Due to his contributions to the HPC industry, Suresh was recognized on HPCwire's &quot;Top People to Watch&quot; list for 2004.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm109</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>28</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:15AM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Homeland Security</sess_title>
    <sess_chair>Harvey J. Wasserman (Los Alamos National Laboratory)</sess_chair>
    <title>Modeling Structural Response to Blast Loading Using a Coupled CFD/CSD Methodology </title>
    <all_auth_inst>Joseph Baum (SAIC)</all_auth_inst>
    <abs>This talk describes simulations used in support of several force-protection and counter-terrorism efforts that predict response to weapon detonation, fragmentation, and airblast interactions with reinforced concrete and steel walls, with a multi-chamber steel tower that is part of a critical transportation link, and with a generic steel ship hull.  The method loosely couples 3-D Computational Fluid Dynamics (CFD) and Computational Structural Dynamics (CSD) using an “embedded-mesh” approach, in which the CSD objects “float” through the CFD domain. The simulations clearly demonstrate the advantages of the coupled methodology. Past decoupled methods used CFD to provide pressure loading on exterior surfaces, followed by a CSD calculation of the structural response, structure break-up, and fragment impact on downstream interior walls. This approach would not have accounted for pressure loading on the interior walls, which, for larger charges, is the dominant loading and failure mechanism. For larger charges internal walls are damaged by the propagating blast wave, before being impacted by the flying fragments generated by the upstream failed walls.

Collaborators on this work include Hong Luo and Eric Mestreau (SAIC); Rainald Lohner (GMU); Charles Charman (GA); and Daniele Pelessone (ES3).</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Dr. Joseph D. Baum’s research interests over the past several years have been focused on the use of unstructured grids for the solution of complex Computational Fluid Dynamics problems, specifically for shock physics, coupling of fluid and structural dynamics, fluid dynamics of propulsion systems, and combustion instability of rocket motors. Dr. Baum received his Ph.D. in Aerospace Engineering from the Georgia Institute of Technology, Atlanta, GA, in 1980. He conducted research at the Air Force Rocket Propulsion Laboratory, Edwards AFB, CA (1980-1985), the Naval Research Laboratory, Washington, DC (1985-1989), and has been with SAIC since then. Dr. Baum has more than 200 publications to his credit and has won the prestigious Glass award (1985) and the SAIC Award for Excellence in Research (1997, 2004). Dr. Baum has served as a PI on several major forensic studies, such as the World Trade Center, Khobar Towers, and the Kenya and Tanzania US embassies, and has led several survivability studies of high-visibility sites. </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm117</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>33</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>4:15PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Data Intensive Applications</sess_title>
    <sess_chair>Thomas Nelson </sess_chair>
    <title>NOAA Data Challenges for the Future</title>
    <all_auth_inst>Wayne Faas (NOAA)</all_auth_inst>
    <abs>The National Oceanic and Atmospheric Administration’s (NOAA) mission is to understand and predict changes in the Earth’s environment and conserve and manage coastal and marine resources to meet our Nation’s economic, social, and environmental needs. NOAA’s role in facing these challenges is to assess and predict environmental changes, protect life and property, provide decision makers with reliable information, manage the Nation’s marine and coastal resources, and foster environmental stewardship.  NOAA’s science-based management approach involves data and information of many types, covering weather and climate as well as coasts, oceans, and lakes.

The NOAA National Environmental Satellite Data and Information Service’s (NESDIS) three data centers (climatic, oceanographic, and geophysical) are responsible for the stewardship of NOAA’s environmental data. They will experience tremendous data growth in the next several years with the launching of new satellites; integrating data observing systems from the state, local, and private sectors; and implementing global climate observing system agreements.  This presentation will show how NESDIS’ National Climatic Data Center is providing stewardship of the data now and what challenges it faces in the near future. 

</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Wayne M. Faas is the Chief of the Data Operations Division at the National Oceanic and Atmospheric Administration’s (NOAA) National Climatic Data Center (NCDC).  He is responsible for the acquisition, inventory, data management, quality control, and historical data rescue of NOAA’s environmental data (surface (land and marine), upper-air meteorological/climatological data, weather radar, satellite, and other information).  He manages NCDC environmental data archives of over 1.5 petabytes and applies new technologies and techniques to perform quality assurance of data and incorporates the data and information into digital databases.  A retired Air Force Lieutenant Colonel, he is the liaison with Air Force and Navy units co-located in Asheville to meet the climate needs of the Nation.  Other operational projects include implementing NCDC’s first on-line data access system and developing the electronic ingest of the National Weather Service’s WSR-88D Next Generation Radar Data (NEXRAD).  </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm105</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>29</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:15PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Automotive Industry</sess_title>
    <sess_chair>Steven Joachims </sess_chair>
    <title>Recent Trends in Use of HPC in Automotive Product Development</title>
    <all_auth_inst>Alexander Akkerman (Ford Motor Company)</all_auth_inst>
    <abs>Ford Motor Company's High Performance Computing has transitioned into a large scale multi-platform environment supporting the company's vehicle engineering and research world-wide. Integrated resource management, shared file systems and security enable migration of Computer Aided Engineering (CAE) workloads to appropriate hardware architectures ranging from Linux clusters to SMP cache-based and vector systems. CAE has become an integral part of the vehicle development process placing greater importance on efficient use of HPC resources.

Recent advances in distributed memory software technologies enabled the transition to COTS-based systems. Safety and CFD user communities have benefited from this transition; however, others have been left behind. Many early CAE users have not seen any performance improvements since the introduction of the Cray T90 systems in 1995, especially considering increased refinement and complexity in models. The challenge for the HPC community is to provide new performance levels for every HPC user and an environment for tackling new problems that cannot be adequately addressed with today's hardware.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm115</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>32</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:15PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Electronic Design and Automation</sess_title>
    <sess_chair>Jennifer M Schopf (Argonne National Laboratory)</sess_chair>
    <title>Grids Today, From Silos to the Enterprise</title>
    <all_auth_inst>Tom Grotton (Cadence Design Systems, Inc)</all_auth_inst>
    <abs>This presentation discusses where Grids are today and where they are going. Why are there so many different approaches to enterprise grids? We will look at the differences between grids that process interactive requests and those that serve batch requests. Why can’t you always buy the solution that meets your needs? Because there are gaps in the current grid technology. One of these gaps is how to handle workflow processing in a massively parallel environment. Cadence fills that gap using Automated Workflow Grid Technology, which this talk will describe.  Grid requirements are discussed from both an internal (infrastructure) and external (product) point of view. Current capabilities are presented, followed by a consideration of grid futures at Cadence. We will discuss the impact of Automated Workflow Grid Technology at Cadence, including total-cost-of-ownership, productivity, quality, time-to-market, and the competitive grid space.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Tom Grotton heads the Cadence Design, Inc. Server Farm Initiative (SFI), a company-wide IT initiative to maximize the use of the investment in the thousands of computer systems that are deployed throughout Cadence. Tom has been in the computer industry for over 20 years. He has worked in DOD computer systems, has spearheaded system integration and configuration management organizations, and has a strong background in engineering business process definition and development. Before moving to Cadence, Tom managed IT for a division of Honeywell. He joined Cadence in 1998 as a Director of Configuration Management with international responsibilities. His contributions to the company include a cost reduction of more than $80M over the next five years.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm114</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>29</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:15PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Automotive Industry</sess_title>
    <sess_chair>Steven Joachims </sess_chair>
    <title>Computational Requirements and Scalability Considerations for CAE Applications in the Automotive Industry</title>
    <all_auth_inst>Christian Tanasescu (SGI, Inc.)</all_auth_inst>
    <abs>We investigate scalability, architectural requirements, and performance characteristics of some key CAE applications used in automotive supercomputing. While Crash Simulation applications consume most of the processor cycles according to the Top20Auto study and require high floating point processing speed, NVH (Noise, Vibration, Harshness) is the most demanding in memory and IO bandwidth, causing maximum stress to the entire system architecture. 

Computer architecture requirements may differ if the application is scalable. While CFD can scale to higher processor counts, which reduces the memory bandwidth requirement per processor as the data set per processor becomes smaller, NVH exhibits reduced scalability so that memory bandwidth per processor determines overall performance. The ratio of loads and stores to floating-point operations varies for real applications from 1 to 4 with a mean of about 2. 

The talk also gives an overview of the trends in CAE for the automotive market in terms of system architecture, processor architecture, operating systems and applications.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Christian Tanasescu is director of Engineering for Simulation applications at SGI. His duties include developing strategy for HPC applications within SGI core industries. Christian has been involved in High Performance Computing for more than 23 years, with a particular focus on CAE applications in the automotive and aerospace industries. Before joining SGI in 1992, he held positions at Siemens, Fujitsu, Tecan Systems and a Nuclear Power Research Institute. He is a member of the SIAM Parallel Processing Committee and VDI, the German Automotive Engineering Association.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm113</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>30</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:15PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>HPC Goes to the Movies</sess_title>
    <sess_chair>Debra S.  Goldfarb </sess_chair>
    <title>There and Back Again: A Hobbit's Tale of Supercomputing</title>
    <all_auth_inst>Milton Ngan (Weta Digital Ltd)</all_auth_inst>
    <abs>The &quot;Lord of the Rings&quot; Movie Trilogy was a project of unprecedented size and complexity. The computing resources required to complete these movies were not dissimilar in scale. The company behind nearly all the computer generated visual effects was Weta Digital, a small company with a handful of SGI based graphics workstations and server. Over the course of LOTR, Weta Digital became one of the largest Linux based visual effects companies in the industry. This transition and growth was not without problems, but it was not without a vision either. Now at the end of the project, there is an opportunity to reflect back on the past few years and learn more about the future direction of large scale computing in the visual effects industry. With this hindsight we can see the effect of low cost computing on our industry and how it has changed the way that we perceive what is possible and what is attainable. This new perspective poses new challenges in the way we design and build computing farms and storage systems.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Milton Ngan has recently finished working on the production of &quot;The Lord of the Rings&quot; trilogy and is in the middle of restructuring the technical infrastructure for future productions. Previous productions included &quot;The Frighteners&quot; and &quot;Contact&quot;. Milton has spent the last eight years in the visual effects industry, building up Weta Digital's technical infrastructure from scratch. In past lives, Milton worked for Internet Service Providers, and was part of a team that founded an Internet Service Provider for students at Victoria University of Wellington, where he received a Masters of Science in Computer Science.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm101</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>32</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:15PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Electronic Design and Automation</sess_title>
    <sess_chair>Jennifer M Schopf (Argonne National Laboratory)</sess_chair>
    <title>Building Your Enterprise Grid</title>
    <all_auth_inst>Brooklin Gore (Micron Technology, Inc.)</all_auth_inst>
    <abs>Like any major project, a successful enterprise Grid implementation requires a well-defined purpose, a solid foundation, and a clear vision for a successful outcome. While Grid computing hype abounds and commercial Grid offerings proliferate, a well-planned Grid deployment with noble yet achievable goals can succeed in delivering new capabilities at reduced costs. This presentation leverages more than three years of experience at a large, global manufacturing company in building a successful, general purpose, enterprise Grid. Several attainable goals for a Grid project are discussed in addition to infrastructure and philosophical best practices. Sample applications are recommended that have broad appeal and good Grid success track records. Tips for engaging and pleasing both the CFO and CTO are shared. Finally, pitfalls and challenges are highlighted to ease your adoption of this exciting technology.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Brooklin Gore has been researching and implementing enterprise Grid technologies for over three years to create Micron’s global Grid infrastructure, which runs over 15 production applications today. Brooklin has been with Micron for 16 years. In that time he served as a product engineer, Computer Aided Design group manager, network manager and general manager of Micron’s Internet Services Division. Brooklin has been issued several United States patents and is a Senior Member of the IEEE. He holds Bachelor of Science degrees in computer science and electrical engineering from the University of Idaho and a Masters of Science in computer science from the National Technological University.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm111</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>33</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:15PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Data Intensive Applications</sess_title>
    <sess_chair>Thomas Nelson </sess_chair>
    <title>High Performance Computing at Fleet Numerical Meteorology and Oceanography Center</title>
    <all_auth_inst>Mike Clancy (Fleet Numerical Meteorology and Oceanography Center)</all_auth_inst>
    <abs>The Navy's Fleet Numerical Meteorology and Oceanography Center (FNMOC) produces numerical weather prediction (NWP) products to meet the unique needs of the Navy and Marine Corps.  Mesoscale NWP models, with horizontal spatial resolutions of less than 10 km, are a key part of this support and the principal consumers of High Performance Computing (HPC) resources at FNMOC.  With the resolutions of these models being driven ever finer by military requirements and scientific advances, they demand HPC performance on the order of many TFLOPS peak computational rate and TB per day of data throughput. To address these challenges, FNMOC has fielded a powerful production environment involving a combination of SGI/TRIX and IBM/AIX HPC platforms for number crunching, and an IBM/Linux solution for pre-processing, post-processing and data distribution.  FNMOC is also beginning to explore opportunities afforded by remote operations and grid computing to meet its burgeoning demand for HPC resources.  This talk is co-authored by Doug Wenger, Northrop Grumman Information Technology. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Mr. Clancy graduated first in his class from Florida Institute of Technology in 1973 with a B.S. degree in physical oceanography.  He received an M.S. in meteorology from the University of Miami in 1975, and went to work for Science Applications Incorporated.  He joined the Naval Ocean Research and Development Activity (NORDA; now part of the Naval Research Laboratory) as a research oceanographer in 1979, and the Fleet Numerical Meteorology and Oceanography Center as a supervisory oceanographer in 1983.  He currently serves as Fleet Numerical's Chief Scientist and Deputy Technical Director.  Mr. Clancy has authored or coauthored over 100 publications in meteorology, oceanography and information technology, and has received numerous professional awards, including two Government Technology Leadership Awards, the Navy's Meritorious Civilian Service Award, and the Navy's Superior Civilian Service Award. </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:12:01</time_stamp>
    <status>active</status>
    <sub_id>scgm103</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>30</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:15PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>HPC Goes to the Movies</sess_title>
    <sess_chair>Debra S.  Goldfarb </sess_chair>
    <title>How Do You Make a Green Ogre?  Supercomputing and Shrek2</title>
    <all_auth_inst>Andy Hendrickson (PDI/Dreamworks)</all_auth_inst>
    <abs>In a large 40TB bowl, mix equal parts inspiration and perspiration.  Add in a pinch of Global Illumination, Visual Effects, Ray Tracing, and Volume and Scanline Rendering.  Mix with Linux.  Bake at 10 million CPU hours in a 3600 CPU renderfarm, check periodically with a job queueing system until done.  Decorate with Dialog, Music, and Sound Effects. Voila! Out comes a computer graphics masterpiece serving millions.  

Explore our secret recipe that utilized supercomputing to make the hit feature animated film 'Shrek2'.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Mr Hendrickson joined PDI/Dreamworks as the Head of Technology in 2002 to study the habitat and ritual of the Green Ogre in a successful attempt at cloning them for the sequel &quot;Shrek2&quot;.

Prior to that time, he spent 12 years at LucasFilm, first lurking about the subterranean computer rooms of Skywalker Sound aiding the dawn of digital audio for motion pictures.  Next he took up practice herding VAXen, SGIs, Linux boxes and Software for Industrial Light and Magic.

There, during the making of Jurassic Park, after helping to breathe life into dinosaurs, he assisted in the transformation of the special effects industry from its photochemical roots to digital, successfully providing effects for dozens of films, including &quot;Star Wars&quot;, &quot;Forrest Gump&quot;, &quot;Perfect Storm&quot;, &quot;Twister&quot;, and &quot;Saving Private Ryan&quot; to name a few.

A graduate of the University of California Berkeley with a degree in Physics, Mr. Hendrickson is a midwestern native transplanted to the Bay Area of California.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm104</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>27</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:15PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>HPC Acquisition</sess_title>
    <sess_chair>Jeffrey K. Hollingsworth (University of Maryland)</sess_chair>
    <title>Making HPC System Acquisition Decisions Is an HPC Application</title>
    <all_auth_inst>Larry Davis (DoD High Performance Computing Modernization Program)</all_auth_inst>
    <abs>Coauthors for this work include Larry P. Davis and Cray J. Henry of the Department of Defense High Performance Computing Modernization Program Office, Roy L. Campbell, Jr., of the U.S. Army Research Laboratory, William Ward of the U.S. Army Engineer Research and Development Center, and Allan Snavely and Laura Carrington of the University of California at San Diego.

The DoD High Performance Computing Modernization Program spends approximately $50 M annually acquiring high performance computing capability for its DoD customer base. It purchases these high performance computers based on two primary criteria: system usability and a combination of system performance and price/performance. This paper details the quantitative determination of performance and price/performance based on times-to-solution on the program's suite of benchmarks. System usability is determined qualitatively on a variety of system factors affecting usability.

Actual acquisition decisions are based heavily on the results of a constrained linear optimizer that uses the detailed performance (by individual application benchmark test case) and cost data on candidate systems to produce a rank-ordered list of system combinations. Each member of this list is a combination of systems that maximizes its total performance for a given cost subject to the constraint that allocation of the overall computational work among the benchmark test cases must be within some narrow bounds of the desired weights for the set of benchmark test cases.  Use of the optimizer in this way allows for combinations of systems that may not have the best performance on each and every benchmark test case, but that, when combined with complementary systems within the combination, may produce the best overall performance score for the available funding.

Currently, the benchmark suite used to obtain performance scores for acquisition is primarily composed of full application codes with relevant test cases that span the DoD computational workload. These codes, while representative of the program's workload, are nevertheless time-consuming to run and score. The program is investigating the validity of basing future acquisitions on a set of much simpler microbenchmarks, which are being designed to gather key basic system attributes that, when combined with a detailed profile of key applications, could be used to accurately predict performance of those key applications on each offered system. These microbenchmarks are designed to run on a minimal number of processors and then can be used to predict the performance of a much larger system of that type, saving significant vendor manpower and system resources. This much simpler benchmarking procedure is currently being tested and may potentially impact program acquisitions within the next two years.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Dr. Davis has more than 30 years experience in science and technology (S&amp;T) as a scientist, leader of research groups, faculty department research director, scientific program manager, S&amp;T manager on executive staff, and information technology program manager, with specialization in high performance computing.  He completed a 21-year active duty career in the Air Force.  His computational science experience includes both research and management experience:  he has published over twenty refereed journal articles in addition to numerous technical reports. He was one of the founders of the DoD High Performance Computing Modernization Program (HPCMP) and has worked in various capacities in that program for the last twelve years, including his current assignment as Deputy Director, as an employee of the Institute for Defense Analyses.  Dr. Davis also is responsible for several major activities within the program, including the program’s benchmarking and performance modeling activities.   </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm102</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>34</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:15AM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Oil and Gas Exploration</sess_title>
    <sess_chair>Stephen Poole (Los Alamos National Laboratory)</sess_chair>
    <title>Supercomputing on the Ocean Waves</title>
    <all_auth_inst>Michael Turff (Petroleum Geo-Services)</all_auth_inst>
    <abs>An introduction to the use of supercomputers for oil exploration at sea.  We will explore the reasons for carrying large processing systems on seismic vessels, the logistics involved, the particular issues raised by 24/7 use at sea and what the future might hold.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Michael Turff has worked in scientific supercomputing since 1974.  Initially he worked at the UK Meteorological Office running forecast models on an IBM 360/195.  In 1978 he joined Western Geophysical in London where he worked on Floating Point Systems, Star Technology, IBM array processors, IBM Vector Facility processors, and finally SP2 systems, all doing seismic data processing. At Petroleum Geo-Services (PGS) he is working with large clusters of Pentium 4 and Opteron processors both in land based data centers and on vessels at sea.  He is one of the pioneers of Capacity-on-Demand use of PC clusters.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm116</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>34</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>11:15AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Oil and Gas Exploration</sess_title>
    <sess_chair>Stephen Poole (Los Alamos National Laboratory)</sess_chair>
    <title>Oil Patch Supercomputing</title>
    <all_auth_inst>J. Bee Bednar (3dBee Technology, Inc.)</all_auth_inst>
    <abs>Oil patch supercomputing turns on two extremely different gears. One efficiently handles truly enormous data sets; the other  handles a complex computational framework with many tightly coupled parameters.

This talk will explain how seismic data is recorded on both land and sea, why it’s so embarrassingly parallel, why there is so much of it, how it is currently imaged, and why this is such a complex process.  Using a carefully defined set of equations that clearly illustrate the current and potential computational complexity, I contrast what we do now with what we probably will do in the next 10-15 years.  I describe a somewhat wishful processing stream that, if it completed in a reasonable amount of time, would substantially improve both the ability to image and the ability to extract information essential to successful exploration and production.  I briefly contrast seismic imaging, reservoir modeling, and even a bit of weather modeling.

However, the talk’s emphasis is on seismic imaging, seismic exploration, and the use of seismic data in production.

</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>After receiving a Ph.D. in Mathematics at U. T. Austin, Bee did research in Anti-Submarine Warfare and taught Mathematics at Drexel University and the University of Tulsa.  He was Manager of Seismic Research at Cities Service Company and later became Manager and then Director of Geophysical Sciences at Amerada Hess, where he was instrumental in the development of distributed seismic processing software and led Amerada to the forefront of prestack depth imaging and computer assisted interpretation. He has participated in over 100 prestack depth imaging and interpretation projects and has published over 75 papers in Mathematics, Electrical Engineering, Geophysics, and Computer Science.  After retiring from Amerada Hess he became Vice President of Research and Development at Advanced Data Solutions, where he was instrumental in introducing LINUX based cluster computers to the energy industry.  He is currently Founder and President of 3DBee Tech Inc, where he consults for companies engaged in the exploration for and production of hydrocarbons.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm112</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>31</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>11:15AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Optical Networking and Cyberinfrastructure</sess_title>
    <sess_chair>Gwendolyn Huntoon </sess_chair>
    <title>Cyberinfrastructure in the Earth Sciences - A Necessary and Timely Collaboration</title>
    <all_auth_inst>John Orcutt (Scripps Institution of Oceanography)</all_auth_inst>
    <abs>Major new programs in the Earth Sciences extending from the study of deep Earth structure to the observation of Earth from space are providing an unprecedented flow of near-real-time data. For example, the NSF ORION (Ocean Research Interactive Observatory Networks) program, funded as an NSF Major Research Equipment and Facilities Construction project, provides a global sensor network returning data in very large volume with latencies of seconds.  The challenge of constructing a global data grid to support these massive observational programs and a computational grid to assimilate the data for nowcasting and forecasting closely ties the disciplines of Earth and computational sciences together. We have a mutually important and scientifically exciting goal of understanding the Earth and the changes inherent in its evolution on time and space scales spanning many orders of magnitude.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>John Orcutt is Deputy Director for Research at Scripps Institution of Oceanography and Director of the UCSD Center for Earth Observations and Applications. Prof. Orcutt is a graduate of Annapolis (1966) and received his M.Sc. in physics as a Fulbright Scholar at the University of Liverpool.  He served as a submariner and advanced to the rank of Commander.  He received his PhD in Earth Sciences from Scripps in 1976. He has published more than 140 scientific papers and received the Ewing Medal from the USN and the American Geophysical Union (AGU) in 1994. He received the Newcomb-Cleveland Prize from the AAAS in 1983 for a paper in Science. His research interests are the shallow and deep structure of the ocean basins and ridges, the use of seismic data for monitoring nuclear explosions, and the exploitation of information technology for the collection and processing of real-time environmental data. He is President of the American Geophysical Union (AGU) and a member of the Science Advisory Panel to the President’s Ocean Policy Commission. He was elected to the American Philosophical Society in 2002; the APS was founded by Benjamin Franklin in 1743. </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_18:43:35</time_stamp>
    <status>active</status>
    <sub_id>scgm110</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>31</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:15AM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Optical Networking and Cyberinfrastructure</sess_title>
    <sess_chair>Gwendolyn Huntoon </sess_chair>
    <title>The OptIPuter: Experimental Hybrid Network Structure and Emerging Software Services for Lambda-enabled Computers</title>
    <all_auth_inst>Philip Papadopoulos (UC San Diego/San Diego Supercomputer Center)</all_auth_inst>
    <abs>Dedicated optical networks to support e-Science are coming of age. Wave Division Multiplexing (WDM) is dramatically increasing the aggregate carrying capacity of optical fibers and 10 Gigabit networks are plummeting in price. Scientists can now affordably connect their laboratory clusters to private high-bandwidth light pipes (termed &quot;lambdas&quot;) to form LambdaGrids.  The OptIPuter is a five-year NSF-funded research project that is exploring how this technology disruption fundamentally changes distributed systems design.  Key research issues include determining whether and how lambda circuits should be exposed to applications, how one controls light circuits, and what protocol enhancements are relevant. Working closely with biomedical and geoscience applications, we are prototyping campus-scale systems in California and Illinois to understand first-order effects. This talk describes these in detail and highlights key research software in the areas of non-TCP protocols, dynamic light path management, virtual machine construction, and real machine management.  For more details see &lt;a href=&quot;http://www.optiputer.net&quot; target=&quot;w2&quot;&gt;www.optiputer.net&lt;/a&gt;</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Philip Papadopoulos received his Ph.D. in electrical and computer engineering from the University of California, Santa Barbara. His focus was on scalable numerical methods for matrix-valued equations in control. In 1993, Dr. Papadopoulos moved to Oak Ridge National Laboratory and was a member of the Parallel Virtual Machine (PVM) design and implementation team. In 1998, Philip joined the computer science department at UC San Diego working with Prof. Andrew Chien on high-performance clusters based on the Windows NT operating system. In 1999, Dr. Papadopoulos joined the San Diego Supercomputer Center to lead their Linux cluster development group. Dr. Papadopoulos is currently the Program Director for Grid and Cluster Computing at the San Diego Supercomputer Center.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scgm106</sub_id>
    <event_type>Masterworks</event_type>
    <sess_id>28</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>11:15AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>303-305</sess_room>
    <sess_title>Homeland Security</sess_title>
    <sess_chair>Harvey J. Wasserman (Los Alamos National Laboratory)</sess_chair>
    <title>Visual Analytics: From Science to National Security</title>
    <all_auth_inst>Jim Thomas (Pacific Northwest National Laboratory)</all_auth_inst>
    <abs>We are in the midst of a major change in the requirements for visualization technologies for the future.  These requirements will drive a new research and development agenda for aspects of both scientific and information visualization.  The history of visualization has been dominated by the invention and effective delivery of technology and products around visual metaphors, usually with a single metaphor such as flow fields, a tree or spherical visual per product.  The recent focus on devices such as caves, walls, tables, tablets and PDAs has offered new physical forms of display. However, people using these metaphors and devices are simply drowning in information and require a fundamentally new approach to visualization and interaction.  The interaction, whether in the area of genomics, finance, communications, or homeland security, must be driven by a discourse for discovery and engaged learning.  I will use the term Visual Analytics to represent a vision for suites of technologies that provide a new approach to interaction and allow a discourse for discovery.  Early technology and products are being used today, having a real impact in national security.  Learning from those usages and deployments, I will present the driving new characteristics of interaction and suggest the top technical challenges for visual analytics, enlisting comments and recommendations.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Jim Thomas is director of the Department of Homeland Security’s National Visual Analytics Center and a Laboratory Fellow at Pacific Northwest National Laboratory. Thomas specializes in the research, design and implementation of innovative information and scientific visualization, multimedia and human computer interaction technology. At PNNL, he has established investment directions for information technology, led major technology initiatives, mentored staff and spearheaded several major research programs. Thomas is internationally recognized for his contributions to the field of information visualization. He is a member of the Association for Computing Machinery and Institute of Electrical and Electronics Engineers. Thomas also was chair of ACM SIGGRAPH, former editor-in-chief for IEEE Computer Graphics and Applications, and has chaired graphics and visualization conferences for both ACM and IEEE. He is on three editorial boards for journals and has presented more than 20 keynote talks at major conferences.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_11:42:12</time_stamp>
    <status>active</status>
    <sub_id>scg121</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>57</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>2:15PM</begin_time>
    <end_time>3:00PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Collaborative Tools</sess_title>
    <sess_chair></sess_chair>
    <title>WestGrid Collaboration and Visualization Network</title>
    <all_auth_inst>Brian Corrie (Simon Fraser University)</all_auth_inst>
    <abs>WestGrid is a multi-province, multi-institute grid-computing project in Western Canada, deploying a range of grid enabled computation and storage facilities across Alberta and British Columbia. As part of this initiative, the Collaboration and Visualization (CV) Group within WestGrid has also deployed an advanced set of collaboration and visualization technologies across the WestGrid sites. The goal of the CV group is to provide an advanced collaboration and visualization infrastructure to the computational scientists in the WestGrid community. The collaboration infrastructure uses AccessGrid as a foundation for providing collaboration capabilities, including collaborative visualization, to its users. In this talk I will describe the WestGrid infrastructure that has currently been deployed, as well as describe some of our goals for integrating visualization into the AG environment as a first-class collaboration service. I will pose some open questions as to how best to approach this integration in the AG environment.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:30:08</time_stamp>
    <status>active</status>
    <sub_id>scg122</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>60</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>4:00PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>SC2004 Technology</sess_title>
    <sess_chair></sess_chair>
    <title>InfoStar - Real Time Access to SuperComputing Conference Information</title>
    <all_auth_inst>Ken Washington (StorCloud)</all_auth_inst>
    <abs>High Performance Storage highlights the second SC2004 special initiative known as StorCloud.  Only a few decades ago, the industry focused on processor technology as a means of pushing the HPC envelope.  Today we recognize that a truly operable environment is one that considers high performance computing storage capability as a critical and essential part of any HPC solution.  The StorCloud initiative focuses on working collaboratively with leading technology vendors, government labs, and academic institutions to showcase the next generation in high performance storage technologies coupled with high bandwidth applications during the conference.

InfoStar is a new SC special initiative whose goals are to: (1) provide real-time information about multiple aspects of the conference to all participants, and (2) create a searchable knowledge base about conference events and attendance for the benefit of future SC conference planners.

Both the InfoStar and StorCloud initiatives will combine leading resources, communications technologies, and control/management software creating a bridge to success for the SC2004 and future SC conference exhibitors and attendees. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:47:39</time_stamp>
    <status>active</status>
    <sub_id>scg119</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>83</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>11:15AM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Expanding Uses of AG</sess_title>
    <sess_chair></sess_chair>
    <title>The AccessGrid and its applications across an Australian University</title>
    <all_auth_inst>Jason Bell (Australia)</all_auth_inst>
    <abs>This presentation will focus on the use of AccessGrids within Central Queensland University, Australia.

Central Queensland University is a diverse organisation with campuses located hundreds or even thousands of kilometres apart.  Within the University, AccessGrids are used to facilitate collaboration between researchers both across the campuses and with other institutions.

This presentation will demonstrate how AccessGrids are being used to break down the geographical barriers associated with a regional University.  The current usage of the AccessGrids will be described (compared to H.323 videoconferencing) and the potential future applications of the AccessGrids will be discussed. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_11:53:12</time_stamp>
    <status>active</status>
    <sub_id>scg118</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>14</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>2:10PM</begin_time>
    <end_time>2:55PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Low &amp; High Bandwidth Environments</sess_title>
    <sess_chair></sess_chair>
    <title>LEAD:  Linked Environments for Atmospheric Discovery</title>
    <all_auth_inst>Jay Alameda (NCSA), Kelvin Droegemeier (University of Oklahoma)</all_auth_inst>
    <abs>The ability to understand, forecast, and mitigate the impacts of severe and damaging weather is stifled by rigid information technology frameworks that cannot accommodate the real-time, on-demand, and dynamically-adaptive needs of mesoscale weather research; its disparate, high-volume data sets and streams; and its tremendous computational demands.  In response to this pressing need for a comprehensive national cyberinfrastructure in mesoscale meteorology, particularly one that can interoperate with those being developed in other cognate disciplines, the Linked Environments for Atmospheric Discovery (LEAD) project has recently been funded to facilitate the identification, access, preparation, assimilation, prediction, management, analysis, mining, and visualization of a broad array of meteorological data and model output, independent of format and physical location.  A transforming element of LEAD is dynamic workflow orchestration and data management, which will allow use of analysis tools, forecast models, and data repositories as dynamically adaptive, on-demand systems that can a) change configuration rapidly and automatically in response to weather; b) continually be steered by new data; c) respond to decision-driven inputs from users; d) initiate other processes automatically; and e) steer remote observing technologies to optimize data collection for the problem at hand. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scg109</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>15</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>10:30AM</begin_time>
    <end_time>12:00PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>MSI Consortium Panel</sess_title>
    <sess_chair></sess_chair>
    <title>The Minority Serving Institution Consortium: Engaging Cyberinfrastructure to strengthen and add value to MSI institutions </title>
    <all_auth_inst>Amado Gonzalez (MSIC - FIU Engineering - HASTAC), Stephenie Mclean (MSIC - NCSA - ACCESS DC - EOT PACI), Alson Been (MSIC - Bethune Cookman College), Tiki Suarez (MSIC - Florida Agricultural and Mechanical University), Johnnie Spraggins (MSIC - Our Lady of the Lake University), Scott Lathrop (MSIC - NCSA - University of Illinois Urbana Champaign - EOT PACI), Jennifer Teig Von Hoffman (MSIC - Boston University - EOT PACI)</all_auth_inst>
    <abs>Our main focus is collaboration in Access Grid, cyberinfrastructure, and HPC activities for MSIs. The topics to be discussed are:

What challenges are MSI institutions facing when collaborating and working towards multi-disciplinary and multi-institutional advanced networking projects?

What did it take for MSIs to have an infrastructure in place to support Access Grid activities?

What does it take to get the administration involved with CyberInfrastructure efforts?

How can MSIs become a driver for CyberInfrastructure?

What have been some of the social and technical barriers at MSIs and CI?

How can the MSI Consortium serve as a vehicle to showcase research, resources, and advanced collaboration to the broader community?

How can AG tools and systems strengthen and add value to institutions?

How can your institution become part of the MSIC’s collaborative effort to engage Cyberinfrastructure?</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scg124</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>58</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:40PM</begin_time>
    <end_time>4:55PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Artistic/Cultural Applications</sess_title>
    <sess_chair></sess_chair>
    <title>Karaoke</title>
    <all_auth_inst>Kazuyuki Shudo (AIST Japan)</all_auth_inst>
    <abs>Karaoke Grid is a set of technologies assembled and configured to produce virtual karaoke rooms on the Grid. We (AIST, Waseda University, and XING) demonstrated it successfully in the Karaoke session that closed SC Global 2003. This year, the Karaoke Grid node will be improved with an enhanced version of EffecTV, which applies many kinds of video effects to input video streams. Furthermore, the video and audio software VIC and RAT may be replaced by compatible software with lower audio latency.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scg105</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>14</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:10PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Low &amp; High Bandwidth Environments</sess_title>
    <sess_chair></sess_chair>
    <title>Advancing Technology in Native American Communities</title>
    <all_auth_inst>Maria P. Williams (Tribal Virtual Network), Lee Q. Derks (Tribal Virtual Network), Kevin Shendo (Pueblo of Jemez), Travis Suazo (Indian Pueblo Cultural Center), Lorene Willis (Jicarilla Apache Culture Center), Tom Kennedy (Pueblo of Zuni), Vernon Lujan (Pojoaque Poeh Arts Center)</all_auth_inst>
    <abs>We will be discussing how the Tribal Virtual Network allows remote Native American communities in New Mexico to participate in the Access Grid community through a T-1 network. We will also discuss the challenges we have faced throughout the past year and how we have overcome these obstacles.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_11:28:41</time_stamp>
    <status>active</status>
    <sub_id>scg113</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>58</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:05PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Artistic/Cultural Applications</sess_title>
    <sess_chair></sess_chair>
    <title>The MARCEL Network, a Global Faculty in Art &amp; Science</title>
    <all_auth_inst>Don Foresta (Salford), Mathias Fuchs (Salford)</all_auth_inst>
    <abs>MARCEL is a permanent high-bandwidth network dedicated to artistic, educational and cultural experimentation, exchange between art and science, and collaboration between art and industry. www.mmmarcel.org

Global Threads is a virtual faculty in art and science that uses the MARCEL network to bring artists and scientists together to present topics in tandem over Access Grid. www.souillac.org/projects/GlobalThreads.html

We will present a prototype of the Global Threads presentations. www.alterne.info/artistic_prototypes.html#game </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-19_11:41:16</time_stamp>
    <status>active</status>
    <sub_id>scg112</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>61</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>4:20PM</begin_time>
    <end_time>5:00PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Virtual Reality</sess_title>
    <sess_chair></sess_chair>
    <title>ImmersaView Launcher: A Shared AG Application</title>
    <all_auth_inst>Allan Spale (University of Illinois at Chicago)</all_auth_inst>
    <abs>ImmersaView is a collaborative, stereo-display capable VRML2 and Open Inventor model viewer in which participants can adjust the view of the model and have this view propagated to the other remote sites.  In order to integrate ImmersaView into the Access Grid (AG), a shared application named ImmersaView Launcher was created that provides a GUI for configuring, starting, and stopping ImmersaView both locally and for all participants in the shared application session without changing any of its code.  ImmersaView Launcher utilizes a lightweight XML-RPC server that performs the requested actions given by the GUI.  By separating out the functionality of the GUI from the shared application, the XML-RPC server can be moved to any PC where the ImmersaView application resides.  This allows ImmersaView to utilize a separate PC from the AG node that may be connected to a special display device, such as a stereo projector like the GeoWall (http://www.geowall.org).  This session will explore the use of ImmersaView Launcher as a shared AG application and discuss a general design for utilizing an XML-RPC server that can integrate other “closed” applications as shared AG applications.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scg106</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>57</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>1:30PM</begin_time>
    <end_time>2:15PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Collaborative Tools</sess_title>
    <sess_chair></sess_chair>
    <title>Collaborative Finite Element Analysis</title>
    <all_auth_inst>Lee Margetts (University of Manchester), Simon Thomas Bee (University of Salford)</all_auth_inst>
    <abs>We will present our virtual reality software infrastructure viSpaces. We will show how this can be configured to accommodate immersive/desktop collaboration. Additionally, we will demonstrate how this technology can be integrated with the existing Access Grid infrastructure to enrich the collaboration.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_14:31:37</time_stamp>
    <status>active</status>
    <sub_id>scg117</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>60</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:00PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>SC2004 Technology</sess_title>
    <sess_chair></sess_chair>
    <title>SCinet - Planning, Building, and Executing</title>
    <all_auth_inst>Chuck Fisher (SCinet), Tom Hutton (SCinet)</all_auth_inst>
    <abs>SCinet is the high-performance network built to support the annual SCxy Conference series.  Volunteers from educational institutions, high performance computing sites, network OEMs, research networks, and telecommunications carriers work together to design and deliver the SCinet networks.  This session discusses how the network is planned and built, as well as the impact it has for participants and the industry.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:47:09</time_stamp>
    <status>active</status>
    <sub_id>scg111</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>61</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>3:30PM</begin_time>
    <end_time>4:20PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Virtual Reality</sess_title>
    <sess_chair></sess_chair>
    <title>Stereographics and Virtual Reality over the Access Grid</title>
    <all_auth_inst>Dioselin Gonzalez (Purdue University), Laura Arns (Purdue University), John Moreland (Purdue University)</all_auth_inst>
    <abs>We present and demonstrate two different methods for sharing stereoscopic (stereo) images via Access Grid (AG). The first is a method for creating stereo movies that can be viewed using the AG2 shared movie player. These movies can then be displayed with passive stereo methods using a PC with a dual-output video card, such as a typical “Geowall” setup [www.geowall.org]. The second is an ongoing project to create a toolkit for enabling collaborative virtual reality over the AG, by extending the VRJuggler toolkit [www.vrjuggler.org]. This set of libraries will allow virtual reality applications to run in geographically distant AG nodes. Node hardware setups may vary for each site, from fully immersive CAVE™-like setups with tracking, to PCs equipped with active stereo ability or even laptops with no stereo capability. Collaborators will share virtual worlds, but will retain their ability to interact individually with the virtual world and each other. Both of these methods make use of widely available and inexpensive hardware and software, allowing many AG users to potentially take advantage of them.

We invite other sites to participate in our demos. Software download and installation instructions will be located at www.envision.purdue.edu/papers/SC04-stereo-AG/ in advance of the conference.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_16:47:39</time_stamp>
    <status>active</status>
    <sub_id>scg115</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>83</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time></begin_time>
    <end_time></end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Expanding Uses of AG</sess_title>
    <sess_chair></sess_chair>
    <title>TeraVision: High bandwidth, collaborative video streaming on the Access Grid</title>
    <all_auth_inst>Arun Rao (University of Illinois at Chicago)</all_auth_inst>
    <abs>TeraVision can be envisioned as a hardware-assisted, network-enabled “PowerPoint” projector for distributing and displaying scientific visualizations. A user who wants to stream a visualization simply plugs the VGA or DVI output of the source computer into a TeraVision Box (also called a VBox), which transmits it to displays across the network. Figure 1 below depicts the system’s capability of taking video inputs from a wide range of video sources and streaming them to a wide array of display technologies over high-speed networks. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-09_14:32:00</time_stamp>
    <status>active</status>
    <sub_id>scg114</sub_id>
    <event_type>SC Global Showcase</event_type>
    <sess_id>58</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>4:05PM</begin_time>
    <end_time>4:40PM</end_time>
    <sess_room>403-405</sess_room>
    <sess_title>Artistic/Cultural Applications</sess_title>
    <sess_chair></sess_chair>
    <title>InterPlay: Hallucinations</title>
    <all_auth_inst>Jimmy Miklavcic (University of Utah)</all_auth_inst>
    <abs>We will be using several ATI Radeon All-in-Wonder 9600 video cards with S-video outputs. We feed the NTSC video from those cards into a SIMA digital video mixer. The incoming video and the local video are combined and sent into the AG video capture cards. </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio></presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:16:56</time_stamp>
    <status>active</status>
    <sub_id>spkr101</sub_id>
    <event_type>Invited Speaker</event_type>
    <sess_id>85</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>9:15AM</end_time>
    <sess_room>Ballroom B-C</sess_room>
    <sess_title>High Performance Computing in Context</sess_title>
    <sess_chair>Charles J. Holland </sess_chair>
    <title>High Performance Computing in Context</title>
    <all_auth_inst>Charles J. Holland (Deputy Under Secretary of Defense for Science and Technology)</all_auth_inst>
    <abs>High performance computing has been, is, and will continue to be a strategic capability for the Nation.  Dating back to the 1982 Lax Report, this field has been studied and restudied numerous times in an effort to advance national goals in science, engineering, and national security. Typically and by necessity, such analyses tend to be introspective in nature and narrowly focused.  This talk examines high performance computing in a broader context, discussing the &quot;spheres of influence&quot; that are expected to shape the future of this field.  I will look at these influences from three different perspectives.  The first is historical, examining the impact of past and present programs, trends, and events.  The second examines the landscape of information and electronics technology, from embedded systems to security, to highlight those areas that are expected to heavily influence thinking and design of tomorrow's high performance computing systems and software technologies.  The third is that of Federal policy, highlighting the forces and trends that will shape government activities in this field.  </abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Dr. Charles J. Holland is the Deputy Under Secretary of Defense (Science and Technology).   In this capacity, he provides leadership to the entire spectrum of the $10B+ annual Defense science and technology portfolio executed through the three Services and Defense Agencies.   In addition, he oversees the DoD High Performance Computing Modernization Program, the Defense Modeling and Simulation Office and the Software Engineering Institute, which provide corporate Department of Defense capabilities.  His office is also responsible for validating the Technology Readiness Assessment of all major DoD programs requiring Defense Acquisition Board decisions.  He is the US Principal to The Technical Cooperation Program, a cooperative defense science and technology activity with representatives from Australia, Canada, New Zealand, United Kingdom and the United States. 

Over the past two decades, Dr. Holland has played a key role in the Federal high performance computing R&amp;D, dating back to the 1987 OSTP report A R&amp;D Strategy for High Performance Computing.  Recently, he served as co-lead for the Report on High Performance Computing for the National Security Community (July 2002) and led the development of the white paper DoD Research and Development Agenda for High Productivity Computing Systems (June 2001), which served as the roadmap for the current DARPA program in high-end computing.    Also, he has been actively involved with the DoD High Performance Computing Modernization Program.  Since its inception in 1992, he has served on the program's advisory panel and twice served as the Program Director.  

Dr. Holland began his career in academia with appointments at Purdue University and as a Visiting Member at the Courant Institute of Mathematical Sciences at New York University.  A substantial portion of his career has involved the direction of basic research programs in computer science and applied mathematics at the Office of Naval Research and the Air Force Office of Scientific Research. 

He received the Presidential Rank Award, Meritorious Executive (2000), the Society for Industrial and Applied Mathematics Commendation for Public Service Award (1999), and the Meritorious Civilian Service Award from the Secretary of Defense (2001), Air Force (1998), and the Navy (1984).  He is a member of the Board of Trustees for the Consortium for Mathematics and its Applications (COMAP) and the Editorial Board of Computing In Science and Engineering.

He received a B.S. (1968) and an M.S. (1969) in Applied Mathematics from the Georgia Institute of Technology and a Ph.D. (1972) in Applied Mathematics from Brown University. </presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:18:03</time_stamp>
    <status>active</status>
    <sub_id>spkr102</sub_id>
    <event_type>Invited Speaker</event_type>
    <sess_id>86</sess_id>
    <sess_date>Wednesday, November 10</sess_date>
    <begin_time>9:15AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>Ballroom B-C</sess_room>
    <sess_title>Toward a High Performance Computing Economy</sess_title>
    <sess_chair>Stan Ahalt </sess_chair>
    <title>Toward a High Performance Computing Economy </title>
    <all_auth_inst>Stan Ahalt (Executive Director of the Ohio Supercomputer Center)</all_auth_inst>
    <abs>Economic forces will continue to shape High Performance Computing (HPC).   It is clear that the US has reached a critical juncture with regard to HPC, and the central challenge will be to secure sources of funding so that our leadership position in HPC is sustained.  However, another view of this challenge might be more illuminating: can HPC be realistically viewed as one of the critical economic drivers for our future?  Is this view of HPC realistic?  Is it realizable?  And if HPC is one of a relatively small number of critical economic differentiators, what level of national investment in HPC is justified by its economic potential?

I will try to answer these questions through field interviews, anecdotes, and statistics on the current state of HPC.  I argue that a fundamental addition to the HPC market, the addition of “blue collar” computing, needs to take place in order to revitalize U.S. leadership in computational science, engineering and product design.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Dr. Stanley C. Ahalt was appointed Executive Director of the Ohio Supercomputer Center (OSC) on July 1, 2003.   OSC provides reliable high performance computing and networking infrastructure for a diverse statewide regional community that includes education, academic research, industry, and state government.  Dr. Ahalt is also the academic lead for Signal and Image Processing (SIP) in the Department of Defense (DoD) High Performance Computing Modernization Office's Programming Environment and Training (HPCMO PET) initiative.  He is also a participant in the DARPA HPCS program.  His research expertise lies in signal processing, data compression, and neural networks, specifically in the use of high performance computing for these applications. Dr. Ahalt has published more than 100 archival journal articles, conference papers, and book chapters, and has served as an Associate Editor for the IEEE Transactions on Neural Networks.  He is a member of the Council on Competitiveness HPC Advisory Committee.

Dr. Ahalt has been a Professor in the Department of Electrical and Computer Engineering at The Ohio State University (OSU).  He received the 1997 OSU Lumley Research Award and the 1999 OSU College of Engineering Research Award.

Prior to joining OSU, Dr. Ahalt worked at Bell Telephone Laboratories where he developed industrial data products.  He received his BS and MS degrees in electrical engineering from Virginia Tech and his Ph.D. in electrical and computer engineering from Clemson University in 1986, and was elected to Tau Beta Pi and Eta Kappa Nu.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:21:05</time_stamp>
    <status>active</status>
    <sub_id>spkr103</sub_id>
    <event_type>Invited Speaker</event_type>
    <sess_id>84</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>8:30AM</begin_time>
    <end_time>9:15AM</end_time>
    <sess_room>Ballroom B-C</sess_room>
    <sess_title>Computing Opportunities in the Era of Abundant Biological Data</sess_title>
    <sess_chair>Gane Ka-Shu Wong </sess_chair>
    <title>Computing Opportunities in the Era of Abundant Biological Data</title>
    <all_auth_inst>Gane Ka-Shu Wong (Associate Director of Beijing Institute of Genomics)</all_auth_inst>
    <abs>The completion of the sequence of the human genome was announced with great fanfare in the summer of 2000, with representatives from the public sector (Francis Collins of the NIH) and the private sector (Craig Venter of Celera) meeting at the White House to join hands in their brief moment of glory. At the time, it was asserted that this sequence would revolutionize the practice of medicine forever. Since those heady days, the stock market bubble has burst, and Venter has resigned from Celera. Was it all an illusion? One should hope not, because corporations like IBM, Oracle, and GE have all launched major new initiatives in life sciences. Genome sequence continues to accumulate in the databases at growth rates surpassing Moore’s Law. More importantly, genome sequence is only the beginning. The output of the genome is a set of biomolecules (RNA and protein) that turn on and off under different conditions, that interact with each other, that interact with the environment, and ultimately produce the miracle known as life. A bewildering assortment of biological information is being generated, to record this molecular symphony, under all possible environmental and physiological conditions, leading to all possible outcomes. The ultimate goal is to reverse engineer this system, by building mathematical models with as much detail as necessary. As a result, there is a monumental culture change going on in biology, which is becoming an increasingly quantitative science, driving the need for more computing power. I will show examples of where high power computing is used in biology, past, present, and future. Then, to put it all in perspective, I will discuss how genomes evolve. The irony of the situation is that, despite all this computing power being brought to bear on the problem, the best model for genome evolution is the haphazard development of the much-maligned Microsoft OS. What they share is that both processes were driven by immediate Darwinian needs, with little foresight or planning, and we are all left to deal with the consequences.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Gane Ka-Shu Wong is an Associate Director for the Beijing Institute of Genomics (BIG), one of the largest biological research facilities in Asia, best known for their work on the sequencing of the genomes for human, rice (indica and japonica), silkworm, and chicken. He is responsible for all of the major computational analyses performed at BIG, as well as the overall scientific direction and publications in high profile journals like Nature and Science. Eleven years ago, he was recruited into the Human Genome Project (HGP) by Maynard Olson, one of the founders of the HGP. In concert with Phil Green and Jun Yu, these four scientists launched the University of Washington Genome Center, which set new standards for data quality at the time, and developed many of the key software tools used by the HGP. In his previous life, before he moved to UW in Seattle, and made the switch into biology, he was an experimental low temperature physicist with a Ph.D. from Cornell University. But, even before that, he was an engineer with a B.A.Sc. from the University of British Columbia, with quadruple honors in physics, electrical engineering, mathematics, and computer science.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_09:19:19</time_stamp>
    <status>active</status>
    <sub_id>spkr104</sub_id>
    <event_type>Invited Speaker</event_type>
    <sess_id>87</sess_id>
    <sess_date>Thursday, November 11</sess_date>
    <begin_time>9:15AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>Ballroom B-C</sess_room>
    <sess_title>Computing - An Intellectual Lever for Multidisciplinary Discovery</sess_title>
    <sess_chair>Daniel A.  Reed </sess_chair>
    <title>Computing - An Intellectual Lever for Multidisciplinary Discovery</title>
    <all_auth_inst>Daniel A. Reed (Director, Renaissance Computing Institute Duke, UNC and NCSU)</all_auth_inst>
    <abs>Legend says that Archimedes remarked, on the discovery of the lever, “Give me a place to stand, and I can move the world.” Today, computing pervades all aspects of science and engineering. “Science” and “computational science” have become largely synonymous, and high-performance computing is the intellectual lever that opens the pathway to discovery. 

One aspect of high-performance computing distinguishes it from other scientific instruments – its universality as an intellectual amplifier.  Powerful new telescopes advance astronomy, but not materials science.  Powerful new particle accelerators advance high energy physics, but not genetics. In contrast, high-performance computing advances all of science and engineering, because all disciplines benefit from high-resolution model predictions, theoretical validations and experimental data analysis.

As new scientific discoveries increasingly lie at the interstices of traditional disciplines, computing is also the enabler for scholarship in the arts, humanities, creative practice and public policy. The term Renaissance Computing, coined by my friend and colleague Donna Cox, is intended to both evoke and capture the breadth of such intellectual activities enriching and empowering human potential, as well as creating intellectual communities that span the sciences and engineering, the arts, the humanities and commerce. This talk will describe emerging opportunities in the arts, humanities, science and engineering where interdisciplinary Renaissance approaches can have a profound impact on discovery and creative expression.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Professor Daniel A. Reed is Vice-Chancellor for Information Technology and Chief Information Officer for the University of North Carolina at Chapel Hill.  He is also Director of the Renaissance Computing Institute (RENCI), an interdisciplinary center spanning the University of North Carolina at Chapel Hill, Duke University and North Carolina State University.  RENCI is exploring the interactions of computing technology with the sciences, arts and humanities. A &quot;Renaissance team&quot; approach is bringing scientists, engineers, artists and institute staff together to explore interdisciplinary approaches to scholarship, discovery and education. The institute is also partnering with business leaders to enhance the competitiveness of North Carolina industries. He holds the Chancellor’s Eminent Professorship at the University of North Carolina at Chapel Hill, where he conducts interdisciplinary research in high-performance computing.

Dr. Reed was previously Director of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, where he also led the National Computational Science Alliance, a consortium of roughly fifty academic institutions and national laboratories that is developing next-generation software infrastructure for scientific computing. He was also one of the principal investigators and the chief architect for the NSF TeraGrid. Professor Reed is also the former head of the Department of Computer Science at the University of Illinois, where he spearheaded more than $100 million in construction to create a new information technology quadrangle on the Illinois campus.

Dr. Reed is a member of President George W. Bush's Information Technology Advisory Committee, charged with providing advice on information technology issues and challenges to the president. He is also a member of the Biomedical Informatics Expert Panel for the National Institutes of Health's National Center for Research Resources, and he serves on the board of directors of the Computing Research Association, which represents the interests of the major academic departments and industrial research laboratories.</presenter_bio>
    <file_name></file_name>
  </record>
  <record>
    <time_stamp>2004-07-20_19:42:25</time_stamp>
    <status>active</status>
    <sub_id>key101</sub_id>
    <event_type>Keynote</event_type>
    <sess_id>88</sess_id>
    <sess_date>Tuesday, November 9</sess_date>
    <begin_time>9:00AM</begin_time>
    <end_time>10:00AM</end_time>
    <sess_room>Ballroom A-B-C</sess_room>
    <sess_title>Keynote Talk</sess_title>
    <sess_chair>Tom West</sess_chair>
    <title>NLR: Providing the Nationwide Network Infrastructure for Network and &quot;Big Science&quot; Research</title>
    <all_auth_inst>Tom West (President and CEO of the National LambdaRail)</all_auth_inst>
    <abs>The NLR infrastructure is designed to foster the concurrent advancement of networking research and next generation network-based applications in science, engineering and medicine. NLR aims to reenergize innovative research and development into next generation network technologies, protocols, services and applications.

NLR is a networking initiative of the research community, following in the footsteps of ARPANET, NSFnet and Internet2. NLR strives to again stimulate and support innovative network research that goes above and beyond the current incremental evolution of the Internet.

There are three major characteristics that distinguish NLR from the predecessor initiatives. First, NLR is self-funded by the investment of a consortium of leading U.S. research universities and Cisco Systems. There were no direct federal funds underwriting the foundation costs. We anticipate federal funding going to the researchers as part of grants to enable the uses of NLR. 

Second, NLR is not a single network, but instead, a unique and rich set of facilities, capabilities and services that will support a set of multiple, distinct, experimental and production networks for the U.S. research community. On NLR, these many different networks will exist side-by-side in the same fiber optic cable pair, but will be physically and operationally independent of each other as each network will be supported by its own lightwave or 'lambda.' 

Third, it is the first time a nationwide network infrastructure is OWNED by the research and education community. As a result, our community will have control and flexibility over network architecture, deployment and uses. NLR is being deployed to be responsive to the specific networking requirements of various scientific research endeavors and to network research intended to advance our understanding of network technologies and protocols.

The talk will endeavor to highlight just how NLR came into being, what it is comprised of, and how it is being initially used.</abs>
    <awards></awards>
    <intro_level></intro_level>
    <inter_level></inter_level>
    <adv_level></adv_level>
    <presenter_bio>Tom West became the Chief Executive Officer of NLR in September 2003. NLR is a national effort comprised of members and associates from across the country focused on implementing and operating a national network infrastructure to serve the needs of the advanced research community.

West has over four decades of executive management experience in the research and higher education community. He has served as a small college president, as a vice chancellor for administration for regional campuses in a public university system, and for 26 years as the Chief Information Technology Officer (CITO) for two large public university systems: Indiana University (1973-1981) and the California State University (1981-1999).

From March 1999 through June 2004 he served as the President and Chief Executive Officer for CENIC (Corporation for Education Network Initiatives in California). He served as CEO for both CENIC and NLR from September 2003 through June 2004.  At the end of June he resigned from CENIC to devote all his time to NLR.</presenter_bio>
    <file_name></file_name>
  </record>
</document>
