Architectural Simulation to Accelerate CoDesign

Architectural simulation of computing systems is essential for predicting performance and power because it enables rapid, quantitative exploration of HPC system design tradeoffs. Three tools work together to provide a complete codesign simulation tool set: the ROSE compiler, the ACE emulation platform, and the SST simulator.

The codesign process uses a hierarchy of simplified surrogate code representations to give hardware designers detailed, actionable information while ensuring that the context for any insight remains faithful to the full application’s requirements.

The SST simulator consists of two tools: SST/macro and SST/micro. SST/macro is a coarse-grained simulator that lets designers study large-scale systems in a way that captures the complex interactions among hardware components. It can run a skeleton application, either provided by domain experts or generated by ROSE-based analysis tools, that scales and behaves exactly like the original application code. This makes it possible to investigate communication characteristics and bottlenecks that may arise only at the scales predicted for next-generation machines. Designers can also replay traces of a previously run MPI application through the simulator, either to estimate its execution time on new hardware or to validate the simulator against existing hardware. SST/micro is a general simulation framework for composing complete simulations of HPC compute nodes by combining cycle-accurate models of processors, memory, disks, and network routers, so that designers can explore detailed node-level architectural design tradeoffs in both hardware and software.
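The trace-replay idea can be illustrated with a deliberately small sketch. It is not SST/macro and does not use its API or trace format; the Machine fields, the event names "compute" and "send", and the simple latency-plus-bandwidth cost model are all assumptions made for illustration. The point is only that the same recorded sequence of events can be re-costed against the parameters of a different machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical coarse-grained machine parameters (not SST inputs)."""
    flops_per_sec: float
    latency_sec: float
    bandwidth_bytes_per_sec: float


def replay(trace, machine):
    """Estimate the run time of one rank's trace on the given machine.

    trace is a list of (kind, amount) events, where kind is "compute"
    (amount in floating-point operations) or "send" (amount in bytes).
    """
    total = 0.0
    for kind, amount in trace:
        if kind == "compute":
            total += amount / machine.flops_per_sec
        elif kind == "send":
            # Assumed cost model: fixed latency plus size over bandwidth.
            total += machine.latency_sec + amount / machine.bandwidth_bytes_per_sec
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return total


# A made-up trace and two made-up machines, for illustration only.
trace = [("compute", 2e9), ("send", 1e6), ("compute", 1e9)]
current = Machine(flops_per_sec=1e9, latency_sec=1e-6, bandwidth_bytes_per_sec=1e9)
future = Machine(flops_per_sec=4e9, latency_sec=1e-6, bandwidth_bytes_per_sec=2e9)

print(f"current: {replay(trace, current):.6f} s")
print(f"future:  {replay(trace, future):.6f} s")
```

Replaying the trace against the "current" parameters and comparing with a measured run corresponds to validation; replaying it against the "future" parameters corresponds to prediction. A real simulator would also model contention, overlap of computation with communication, and interactions among ranks, all of which this sketch ignores.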

The ACE emulation platform extends tools like the Tensilica Xtensa Processor Generator (XPG) tool chain to serve as our rapid prototyping platform for node-level HPC emulation. The XPG’s customizable instruction set, communication interfaces, and memory hierarchy make it ideal for exploring novel chip multiprocessor designs. Its ability to extend the instruction set with application-specific functionality produces a streamlined processor with scratchpad memories, advanced communication features, and custom operation codes that facilitate advanced communication and synchronization. In addition, the XPG’s ability to automatically generate C/C++ compilers, debuggers, and functional models enables fast software porting and rapid testing on a new architecture.

The ROSE infrastructure automates the extraction of skeleton applications that serve as input to the simulators. ROSE analyzes and transforms an internal representation of source code to cut away parts of the application, reducing complexity while preserving specific features such as Message Passing Interface (MPI) communication. This transformed code becomes the skeleton application that runs in the simulation environments.
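As a loose analogy for skeleton extraction, the sketch below uses Python's standard ast module (Python 3.9 or later for ast.unparse) to parse a small function, drop the simple statements that make no call on a communicator object, and print what remains. It is not ROSE and does not use ROSE's API. The Skeletonizer class, the skeletonize function, the communicator name "comm", and the sample function with its helpers are all hypothetical, and the filtering rule is far cruder than a real analysis.

```python
import ast


class Skeletonizer(ast.NodeTransformer):
    """Drop simple statements that make no call on the communicator object."""

    def __init__(self, comm_name="comm"):
        self.comm_name = comm_name

    def _communicates(self, node):
        # True if any call of the form <comm_name>.<method>(...) occurs in node.
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Attribute)
                    and isinstance(sub.func.value, ast.Name)
                    and sub.func.value.id == self.comm_name):
                return True
        return False

    def _filter(self, node):
        # Returning None removes the statement from its enclosing body.
        return node if self._communicates(node) else None

    # Only simple statements are filtered; control flow and returns are kept.
    visit_Assign = _filter
    visit_AugAssign = _filter
    visit_Expr = _filter


def skeletonize(source, comm_name="comm"):
    """Return source text with non-communicating simple statements removed."""
    tree = Skeletonizer(comm_name).visit(ast.parse(source))
    return ast.unparse(tree)


# Hypothetical application code; compute_flux and update are never called here.
source = """
def step(comm, grid, left, right):
    flux = compute_flux(grid)
    grid = update(grid, flux)
    comm.send(grid[0], dest=left)
    halo = comm.recv(source=right)
    grid[-1] = halo
    return grid
"""

print(skeletonize(source))
```

The output keeps the function signature, the send and receive calls, and the return statement, and drops the three computational assignments. A production tool would additionally need to track data dependences so that values used by the retained communication stay defined, handle bodies that become empty, and typically substitute a modeled compute time for the removed work.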

For more information, please contact John Shalf of Lawrence Berkeley National Laboratory at



ExaCT is funded by the DOE Office of Advanced Scientific Computing Research (ASCR). Dr. Karen Pao is the program manager and Dr. William Harrod is the director of the ASCR Research Division.

U.S. Department of Energy: Office of Science
Stanford University
The University of Utah
Georgia Institute of Technology
Lawrence Berkeley National Laboratory
Lawrence Livermore National Laboratory
Oak Ridge National Laboratory
The University of Texas at Austin
Rutgers: The State University of New Jersey
National Renewable Energy Laboratory (NREL)
Los Alamos National Laboratory
Sandia National Laboratories