ISCA-33 Tutorial Call for Participation:

Using the Pin Instrumentation Tool for Computer Architecture Research

Saturday, June 17, 2006

Boston, Massachusetts

Slides for the talks available HERE

Time           Talks

1:30 – 2:15    “Introduction to Pin” by Chi-Keung Luk (CK)

2:15 – 2:45    “Micro-architecture Studies Using Pin: Branch Predictors, Caches, & Simple Timing Models” by Aamer Jaleel

2:45 – 3:30    “Techniques for Speeding Up Pin Based Simulations” by Harish Patil

3:30 – 4:00    -------------------- Break ----------------------

4:00 – 4:30    “Performance Optimization of Pin Tools” by Chi-Keung Luk (CK)

4:30 – 5:00    “Fault Analysis Using Pin” by Srilatha Manne (Bobbie)

ABSTRACT

Tired of waiting for your simulations to finish? Wish you could simulate billions of instructions an hour? Wish you had ready-to-use infrastructure that could help you understand the performance bottlenecks of emerging and complex workloads (e.g. Oracle, Java)? If you answered yes, then this is the venue for YOU.

In this tutorial, we will show that the binary instrumentation tool Pin is a strong candidate for computer architecture studies such as performance modeling, trace generation, and fault tolerance. We demonstrate how to use Pin to create simple performance models without the complexity of building detailed ones. We show that Pin’s robust design makes it easy to conduct performance studies of simple workloads (e.g. SPEC2000, BioBench), multi-threaded workloads (e.g. SPLASH, SPECOMP, NAS, BioParallel), and even complex server workloads such as commercial database applications (e.g. Oracle). Since instrumentation is typically much faster than simulation, Pin allows workloads to run to completion without the long waiting time associated with detailed performance models. This makes it possible to study the overall program behavior of different workloads.

Besides performance studies, Pin also lets users conduct reliability studies. Since Pin provides access to architecture-specific details, users can inject faults and study in great detail how a fault propagates into different portions of the instruction and data streams.

Pin is publicly available for four architectures (IA32, EM64T, Itanium, XScale) and three operating systems (Linux, Windows, MacOS).

AGENDA

This tutorial consists of four presentations. The first presentation provides an introduction to the Pin API and the basic concepts of writing instrumentation tools in Pin. Detailed instructions on writing instrumentation tools useful for architecture research are also presented. The next three presentations consist of research projects that use Pin.

Introduction To Pin, CK Luk, Intel

The primary goal of Pin is to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools, called Pin tools, are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pin tools source compatible across different architectures.

Cache Performance of Emerging Workloads on CMPs, Aamer Jaleel, Intel

Chip-multiprocessors (CMPs) have become the next attractive point in the design space of future high performance microprocessors. There is a growing need for simulation methodologies to determine the memory system requirements of emerging parallel workloads in a reasonable amount of time. This session of the tutorial demonstrates the use of Pin as an alternative to execution-driven and trace-driven simulation methodologies. We present the implementation of a cache simulation Pin tool: CMP$im. We show that CMP$im can be used to understand the overall memory behavior of a workload as well as to characterize the instruction profile, cache performance, and data sharing behavior of workloads at speeds of 4-10 MIPS.
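
To make the idea of a cache simulation tool concrete, the following is a minimal standalone sketch of a set-associative LRU cache model driven by a list of addresses. It is not CMP$im code and does not use the Pin API; the class name `SetAssociativeCache`, the helper `miss_rate`, and the cache parameters are all hypothetical. In an actual Pin tool, the addresses would presumably come from instrumented memory operands rather than from a hand-written list.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU set-associative cache model (illustrative only, not CMP$im)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one memory reference and return True on a hit."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[tag] = True
        self.misses += 1
        return False


def miss_rate(cache):
    """Fraction of references that missed (0.0 if nothing was accessed)."""
    total = cache.hits + cache.misses
    return cache.misses / total if total else 0.0


if __name__ == "__main__":
    # Assumed configuration: 1 KB, 64-byte lines, 2-way (8 sets).
    cache = SetAssociativeCache(size_bytes=1024, line_bytes=64, ways=2)
    for addr in [0x0, 0x40, 0x0, 0x200, 0x400, 0x0]:
        cache.access(addr)
    print(cache.hits, cache.misses, miss_rate(cache))
```

Because the model only consumes addresses, it can run alongside the application as it executes, which is one reason an instrumentation-driven approach can avoid storing large traces.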

Techniques for Speeding up Pin-based Simulation, Harish Patil, Intel

This session of the tutorial discusses standalone simulation as well as interfacing Pin with existing simulators such as SimpleScalar. Since detailed whole-program simulation can be very slow, techniques for speeding up Pin-based simulation using SimPoint/PinPoints are discussed. Additionally, a Pin-based interface to SimpleScalar.x86 (under development) will be discussed as a case study.
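
The general idea behind phase-based sampling can be sketched as follows: slice execution into fixed-size intervals, summarize each interval as a basic block vector, cluster similar intervals, simulate one representative per cluster in detail, and weight its result by the cluster's share of execution. The code below is a simplified standalone illustration under several assumptions. It is not the PinPoints implementation, all function names are hypothetical, it counts basic block executions rather than instructions, and it uses a small deterministic k-means in place of whatever clustering the real tools apply.

```python
from collections import Counter


def basic_block_vectors(block_trace, interval, num_blocks):
    """Split a trace of basic-block IDs into fixed-size intervals and return
    one normalized execution-frequency vector per complete interval."""
    vectors = []
    for start in range(0, len(block_trace) - interval + 1, interval):
        counts = Counter(block_trace[start:start + interval])
        vectors.append([counts[b] / interval for b in range(num_blocks)])
    return vectors


def distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Small deterministic k-means with farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors,
                       key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def simulation_points(vectors, k):
    """Return (interval_index, weight) pairs, one per non-empty cluster."""
    labels, centroids = kmeans(vectors, k)
    points = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            rep = min(members, key=lambda i: distance(vectors[i], centroids[c]))
            points.append((rep, len(members) / len(vectors)))
    return points


def estimate(points, per_interval_metric):
    """Weighted whole-program estimate from the representatives only."""
    return sum(weight * per_interval_metric[i] for i, weight in points)


if __name__ == "__main__":
    # Two synthetic phases: blocks {0, 1} for three intervals, then {2, 3}.
    trace = [0, 1, 0, 1] * 3 + [2, 3, 2, 3]
    bbvs = basic_block_vectors(trace, interval=4, num_blocks=4)
    points = simulation_points(bbvs, k=2)
    # Hypothetical per-interval CPI that a detailed simulator might report.
    print(points, estimate(points, [1.0, 1.0, 1.0, 3.0]))
```

Only the representative intervals need detailed simulation; the remaining intervals contribute through the cluster weights.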

Understanding Software Fault Resilience, Bobbie Manne, Intel

Recently, there has been much research on transient fault analysis. Much of this work is at the RTL or architectural simulation level. This produces the most accurate picture of the effect of the fault. However, the analysis cannot cover more than a small section of the application run, and cannot explore the impact of the fault beyond a few million instructions. This leads to some conservative assumptions. For instance, if the fault propagates to the memory subsystem, then it is assumed to be relevant if the faulty data is not overwritten before the end of the analysis window. In some cases, any fault that impacts the control flow is assumed to cause program disruption.

This session of the tutorial addresses Pin tools that have been created to analyze fault propagation behavior beyond the range of existing simulation infrastructures by looking at the impact of the fault for the complete run of the program. The Pin tools perform two tasks: (1) inject the fault into an input, output, or architectural register (depending on where in hardware the user wants to model the fault), and (2) analyze the impact of the fault on the application. The tutorial describes the methodology and subtleties associated with doing this analysis.
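
The two tasks described above, injecting a fault and analyzing its impact over a complete run, can be illustrated with a small standalone sketch. It is not the presenters' tool and does not use the Pin API. The toy three-instruction "program", the register names, the helpers `flip_bit`, `run`, and `classify`, and the outcome labels ("masked", "sdc" for silent data corruption, "crash") are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


def flip_bit(value, bit):
    """Return value with one bit inverted (a single-bit transient fault)."""
    return (value ^ (1 << bit)) & MASK64


def run(program, registers, memory, fault=None):
    """Run a toy register program to completion and return register 'out'.

    fault is None or (step, register, bit): the bit is flipped in the named
    register just before the instruction at index 'step' executes.
    """
    regs = dict(registers)
    for step, (op, dest, a, b) in enumerate(program):
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] = flip_bit(regs[reg], bit)
        if op == "and":
            regs[dest] = regs[a] & regs[b]
        elif op == "add":
            regs[dest] = (regs[a] + regs[b]) & MASK64
        elif op == "load":
            regs[dest] = memory[regs[a]]  # out-of-range index raises IndexError
        else:
            raise ValueError("unknown op: " + op)
    return regs["out"]


def classify(program, registers, memory, fault):
    """Compare a faulty run against a fault-free (golden) run."""
    golden = run(program, registers, memory)
    try:
        observed = run(program, registers, memory, fault)
    except IndexError:
        return "crash"                    # fault led to an invalid access
    return "masked" if observed == golden else "sdc"


# Hypothetical workload: mask an index, load from memory, add a constant.
PROGRAM = [
    ("and", "r2", "r0", "r1"),
    ("load", "r3", "r2", None),
    ("add", "out", "r3", "r1"),
]
REGISTERS = {"r0": 0b0110, "r1": 0b0011, "r2": 0, "r3": 0, "out": 0}
MEMORY = [10, 20, 30, 40]

if __name__ == "__main__":
    for fault in [(0, "r0", 3), (0, "r0", 0), (1, "r2", 5)]:
        print(fault, classify(PROGRAM, REGISTERS, MEMORY, fault))
```

Running every faulty execution to the end of the program, rather than stopping after a fixed window, is what allows the sketch to distinguish a fault that is eventually masked from one that corrupts the final output.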

If you have any questions please feel free to contact:

Robert Cohn: robert.s.cohn@intel.com

Aamer Jaleel: aamer.jaleel@intel.com