<?xml version="1.0"?> <!-- -*- sgml -*- -->
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
   "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

 <chapter id="bbv-manual" xreflabel="BBV">
   <title>BBV: an experimental basic block vector generation tool</title>

 <para>To use this tool, you must specify
 <option>--tool=exp-bbv</option> on the Valgrind
 command line.</para>

 <sect1 id="bbv-manual.overview" xreflabel="Overview">
 <title>Overview</title>

 <para>
    A basic block is a linear section of code with one entry point and one exit
    point.  A <emphasis>basic block vector</emphasis> (BBV) is a list of all
    basic blocks entered during program execution, and a count of how many
    times each basic block was run.
 </para>
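 <para>
    As a minimal sketch of this definition, the Python function below
    builds a basic block vector from a hypothetical trace that lists a
    block ID each time a basic block is entered.  The trace format and the
    name <computeroutput>build_bbv</computeroutput> are illustrative
    assumptions, not part of BBV.
 </para>

 <programlisting><![CDATA[
from collections import Counter


def build_bbv(block_trace):
    """Return a basic block vector: block ID -> number of times entered.

    block_trace is a hypothetical sequence holding one block ID for each
    time control entered that basic block.
    """
    return dict(Counter(block_trace))


# Block 1 is entered three times, block 2 twice, block 3 once.
print(build_bbv([1, 2, 1, 3, 1, 2]))  # {1: 3, 2: 2, 3: 1}
]]></programlisting>

 <para>
    These are raw entry counts; the file BBV writes additionally weights
    each count by the size of the block, as described in
    <xref linkend="bbv-manual.fileformat"/>.
 </para>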

 <para>
    BBV is a tool that generates basic block vectors for use with the
    <ulink url="http://www.cse.ucsd.edu/~calder/simpoint/">SimPoint</ulink>
    analysis tool.
    The SimPoint methodology speeds up architectural
    simulations by running only a small portion of a program
    and then extrapolating total behavior from that
    portion.  Most programs exhibit phase-based behavior: at
    various times during execution, a program enters
    intervals where the code behaves similarly to an earlier
    interval.  If you can detect these intervals and group them together,
    you can approximate the total program behavior
    by simulating only a small number of representative intervals and
    then scaling the results.
 </para>
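 <para>
    One simple way to decide whether two intervals behave similarly is to
    compare their basic block vectors.  The sketch below assumes each vector
    is first normalized so that its entries sum to 1, and then measures the
    Manhattan distance between the normalized vectors.  The function names
    are hypothetical, and this illustrates the idea rather than what
    SimPoint computes internally.
 </para>

 <programlisting><![CDATA[
def normalize(bbv):
    """Scale a non-empty BBV (block ID -> count) so its entries sum to 1."""
    total = sum(bbv.values())
    return {block: count / total for block, count in bbv.items()}


def manhattan_distance(bbv_a, bbv_b):
    """Manhattan (L1) distance between two BBVs after normalization.

    0.0 means identical block distributions; 2.0 means the two intervals
    executed completely disjoint sets of blocks.
    """
    a, b = normalize(bbv_a), normalize(bbv_b)
    blocks = set(a) | set(b)
    return sum(abs(a.get(blk, 0.0) - b.get(blk, 0.0)) for blk in blocks)


interval_1 = {45: 1, 189: 3}
interval_2 = {45: 2, 189: 6}  # same mix of blocks, twice as long
interval_3 = {11: 1, 15: 1}   # completely different code

print(manhattan_distance(interval_1, interval_2))  # 0.0
print(manhattan_distance(interval_1, interval_3))  # 2.0
]]></programlisting>

 <para>
    Normalizing first means that two intervals running the same code mix are
    considered identical even if their raw counts differ.
 </para>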

 <para>
   In computer architecture research, running a
   benchmark on a cycle-accurate simulator can cause slowdowns on the order
   of 1000 times, so simulating full benchmarks can take days, weeks, or
   even longer.  SimPoint can reduce this simulation time significantly,
   usually by 90-95%, while still retaining reasonable accuracy.
 </para>

 <para>
    A more complete introduction to how SimPoint works can be
    found in the paper "Automatically Characterizing Large Scale
    Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and
    B. Calder.
 </para>

 </sect1>

 <sect1 id="bbv-manual.quickstart" xreflabel="Quick Start">
 <title>Using Basic Block Vectors to create SimPoints</title>

 <para>
    To quickly create a basic block vector file, invoke Valgrind
    like this:

    <programlisting>valgrind --tool=exp-bbv /bin/ls</programlisting>

    In this case we are running on <filename>/bin/ls</filename>,
    but this can be any program.  By default a file called
    <computeroutput>bb.out.PID</computeroutput> will be created,
    where PID is replaced by the process ID of the running process.
    This file contains the basic block vector.  For long-running programs
    this file can be quite large, so it might be wise to compress
    it with gzip or some other compression program.
 </para>
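 <para>
    If you prefer to compress the output from a script, a minimal Python
    sketch using the standard <computeroutput>gzip</computeroutput> and
    <computeroutput>shutil</computeroutput> modules follows.  The helper name
    <computeroutput>compress_bbv_file</computeroutput> is hypothetical; the
    resulting <computeroutput>.gz</computeroutput> file is the kind of input
    the SimPoint command below reads with -inputVectorsGzipped.
 </para>

 <programlisting><![CDATA[
import gzip
import shutil


def compress_bbv_file(path):
    """Gzip-compress a BBV file, writing PATH.gz next to it.

    Returns the name of the compressed file.  The original file is left
    in place.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Example: compress_bbv_file("bb.out.1234") writes and returns "bb.out.1234.gz".
]]></programlisting>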

 <para>
    To create actual SimPoint results, you will need the SimPoint utility,
    available from the
    <ulink url="http://www.cse.ucsd.edu/~calder/simpoint/">SimPoint webpage</ulink>.
    Assuming you have downloaded SimPoint 3.2 and compiled it,
    create SimPoint results with a command like the following:

    <programlisting><![CDATA[
 ./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
     -loadFVFile bb.out.1234.gz \
     -k 5 -saveSimpoints results.simpts \
     -saveSimpointWeights results.weights]]></programlisting>

    where bb.out.1234.gz is your compressed basic block vector file
    generated by BBV.
 </para>

 <para>
    The SimPoint utility reduces the vectors to 15 dimensions using random
    linear projection, then performs k-means clustering to determine which
    intervals are of interest.  In this example, the
    -k 5 option requests 5 clusters, and therefore 5 intervals.
 </para>
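 <para>
    To make these two steps concrete, the following is a greatly simplified
    Python sketch of random projection followed by k-means clustering.  It
    assumes each interval's vector is normalized to sum to 1 before
    projection, uses a deterministic farthest-first choice of initial
    centers, and picks as each cluster's representative the interval
    closest to the cluster center, weighted by the fraction of intervals in
    that cluster.  All function names are hypothetical, and the real
    SimPoint utility does considerably more than this.
 </para>

 <programlisting><![CDATA[
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def project(vectors, dims=15, seed=1):
    """Randomly project sparse BBVs (block ID -> count) to `dims` dimensions.

    Each vector is normalized to sum to 1, then multiplied by a random
    matrix with entries drawn uniformly from [-1, 1]; one matrix row is
    created lazily for each distinct block ID.
    """
    rng = random.Random(seed)
    rows = {}
    points = []
    for vec in vectors:
        total = sum(vec.values())
        point = [0.0] * dims
        for block, count in vec.items():
            if block not in rows:
                rows[block] = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
            for d in range(dims):
                point[d] += count / total * rows[block][d]
        points.append(point)
    return points


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centers


def choose_simpoints(vectors, k):
    """Return {interval index: weight}, one representative per cluster."""
    points = project(vectors)
    labels, centers = kmeans(points, k)
    simpoints = {}
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            best = min(members,
                       key=lambda i: squared_distance(points[i], centers[c]))
            simpoints[best] = len(members) / len(vectors)
    return simpoints


# Three intervals running one phase, then two running a different phase.
vectors = [{1: 10, 2: 10}] * 3 + [{3: 5}] * 2
print(choose_simpoints(vectors, 2))  # {0: 0.6, 3: 0.4}
]]></programlisting>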

 <para>
    The outputs from the SimPoint run are the
    <computeroutput>results.simpts</computeroutput>
    and <computeroutput>results.weights</computeroutput> files.
    The first holds the 5 most representative intervals of the program.
    The second holds the weight by which to scale each interval when
    extrapolating full-program behavior.  The intervals and the weights
    can be used with a simulator that supports
    fast-forwarding: fast-forward to each interval of interest,
    collect statistics for one interval length, and then combine
    those statistics using the weights to
    calculate your results.
 </para>
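 <para>
    The sketch below shows how the two files might be combined, assuming
    each line of <computeroutput>results.simpts</computeroutput> holds an
    interval index followed by a cluster number, and each line of
    <computeroutput>results.weights</computeroutput> holds a weight followed
    by a cluster number.  It also assumes intervals are numbered from 0, so
    interval <emphasis>i</emphasis> starts after <emphasis>i</emphasis> times
    the interval size instructions.  The function names and the measured
    CPI values are hypothetical.
 </para>

 <programlisting><![CDATA[
def read_simpoints(simpts_text, weights_text):
    """Return [(interval index, weight)] pairs, matched by cluster number."""
    intervals = {}
    for line in simpts_text.splitlines():
        if line.strip():
            index, cluster = line.split()
            intervals[int(cluster)] = int(index)
    weights = {}
    for line in weights_text.splitlines():
        if line.strip():
            weight, cluster = line.split()
            weights[int(cluster)] = float(weight)
    return [(intervals[c], weights[c]) for c in sorted(intervals)]


def fast_forward_target(interval_index, interval_size=100_000_000):
    """Instructions to fast-forward before simulating this interval."""
    return interval_index * interval_size


def extrapolate(simpoints, measured):
    """Weighted whole-program estimate of a metric such as CPI.

    measured maps interval index -> value from the detailed simulation
    of that interval.
    """
    return sum(weight * measured[index] for index, weight in simpoints)


points = read_simpoints("12 0\n57 1\n", "0.75 0\n0.25 1\n")
print(points)                                       # [(12, 0.75), (57, 0.25)]
print(fast_forward_target(12))                      # 1200000000
print(extrapolate(points, {12: 1.5, 57: 2.5}))      # 1.75
]]></programlisting>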

 </sect1>

 <sect1 id="bbv-manual.usage" xreflabel="BBV Command-line Options">
 <title>BBV Command-line Options</title>

 <para> BBV-specific command-line options are:</para>

 <!-- start of xi:include in the manpage -->
 <variablelist id="bbv.opts.list">

   <varlistentry id="opt.bb-out-file" xreflabel="--bb-out-file">
      <term>
         <option><![CDATA[--bb-out-file=<name> [default: bb.out.%p] ]]></option>
      </term>
      <listitem>
         <para>
            This option selects the name of the basic block vector file.  The
            <option>%p</option> and <option>%q</option> format specifiers can be
            used to embed the process ID and/or the contents of an environment
            variable in the name, as is the case for the core option
            <option><xref linkend="opt.log-file"/></option>.
         </para>
      </listitem>
   </varlistentry>

   <varlistentry id="opt.pc-out-file" xreflabel="--pc-out-file">
      <term>
         <option><![CDATA[--pc-out-file=<name> [default: pc.out.%p] ]]></option>
      </term>
      <listitem>
         <para>
            This option selects the name of the PC file.
            This file holds program counter addresses
            and function name info for the various basic blocks.
            This can be used in conjunction
            with the basic block vector file to fast-forward via function names
            instead of just instruction counts.  The
            <option>%p</option> and <option>%q</option> format specifiers can be
            used to embed the process ID and/or the contents of an environment
            variable in the name, as is the case for the core option
            <option><xref linkend="opt.log-file"/></option>.
         </para>
      </listitem>
    </varlistentry>

    <varlistentry id="opt.interval-size" xreflabel="--interval-size">
       <term>
         <option><![CDATA[--interval-size=<number> [default: 100000000] ]]></option>
       </term>
       <listitem>
       <para>
          This option selects the size of the interval to use.
          The default is 100
          million instructions, which is a commonly used value.
          Other sizes can be used; smaller intervals can help programs
          with finer-grained phases.  However, a smaller interval size
          can lead to accuracy issues due to warm-up effects.
          When fast-forwarding, the various architectural features
          (such as caches and branch predictors) will be uninitialized,
          and it will take some number of instructions before they
          "warm up" to the state a full simulation would have reached
          without fast-forwarding.  Larger interval sizes tend to
          mitigate this.
       </para>
       </listitem>
   </varlistentry>

   <varlistentry id="opt.instr-count-only" xreflabel="--instr-count-only">
      <term>
         <option><![CDATA[--instr-count-only [default: no] ]]></option>
      </term>
      <listitem>
         <para>
            This option tells the tool to display only instruction count
            totals and not to generate the actual basic block vector file.
            This is useful for debugging, and for gathering instruction count
            info without generating the large basic block vector files.
         </para>
      </listitem>
    </varlistentry>


 </variablelist>
 <!-- end of xi:include in the manpage -->

 </sect1>

 <sect1 id="bbv-manual.fileformat" xreflabel="BBV File Format">
 <title>Basic Block Vector File Format</title>

 <para>
   The Basic Block Vector is dumped at fixed intervals.  This
   is commonly done every 100 million instructions; the
   <option>--interval-size</option> option can be
   used to change this.
 </para>

 <para>
   The output file looks like this:
 </para>

 <programlisting><![CDATA[
T:45:1024 :189:99343
T:11:78573 :15:1353 :56:1
T:18:45 :12:135353 :56:78 :314:4324263]]></programlisting>

 <para>
   Each new interval starts with a T.   This is followed on the same line
   by a series of basic block and frequency pairs, one for each
   basic block that was entered during the interval.  The format for
   each block/frequency pair is a colon, followed by a number that
   uniquely identifies the basic block, another colon, and then
   the frequency (which is the number of times the block was entered,
   multiplied by the number of instructions in the block).  The
   pairs are separated from each other by a space.
 </para>
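 <para>
    A minimal Python parser for one such line, assuming only the layout just
    described (the function name is hypothetical):
 </para>

 <programlisting><![CDATA[
def parse_interval_line(line):
    """Parse one "T" line of a BBV file into {block ID: frequency}.

    Each pair has the form ":<block ID>:<frequency>"; pairs are separated
    by whitespace and the first pair directly follows the leading "T".
    """
    if not line.startswith("T"):
        raise ValueError("not an interval line: %r" % line)
    vector = {}
    for pair in line[1:].split():
        _, block, freq = pair.split(":")
        vector[int(block)] = int(freq)
    return vector


print(parse_interval_line("T:45:1024 :189:99343"))  # {45: 1024, 189: 99343}
]]></programlisting>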

 <para>
   The frequency count is multiplied by the number of instructions that are
   in the basic block, in order to weight the count so that instructions in
   small basic blocks aren't counted as more important than instructions
   in large basic blocks.
 </para>
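 <para>
    The weighting can be illustrated with a small sketch that formats one
    interval line from per-block entry counts and block sizes.  Both inputs
    and the function name are hypothetical, and sorting pairs by block ID is
    a choice made here for readability.
 </para>

 <programlisting><![CDATA[
def format_interval_line(entries, sizes):
    """Build one "T" line from per-block entry counts.

    entries maps block ID -> times the block was entered this interval;
    sizes maps block ID -> number of instructions in the block.  Each
    frequency written is entries * size, so every executed instruction
    contributes equally, whatever block it belongs to.
    """
    pairs = [":%d:%d" % (block, entries[block] * sizes[block])
             for block in sorted(entries) if entries[block] > 0]
    return "T" + " ".join(pairs)


# A 2-instruction block entered 50 times and a 50-instruction block entered
# twice each executed 100 instructions, so they get equal frequencies.
print(format_interval_line({1: 50, 2: 2}, {1: 2, 2: 50}))  # T:1:100 :2:100
]]></programlisting>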

 <para>
   The SimPoint program only processes lines that start with a "T".  All
   other lines are ignored.  Traditionally, comments are indicated by
   starting a line with a "#" character.  Some other BBV generation tools,
   such as PinPoints, generate lines beginning with letters other than "T"
   to indicate more information about the program being run.  We do
   not generate these, as the SimPoint utility ignores them.
 </para>
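 <para>
    Filtering a file the way SimPoint does can be sketched in a few lines.
    The sample input, including its made-up <computeroutput>X</computeroutput>
    line standing in for extra records from another tool, is hypothetical.
 </para>

 <programlisting><![CDATA[
def interval_lines(text):
    """Yield only the lines SimPoint would process: those starting with "T".

    Comment lines ("#...") and any other record types are skipped.
    """
    for line in text.splitlines():
        if line.startswith("T"):
            yield line


sample = "# a comment\nT:1:10 :2:5\nX:extra info\nT:2:15\n"
print(list(interval_lines(sample)))  # ['T:1:10 :2:5', 'T:2:15']
]]></programlisting>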

 </sect1>

 <sect1 id="bbv-manual.implementation" xreflabel="Implementation">
 <title>Implementation</title>

 <para>
    Valgrind provides all of the information necessary to create
    BBV files.  In the current implementation, all instructions
    are instrumented.  This is slower (by approximately a factor
    of two) than a method that instruments at the basic block level,
    but there are some complications (especially with rep prefix
    detection) that make that method more difficult.
 </para>

 <para>
    Valgrind actually provides instrumentation at a superblock level.
    Unlike a basic block, a superblock has one entry point but can
    have multiple exit points.  If a branch targets the middle
    of a block, a new block is started at the branch target.  Because
    Valgrind cannot produce "true" basic blocks, the generated
    BBV vectors will be different from those generated by other tools.
    In practice this does not seem to affect the accuracy of the
    SimPoint results.  Internally, we force the
    <option>--vex-guest-chase-thresh=0</option>
    option to Valgrind, which produces more basic-block-like
    behavior.
 </para>

 <para>
    When a superblock is run for the first time, it is instrumented
    with our BBV routine.  A block info (bbInfo) structure, which holds
    information and statistics for the block, is allocated.
    A unique block ID is assigned to the block, and then the
    structure is placed into an ordered set.
    Then each native instruction in the block is instrumented to
    call an instruction counting routine with a pointer to the block
    info structure as an argument.
 </para>

 <para>
    At run-time, our instruction counting routines are called once
    per native instruction.  The relevant block info structure is accessed
    and the block count and total instruction count are updated.
    When the total instruction count exceeds the interval size,
    we walk the ordered set, writing out the statistics for
    any block that was accessed in the interval, then resetting the
    block counters to zero.
 </para>
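 <para>
    The bookkeeping described in the two paragraphs above can be modeled
    with the toy Python sketch below.  It is not BBV's C implementation: the
    class and method names are hypothetical, the ordered set is assumed to
    be ordered by block address, block IDs are assumed to be assigned in
    order of first execution, and the block count is incremented once per
    executed instruction so that it directly yields the weighted frequency
    written to the file.
 </para>

 <programlisting><![CDATA[
class BlockInfo:
    """Per-block record, loosely modeled on the bbInfo structure."""

    def __init__(self, block_id, address, n_instrs):
        self.block_id = block_id    # unique ID written to the BBV file
        self.address = address      # address of the block's first instruction
        self.n_instrs = n_instrs    # native instructions in the block
        self.count = 0              # instructions executed here this interval


class BBVCounter:
    """Toy model of the bookkeeping described above."""

    def __init__(self, interval_size):
        self.interval_size = interval_size
        self.blocks = {}    # address -> BlockInfo; walked in sorted order
        self.total = 0      # instructions executed in the current interval
        self.output = []    # "T" lines written so far

    def instrument(self, address, n_instrs):
        """First execution of a block: allocate its info and assign an ID."""
        if address not in self.blocks:
            self.blocks[address] = BlockInfo(len(self.blocks) + 1,
                                             address, n_instrs)
        return self.blocks[address]

    def count_instruction(self, info):
        """Called once per executed native instruction of the block."""
        # For blocks that run to completion this ends up as entries * size.
        info.count += 1
        self.total += 1
        if self.total >= self.interval_size:
            self.dump()

    def dump(self):
        """Write one "T" line for every block used this interval, then reset."""
        pairs = []
        for address in sorted(self.blocks):
            info = self.blocks[address]
            if info.count:
                pairs.append(":%d:%d" % (info.block_id, info.count))
                info.count = 0
        self.output.append("T" + " ".join(pairs))
        self.total = 0


bbv = BBVCounter(interval_size=10)
loop = bbv.instrument(0x400100, 3)   # hypothetical 3-instruction block
tail = bbv.instrument(0x400200, 4)   # hypothetical 4-instruction block
for _ in range(2 * loop.n_instrs):   # the loop block runs twice ...
    bbv.count_instruction(loop)
for _ in range(tail.n_instrs):       # ... then the tail block once
    bbv.count_instruction(tail)
print(bbv.output)  # ['T:1:6 :2:4']
]]></programlisting>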

 <para>
    On the x86 and amd64 architectures the counting code has extra
    logic to handle rep-prefixed string instructions.  This is because
    actual hardware counts a rep-prefixed instruction
    as one instruction, while a naive Valgrind implementation
    would count it as many instructions (possibly hundreds, thousands, or
    even millions).  We handle rep-prefixed instructions specially,
    in order to make the results match those obtained with hardware performance
    counters.
 </para>
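 <para>
    One simple way to obtain hardware-style counts is sketched below,
    assuming a hypothetical trace in which every iteration of a
    rep-prefixed instruction appears as a separate execution at the same
    address.  Consecutive executions of the same rep-prefixed instruction
    are counted once; this illustrates the goal and is not necessarily how
    BBV's counting code detects repetitions.
 </para>

 <programlisting><![CDATA[
def count_instructions(trace):
    """Count instructions the way hardware counts rep-prefixed ones.

    trace is a hypothetical list of (pc, is_rep) tuples, one per execution
    seen by naive per-instruction instrumentation.  Back-to-back entries for
    the same rep-prefixed pc are iterations of one instruction and are
    counted once.
    """
    count = 0
    prev_pc = None
    for pc, is_rep in trace:
        if not (is_rep and pc == prev_pc):
            count += 1
        prev_pc = pc
    return count


# One rep-prefixed instruction iterating three times between two others:
# a naive count gives 5, the hardware-style count gives 3.
trace = [(0x10, False), (0x12, True), (0x12, True), (0x12, True), (0x15, False)]
print(len(trace), count_instructions(trace))  # 5 3
]]></programlisting>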

 <para>
    BBV also counts the fldcw instruction.  This instruction is used on
    x86 machines in various ways; it is most commonly found when converting
    floating-point values into integers.
    On Pentium 4 systems the retired instruction performance
    counter counts this instruction as two instructions (all other
    known processors only count it as one).
    This can affect results when using SimPoint on Pentium 4 systems.
    We provide the fldcw count so that users can evaluate whether it
    will impact their results enough to avoid using Pentium 4 machines
    for their experiments.  It would be possible to add an option to
    this tool that mimics the double-counting so that the generated BBV
    files would be usable for experiments using hardware performance
    counters on Pentium 4 systems.
 </para>
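 <para>
    The effect of the double-counting can be estimated with simple
    arithmetic: a Pentium 4 counter would report one extra instruction per
    fldcw.  The sketch below computes that estimate and the relative
    overcount from a total instruction count and an fldcw count; the
    function names and numbers are hypothetical.
 </para>

 <programlisting><![CDATA[
def pentium4_retired_estimate(instr_count, fldcw_count):
    """Estimated Pentium 4 retired-instruction count (fldcw counted twice)."""
    return instr_count + fldcw_count


def fldcw_error_percent(instr_count, fldcw_count):
    """Relative overcount, in percent, introduced by the double-counting."""
    return 100.0 * fldcw_count / instr_count


# 1 billion instructions of which 5 million are fldcw.
print(pentium4_retired_estimate(1_000_000_000, 5_000_000))  # 1005000000
print(fldcw_error_percent(1_000_000_000, 5_000_000))        # 0.5
]]></programlisting>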

 </sect1>

 <sect1 id="bbv-manual.threadsupport" xreflabel="BBV Threaded Support">
 <title>Threaded Executable Support</title>

 <para>
    BBV supports threaded programs.  When a program has multiple threads,
    an additional basic block vector file is created for each thread (each
    additional file is the specified filename with the thread number
    appended at the end).
 </para>

 <para>
    There is no official method of using SimPoint with
    threaded workloads.  The most common method is to run
    SimPoint on each thread's results independently, and use
    some method of deterministic execution to try to match the
    original workload.  This should be possible with the current
    BBV.
 </para>

 </sect1>

 <sect1 id="bbv-manual.validation" xreflabel="BBV Validation">
 <title>Validation</title>

 <para>
    BBV has been tested on x86, amd64, and ppc32 platforms.
    An earlier version of BBV was tested in detail using
    hardware performance counters; this work is described in a paper
    from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation
    to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
    V.M. Weaver and S.A. McKee.
 </para>

 </sect1>

 <sect1 id="bbv-manual.performance" xreflabel="BBV Performance">
 <title>Performance</title>

 <para>
   Using this program slows down execution by roughly a factor of 40
   over native execution.  This varies depending on the machine
   used and the benchmark being run.
   On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D
   processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
 </para>

 </sect1>

 </chapter>