164 lines
		
	
	
		
			5.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			164 lines
		
	
	
		
			5.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
FFmpeg & evaluating performance on the PowerPC Architecture HOWTO
 | 
						|
 | 
						|
(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
I - Introduction
 | 
						|
 | 
						|
The PowerPC architecture and its SIMD extension AltiVec offer some
 | 
						|
interesting tools to evaluate performance and improve the code.
 | 
						|
This document tries to explain how to use those tools with FFmpeg.
 | 
						|
 | 
						|
The architecture itself offers two ways to evaluate the performance of
 | 
						|
a given piece of code:
 | 
						|
 | 
						|
1) The Time Base Registers (TBL)
 | 
						|
2) The Performance Monitor Counter Registers (PMC)
 | 
						|
 | 
						|
The first ones are always available, always active, but they're not very
 | 
						|
accurate: the registers increment by one every four *bus* cycles. On
 | 
						|
my 667 Mhz tiBook (ppc7450), this means once every twenty *processor*
 | 
						|
cycles. So we won't use that.
 | 
						|
 | 
						|
The PMC are much more useful: not only can they report cycle-accurate
 | 
						|
timing, but they can also be used to monitor many other parameters,
 | 
						|
such as the number of AltiVec stalls for every kind of instruction,
 | 
						|
or instruction cache misses. The downside is that not all processors
 | 
						|
support the PMC (all G3, all G4 and the 970 do support them), and
 | 
						|
they're inactive by default - you need to activate them with a
 | 
						|
dedicated tool. Also, the number of available PMC depends on the
 | 
						|
procesor: the various 604 have 2, the various 75x (aka. G3) have 4,
 | 
						|
and the various 74xx (aka G4) have 6.
 | 
						|
 | 
						|
*WARNING*: The PowerPC 970 is not very well documented, and its PMC
 | 
						|
registers are 64 bits wide. To properly notify the code, you *must*
 | 
						|
tune for the 970 (using --tune=970), or the code will assume 32 bit
 | 
						|
registers.
 | 
						|
 | 
						|
 | 
						|
II - Enabling FFmpeg PowerPC performance support
 | 
						|
 | 
						|
This needs to be done by hand. First, you need to configure FFmpeg as
 | 
						|
usual, but add the "--powerpc-perf-enable" option. For instance:
 | 
						|
 | 
						|
#####
 | 
						|
./configure --prefix=/usr/local/ffmpeg-cvs --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
 | 
						|
#####
 | 
						|
 | 
						|
This will configure FFmpeg to install inside /usr/local/ffmpeg-cvs,
 | 
						|
compiling with gcc-3.3 (you should try to use this one or a newer
 | 
						|
gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
 | 
						|
thumb, those at 550Mhz and more). It will also enable the PMC.
 | 
						|
 | 
						|
You may also edit the file "config.h" to enable the following line:
 | 
						|
 | 
						|
#####
 | 
						|
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
 | 
						|
#####
 | 
						|
 | 
						|
If you enable this line, then the code will not make use of AltiVec,
 | 
						|
but will use the reference C code instead. This is useful to compare
 | 
						|
performance between two versions of the code.
 | 
						|
 | 
						|
Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":
 | 
						|
 | 
						|
#####
 | 
						|
#define POWERPC_NUM_PMC_ENABLED 4
 | 
						|
#####
 | 
						|
 | 
						|
If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
 | 
						|
PMC than available on your CPU!
 | 
						|
 | 
						|
Then, simply compile FFmpeg as usual (make && make install).
 | 
						|
 | 
						|
 | 
						|
 | 
						|
III - Using FFmpeg PowerPC performance support
 | 
						|
 | 
						|
This FFmeg can be used exactly as usual. But before exiting, FFmpeg
 | 
						|
will dump a per-function report that looks like this:
 | 
						|
 | 
						|
#####
 | 
						|
PowerPC performance report
 | 
						|
 Values are from the PMC registers, and represent whatever the
 | 
						|
 registers are set to record.
 | 
						|
 Function "gmc1_altivec" (pmc1):
 | 
						|
        min: 231
 | 
						|
        max: 1339867
 | 
						|
        avg: 558.25 (255302)
 | 
						|
 Function "gmc1_altivec" (pmc2):
 | 
						|
        min: 93
 | 
						|
        max: 2164
 | 
						|
        avg: 267.31 (255302)
 | 
						|
 Function "gmc1_altivec" (pmc3):
 | 
						|
        min: 72
 | 
						|
        max: 1987
 | 
						|
        avg: 276.20 (255302)
 | 
						|
(...)
 | 
						|
#####
 | 
						|
 | 
						|
In this example, PMC1 was set to record CPU cycles, PMC2 was set to
 | 
						|
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
 | 
						|
Issue Stalls.
 | 
						|
 | 
						|
The function "gmc1_altivec" was monitored 255302 times, and the
 | 
						|
minimum execution time was 231 processor cycles. The max and average
 | 
						|
aren't much use, as it's very likely the OS interrupted execution for
 | 
						|
reasons of its own :-(
 | 
						|
 | 
						|
With the exact same settings and source file, but using the reference C
 | 
						|
code we get:
 | 
						|
 | 
						|
#####
 | 
						|
PowerPC performance report
 | 
						|
 Values are from the PMC registers, and represent whatever the
 | 
						|
 registers are set to record.
 | 
						|
 Function "gmc1_altivec" (pmc1):
 | 
						|
        min: 592
 | 
						|
        max: 2532235
 | 
						|
        avg: 962.88 (255302)
 | 
						|
 Function "gmc1_altivec" (pmc2):
 | 
						|
        min: 0
 | 
						|
        max: 33
 | 
						|
        avg: 0.00 (255302)
 | 
						|
 Function "gmc1_altivec" (pmc3):
 | 
						|
        min: 0
 | 
						|
        max: 350
 | 
						|
        avg: 0.03 (255302)
 | 
						|
(...)
 | 
						|
#####
 | 
						|
 | 
						|
592 cycles, so the fastest AltiVec execution is about 2.5x faster than
 | 
						|
the fastest C execution in this example. It's not perfect but it's not
 | 
						|
bad (well I wrote this function so I can't say otherwise :-).
 | 
						|
 | 
						|
Once you have that kind of report, you can try to improve things by
 | 
						|
finding what goes wrong and fixing it; in the example above, one
 | 
						|
should try to diminish the number of AltiVec stalls, as this *may*
 | 
						|
improve performance.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
IV) Enabling the PMC in Mac OS X
 | 
						|
 | 
						|
This is easy. Use "Monster" and "monster". Those tools come from
 | 
						|
Apple's CHUD package, and can be found hidden in the developer web
 | 
						|
site & FTP site. "MONster" is the graphical application, use it to
 | 
						|
generate a config file specifying what each register should
 | 
						|
monitor. Then use the command-line application "monster" to use that
 | 
						|
config file, and enjoy the results.
 | 
						|
 | 
						|
Note that "MONster" can be used for many other things, but it's
 | 
						|
documented by Apple, it's not my subject.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
V) Enabling the PMC on Linux
 | 
						|
 | 
						|
I don't know how to do it, sorry :-) Any idea very much welcome.
 | 
						|
 | 
						|
-- 
 | 
						|
Romain Dolbeau
 | 
						|
<romain@dolbeau.org>
 |