Justin Ruggles
0e8fdd41c2
dsputil: use cpuflags in x86 emu_edge_core
...
avoids passing around the extra argument among all the macros it uses
2011-11-22 15:40:51 -05:00
Justin Ruggles
395f2e70dd
dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
...
This allows emulated_edge_mc_sse() and gmc_sse() to be used under
AV_CPU_FLAG_SSE.
2011-11-22 15:40:51 -05:00
Justin Ruggles
9d06037d48
twinvq: add SSE/AVX optimized sum/difference stereo interleaving
2011-11-11 14:13:58 -05:00
Justin Ruggles
b8f02f5b4e
dsputil: use cpuflags in x86 versions of vector_clip_int32()
2011-11-06 20:50:06 -05:00
Justin Ruggles
4e8e262476
fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm
2011-10-21 10:13:05 -04:00
Ronald S. Bultje
38e06c2969
Move clipd macros to x86util.asm.
...
This allows sharing them between multiple .asm files.
2011-08-17 20:56:06 -07:00
Dave Yeo
cc73511e8e
Fix NASM include directive
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-15 11:24:35 -07:00
Ronald S. Bultje
3a39195b1d
Move x86inc.asm to libavutil/.
...
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00
Justin Ruggles
6054cd25b4
ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.
2011-07-01 13:02:11 -04:00
Dave Yeo
d69f9a4234
Add support for a.out object format to assembler macros.
...
This format is still used by e.g. OS/2.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-20 17:52:21 +02:00
Diego Biurrun
888fa31eca
Fix FSF address copy paste error in some license headers.
2011-05-14 21:32:31 +02:00
Justin Ruggles
e6e9823488
Add apply_window_int16() to DSPContext with x86-optimized versions and use it
...
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Ronald S. Bultje
17cf7c68ed
Fix ff_emu_edge_core_sse() on Win64.
...
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
2011-02-08 18:25:12 -05:00
Justin Ruggles
c73d99e672
Separate format conversion DSP functions from DSPContext.
...
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Ronald S. Bultje
81f2a3f4ff
Implement a SIMD version of emulated_edge_mc() for x86.
...
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Jason Garrett-Glaser
2966cc1849
Update x264asm header files to latest versions.
...
Modify the asm accordingly.
GLOBAL is now no longoer necessary for PIC-compliant loads.
Originally committed as revision 23739 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-23 19:20:46 +00:00
Alex Converse
3deb53849e
Implement an sse version of scalarproduct_float().
...
Originally committed as revision 21386 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-22 23:07:58 +00:00
Loren Merritt
758c7455f1
fix a crash in ape decoding on x86_32 sse2
...
Originally committed as revision 20777 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-08 21:24:01 +00:00
Loren Merritt
a4605efdf5
slightly faster scalarproduct_and_madd_int16_ssse3 on penryn, no change on conroe
...
Originally committed as revision 20743 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-05 17:53:11 +00:00
Loren Merritt
b1159ad928
refactor and optimize scalarproduct
...
29-105% faster apply_filter, 6-90% faster ape decoding on core2
(Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.)
9-123% faster ape decoding on G4.
Originally committed as revision 20739 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-05 15:09:10 +00:00
Loren Merritt
b10fa1bb8b
port ape dsp functions from sse2 to mmx
...
now requires yasm
Originally committed as revision 20722 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-03 18:53:12 +00:00
Loren Merritt
b07781b6e4
fix linking on systems with a function name prefix (10l in r20287)
...
Originally committed as revision 20294 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-18 21:44:03 +00:00
Loren Merritt
e17ccf60fe
huffyuv: add some const qualifiers
...
Originally committed as revision 20290 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-18 20:47:25 +00:00
Loren Merritt
2f77923d72
simd add_hfyu_left_prediction
...
2.2x faster than C on conroe, 3.6x on penryn.
4-6% faster huffyuv decoding if using left or plane mode and yuv
Originally committed as revision 20287 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-18 20:10:10 +00:00
Loren Merritt
3daa434a40
ff_add_hfyu_median_prediction_mmx2
...
overall ffvhuff decoding speedup: 28% on core2, 25% on k8.
Originally committed as revision 17059 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-02-08 17:45:30 +00:00
Diego Biurrun
a6493a8fbd
Rename libavcodec/i386/ --> libavcodec/x86/.
...
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.
Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-12-22 09:12:42 +00:00