11 Commits

Author SHA1 Message Date
James Almer
9e7a93c6fd x86/intreadwrite: add SSE2 optimized AV_COPY128U
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 23:17:52 -03:00
James Almer
70c6b904be x86/intreadwrite: add missing casts to pointer arguments
Should make strict compilers happy.

Also, make AV_COPY128 use integer operations while at it. Removing the
inclusion of immintrin.h ensures a lot less intrinsic related headers are
included as well, which fixes a clash of defines with some Clang versions.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-11 18:24:26 -03:00
James Almer
1a86a7a48d x86/intreadwrite: fix include of config.h
Should fix make checkheaders.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:52:52 -03:00
James Almer
15056dd650 x86/intreadwrite.h: add missing preprocessor checks
Removed by accident in the previous commits. This makes the code only run when
compiled with GCC and Clang like before. Support for other compilers like msvc
can be added later.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:49:21 -03:00
James Almer
bd1bcb07e0 x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:25:44 -03:00
James Almer
4a04cca69a x86/intreadwrite: use intrinsics instead of inline asm for AV_ZERO128
When called inside a loop, the inline asm version results in one pxor
unnecessarely emitted per iteration, as the contents of the __asm__() block are
opaque to the compiler's instruction scheduler.
This is not the case with intrinsics, where pxor will be emitted once with any
half decent compiler.

This also has the benefit of removing any SSE -> AVX penalty that may happen
when the compiler emits VEX encoded instructions.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:25:44 -03:00
Martin Storsjö
7ec2354c38 x86: Remove inline MMX assembly that clobbers the FPU state
These inline implementations of AV_COPY64, AV_SWAP64 and AV_ZERO64
are known to clobber the FPU state - which has to be restored
with the 'emms' instruction afterwards.

This was known and signaled with the FF_COPY_SWAP_ZERO_USES_MMX
define, which calling code seems to have been supposed to check,
in order to call emms_c() after using them. See
0b1972d4096df5879038f0af776f87f41e90ebd4,
29c4c0886d143790fcbeddbe40a23dfc6f56345c and
df215e575850e41b19aeb1fd99e53372a6b3d537 for history on earlier
fixes in the same area.

However, new code can use these AV_*64() macros without knowing
about the need to call emms_c().

Just get rid of these dangerous inline assembly snippets; this
doesn't make any difference for 64 bit architectures anyway.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-02-09 23:55:52 +02:00
Andreas Rheinhardt
29c4c0886d avutil/x86/intreadwrite: Add ability to detect whether MMX code is used
It can be used to call emms_c() only when needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Måns Rullgård
2ed6f39944 Replace many includes of libavutil/common.h with what is actually needed
This reduces the number of false dependencies on header files and
speeds up compilation.

Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-09 17:39:19 +00:00
Måns Rullgård
9c9a0840d0 Add lots of missing includes
Originally committed as revision 22337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-08 18:43:52 +00:00
Alexander Strange
f6d0390657 Add macros for 64- and 128-bit write-combining optimization to intreadwrite.h.
Add x86 implementation using MMX/SSE.

Originally committed as revision 21281 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 10:24:33 +00:00