Timo Rothenpieler 
							
						 
					 
					
						
						
						
						
							
						
						
							f2de911818 
							
						 
					 
					
						
						
							
							swscale: add opaque parameter to input functions  
						
						
						
						
					 
					
						2022-08-19 22:09:36 +02:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							a38293e444 
							
						 
					 
					
						
						
							
							libswscale: Enable hscale_avx2 for all input sizes.  
						
						
						
						
						ff_shuffle_filter_coefficients shuffles the tail as required.
Signed-off-by: Anton Khirnov <anton@khirnov.net> 
						
						
					 
					
						2022-08-18 16:24:48 +02:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							51a34e8525 
							
						 
					 
					
						
						
							
							sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext  
						
						
						
						
						Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> 
						
						
					 
					
						2022-08-18 16:19:13 +02:00 
						 
				 
			
				
					
						
							
							
								Swinney, Jonathan 
							
						 
					 
					
						
						
						
						
							
						
						
							4dcd191a50 
							
						 
					 
					
						
						
							
							checkasm: updated tests for sw_scale  
						
						
						
						
						Change the reference to exactly match the C reference in swscale,
instead of exactly matching the x86 SIMD implementations (which
differ slightly). Test with and without SWS_ACCURATE_RND - if this
flag isn't set, the output must match the C reference exactly,
otherwise it is allowed to be off by 2.
Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND
is set - apparently this discrepancy hasn't been noticed in other
exact tests before.
Add a test for yuv2plane1.
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st> 
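
The tolerance rule described in this message (an exact match in one mode, at most 2 off in the other) can be sketched as a single comparison helper. This is a minimal illustration, not the checkasm code itself: the name cmp_off_by_n and its signature are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (name and signature are assumptions): returns 0 when
 * every element of test is within max_diff of ref, and 1 as soon as one
 * element differs by more than max_diff. */
static int cmp_off_by_n(const uint8_t *ref, const uint8_t *test,
                        size_t n, int max_diff)
{
    assert(max_diff >= 0);
    for (size_t i = 0; i < n; i++) {
        if (abs((int)ref[i] - (int)test[i]) > max_diff)
            return 1;
    }
    return 0;
}
```

With such a helper, the exact comparison is a call with max_diff set to 0 and the relaxed comparison is a call with max_diff set to 2.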
						
						
					 
					
						2022-08-16 13:40:42 +03:00 
						 
				 
			
				
					
						
							
							
								Andreas Rheinhardt 
							
						 
					 
					
						
						
						
						
							
						
						
							81d3472031 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Simplify macro  
						
						
						
						
						This is possible now that it is no longer used by MMX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> 
						
						
					 
					
						2022-06-22 13:36:18 +02:00 
						 
				 
			
				
					
						
							
							
								Andreas Rheinhardt 
							
						 
					 
					
						
						
						
						
							
						
						
							a05f22eaf3 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions  
						
						
						
						
						x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truly ancient 32bit x86s
they are removed.
Moreover, some of the removed code was buggy/not bitexact
and led to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx 
Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc7086f49ef5a6a989ad6bc4bc11807eb6f used it, but
the version that was eventually applied does not.
[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html 
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> 
						
						
					 
					
						2022-06-22 13:36:04 +02:00 
						 
				 
			
				
					
						
							
							
								Andreas Rheinhardt 
							
						 
					 
					
						
						
						
						
							
						
						
							71e2825150 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Remove superfluous and invalid ';'  
						
						
						
						
						Inside a function an unnecessary ';' is just a null statement;
yet outside of it it is actually illegal (but compilers happen
to accept it without warning except when using -pedantic).
So modify the macros to always expect the user to add a ';'.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> 
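
The convention described in this message can be illustrated with a small sketch. The macro DECLARE_DOUBLER and the function doubler_c are hypothetical names invented for this example and are not taken from swscale; the point is only that the macro expansion ends without a ';', so each use site supplies exactly one.

```c
#include <assert.h>

/* Hypothetical prototype-generating macro. Its expansion deliberately ends
 * without ';', so every use site has to add exactly one. */
#define DECLARE_DOUBLER(suffix) int doubler_##suffix(int x)

/* File scope: the ';' terminates the declaration. Had the macro already
 * ended in ';', this line would contain a stray empty declaration. */
DECLARE_DOUBLER(c);

/* The same macro can also head the definition. */
DECLARE_DOUBLER(c)
{
    return 2 * x;
}

static int use_doubler(int x)
{
    /* Block scope: here a doubled ';' would only be a null statement. */
    DECLARE_DOUBLER(c);
    return doubler_c(x);
}
```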
						
						
					 
					
						2022-01-22 17:00:45 +01:00 
						 
				 
			
				
					
						
							
							
								Mark Reid 
							
						 
					 
					
						
						
						
						
							
						
						
							52f7026164 
							
						 
					 
					
						
						
							
							swscale/x86/input.asm: add x86-optimized planer rgb2yuv functions  
						
						
						
						
sse2 only operates on 2 lanes per loop for the to_y and to_uv functions, due
to the lack of a pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles
proved too costly and made the to_uv functions slower than the c implementation.
For to_y on sse2 only float functions are generated;
I was not able to outperform the c implementation on the integer pixel formats.
For to_a on sse4 only the float functions are generated.
sse2 and sse4 generated nearly identically performing code on integer pixel formats,
so only sse2/avx2 versions are generated.
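
To make the cost of the emulation mentioned above concrete, here is a portable scalar model of the idea, written as a sketch rather than as the code that was benchmarked. The function names model_pmuludq and model_pmulld are assumptions for this example. pmuludq multiplies only the even 32-bit lanes into 64-bit products, so a full 4-lane low multiply needs two such multiplies plus shuffles to move the odd lanes in and to gather the low halves back out.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pmuludq: only 32-bit lanes 0 and 2 are multiplied,
 * each giving a full 64-bit product. */
static void model_pmuludq(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Scalar model of emulating pmulld (low 32 bits of each lane product)
 * with two pmuludq steps and shuffles. */
static void model_pmulld(const uint32_t a[4], const uint32_t b[4],
                         uint32_t out[4])
{
    /* "Shuffle": move the odd lanes into the even positions. */
    const uint32_t a_odd[4] = { a[1], 0, a[3], 0 };
    const uint32_t b_odd[4] = { b[1], 0, b[3], 0 };
    uint64_t even[2], odd[2];

    model_pmuludq(a, b, even);          /* products of lanes 0 and 2 */
    model_pmuludq(a_odd, b_odd, odd);   /* products of lanes 1 and 3 */

    /* "Shuffle" again: interleave the low 32 bits of each product. */
    out[0] = (uint32_t)even[0];
    out[1] = (uint32_t)odd[0];
    out[2] = (uint32_t)even[1];
    out[3] = (uint32_t)odd[1];
}
```

In this model, one pmulld becomes two multiplies and several data movements.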
planar_gbrp_to_y_512_c: 1197.5
planar_gbrp_to_y_512_sse4: 444.5
planar_gbrp_to_y_512_avx2: 287.5
planar_gbrap_to_y_512_c: 1204.5
planar_gbrap_to_y_512_sse4: 447.5
planar_gbrap_to_y_512_avx2: 289.5
planar_gbrp9be_to_y_512_c: 1380.0
planar_gbrp9be_to_y_512_sse4: 543.5
planar_gbrp9be_to_y_512_avx2: 340.0
planar_gbrp9le_to_y_512_c: 1200.5
planar_gbrp9le_to_y_512_sse4: 442.0
planar_gbrp9le_to_y_512_avx2: 282.0
planar_gbrp10be_to_y_512_c: 1378.5
planar_gbrp10be_to_y_512_sse4: 544.0
planar_gbrp10be_to_y_512_avx2: 337.5
planar_gbrp10le_to_y_512_c: 1200.0
planar_gbrp10le_to_y_512_sse4: 448.0
planar_gbrp10le_to_y_512_avx2: 285.5
planar_gbrap10be_to_y_512_c: 1380.0
planar_gbrap10be_to_y_512_sse4: 542.0
planar_gbrap10be_to_y_512_avx2: 340.5
planar_gbrap10le_to_y_512_c: 1199.0
planar_gbrap10le_to_y_512_sse4: 446.0
planar_gbrap10le_to_y_512_avx2: 289.5
planar_gbrp12be_to_y_512_c: 10563.0
planar_gbrp12be_to_y_512_sse4: 542.5
planar_gbrp12be_to_y_512_avx2: 339.0
planar_gbrp12le_to_y_512_c: 1201.0
planar_gbrp12le_to_y_512_sse4: 440.5
planar_gbrp12le_to_y_512_avx2: 286.0
planar_gbrap12be_to_y_512_c: 1701.5
planar_gbrap12be_to_y_512_sse4: 917.0
planar_gbrap12be_to_y_512_avx2: 338.5
planar_gbrap12le_to_y_512_c: 1201.0
planar_gbrap12le_to_y_512_sse4: 444.5
planar_gbrap12le_to_y_512_avx2: 288.0
planar_gbrp14be_to_y_512_c: 1370.5
planar_gbrp14be_to_y_512_sse4: 545.0
planar_gbrp14be_to_y_512_avx2: 338.5
planar_gbrp14le_to_y_512_c: 1199.0
planar_gbrp14le_to_y_512_sse4: 444.0
planar_gbrp14le_to_y_512_avx2: 279.5
planar_gbrp16be_to_y_512_c: 1364.0
planar_gbrp16be_to_y_512_sse4: 544.5
planar_gbrp16be_to_y_512_avx2: 339.5
planar_gbrp16le_to_y_512_c: 1201.0
planar_gbrp16le_to_y_512_sse4: 445.5
planar_gbrp16le_to_y_512_avx2: 280.5
planar_gbrap16be_to_y_512_c: 1377.0
planar_gbrap16be_to_y_512_sse4: 545.0
planar_gbrap16be_to_y_512_avx2: 338.5
planar_gbrap16le_to_y_512_c: 1201.0
planar_gbrap16le_to_y_512_sse4: 442.0
planar_gbrap16le_to_y_512_avx2: 279.0
planar_gbrpf32be_to_y_512_c: 4113.0
planar_gbrpf32be_to_y_512_sse2: 2438.0
planar_gbrpf32be_to_y_512_sse4: 1068.0
planar_gbrpf32be_to_y_512_avx2: 904.5
planar_gbrpf32le_to_y_512_c: 3818.5
planar_gbrpf32le_to_y_512_sse2: 2024.5
planar_gbrpf32le_to_y_512_sse4: 1241.5
planar_gbrpf32le_to_y_512_avx2: 657.0
planar_gbrapf32be_to_y_512_c: 3707.0
planar_gbrapf32be_to_y_512_sse2: 2444.0
planar_gbrapf32be_to_y_512_sse4: 1077.0
planar_gbrapf32be_to_y_512_avx2: 909.0
planar_gbrapf32le_to_y_512_c: 3822.0
planar_gbrapf32le_to_y_512_sse2: 2024.5
planar_gbrapf32le_to_y_512_sse4: 1176.0
planar_gbrapf32le_to_y_512_avx2: 658.5
planar_gbrp_to_uv_512_c: 2325.8
planar_gbrp_to_uv_512_sse2: 1726.8
planar_gbrp_to_uv_512_sse4: 771.8
planar_gbrp_to_uv_512_avx2: 506.8
planar_gbrap_to_uv_512_c: 2281.8
planar_gbrap_to_uv_512_sse2: 1726.3
planar_gbrap_to_uv_512_sse4: 768.3
planar_gbrap_to_uv_512_avx2: 496.3
planar_gbrp9be_to_uv_512_c: 2336.8
planar_gbrp9be_to_uv_512_sse2: 1924.8
planar_gbrp9be_to_uv_512_sse4: 852.3
planar_gbrp9be_to_uv_512_avx2: 552.8
planar_gbrp9le_to_uv_512_c: 2270.3
planar_gbrp9le_to_uv_512_sse2: 1512.3
planar_gbrp9le_to_uv_512_sse4: 764.3
planar_gbrp9le_to_uv_512_avx2: 491.3
planar_gbrp10be_to_uv_512_c: 2281.8
planar_gbrp10be_to_uv_512_sse2: 1917.8
planar_gbrp10be_to_uv_512_sse4: 855.3
planar_gbrp10be_to_uv_512_avx2: 541.3
planar_gbrp10le_to_uv_512_c: 2269.8
planar_gbrp10le_to_uv_512_sse2: 1515.3
planar_gbrp10le_to_uv_512_sse4: 759.8
planar_gbrp10le_to_uv_512_avx2: 487.8
planar_gbrap10be_to_uv_512_c: 2382.3
planar_gbrap10be_to_uv_512_sse2: 1924.8
planar_gbrap10be_to_uv_512_sse4: 855.3
planar_gbrap10be_to_uv_512_avx2: 540.8
planar_gbrap10le_to_uv_512_c: 2382.3
planar_gbrap10le_to_uv_512_sse2: 1512.3
planar_gbrap10le_to_uv_512_sse4: 759.3
planar_gbrap10le_to_uv_512_avx2: 484.8
planar_gbrp12be_to_uv_512_c: 2283.8
planar_gbrp12be_to_uv_512_sse2: 1936.8
planar_gbrp12be_to_uv_512_sse4: 858.3
planar_gbrp12be_to_uv_512_avx2: 541.3
planar_gbrp12le_to_uv_512_c: 2278.8
planar_gbrp12le_to_uv_512_sse2: 1507.3
planar_gbrp12le_to_uv_512_sse4: 760.3
planar_gbrp12le_to_uv_512_avx2: 485.8
planar_gbrap12be_to_uv_512_c: 2385.3
planar_gbrap12be_to_uv_512_sse2: 1927.8
planar_gbrap12be_to_uv_512_sse4: 855.3
planar_gbrap12be_to_uv_512_avx2: 539.8
planar_gbrap12le_to_uv_512_c: 2377.3
planar_gbrap12le_to_uv_512_sse2: 1516.3
planar_gbrap12le_to_uv_512_sse4: 759.3
planar_gbrap12le_to_uv_512_avx2: 484.8
planar_gbrp14be_to_uv_512_c: 2283.8
planar_gbrp14be_to_uv_512_sse2: 1935.3
planar_gbrp14be_to_uv_512_sse4: 852.3
planar_gbrp14be_to_uv_512_avx2: 540.3
planar_gbrp14le_to_uv_512_c: 2276.8
planar_gbrp14le_to_uv_512_sse2: 1514.8
planar_gbrp14le_to_uv_512_sse4: 762.3
planar_gbrp14le_to_uv_512_avx2: 484.8
planar_gbrp16be_to_uv_512_c: 2383.3
planar_gbrp16be_to_uv_512_sse2: 1881.8
planar_gbrp16be_to_uv_512_sse4: 852.3
planar_gbrp16be_to_uv_512_avx2: 541.8
planar_gbrp16le_to_uv_512_c: 2378.3
planar_gbrp16le_to_uv_512_sse2: 1476.8
planar_gbrp16le_to_uv_512_sse4: 765.3
planar_gbrp16le_to_uv_512_avx2: 485.8
planar_gbrap16be_to_uv_512_c: 2382.3
planar_gbrap16be_to_uv_512_sse2: 1886.3
planar_gbrap16be_to_uv_512_sse4: 853.8
planar_gbrap16be_to_uv_512_avx2: 550.8
planar_gbrap16le_to_uv_512_c: 2381.8
planar_gbrap16le_to_uv_512_sse2: 1488.3
planar_gbrap16le_to_uv_512_sse4: 765.3
planar_gbrap16le_to_uv_512_avx2: 491.8
planar_gbrpf32be_to_uv_512_c: 4863.0
planar_gbrpf32be_to_uv_512_sse2: 3347.5
planar_gbrpf32be_to_uv_512_sse4: 1800.0
planar_gbrpf32be_to_uv_512_avx2: 1199.0
planar_gbrpf32le_to_uv_512_c: 4725.0
planar_gbrpf32le_to_uv_512_sse2: 2753.0
planar_gbrpf32le_to_uv_512_sse4: 1474.5
planar_gbrpf32le_to_uv_512_avx2: 927.5
planar_gbrapf32be_to_uv_512_c: 4859.0
planar_gbrapf32be_to_uv_512_sse2: 3269.0
planar_gbrapf32be_to_uv_512_sse4: 1802.0
planar_gbrapf32be_to_uv_512_avx2: 1201.5
planar_gbrapf32le_to_uv_512_c: 6338.0
planar_gbrapf32le_to_uv_512_sse2: 2756.5
planar_gbrapf32le_to_uv_512_sse4: 1476.0
planar_gbrapf32le_to_uv_512_avx2: 908.5
planar_gbrap_to_a_512_c: 383.3
planar_gbrap_to_a_512_sse2: 66.8
planar_gbrap_to_a_512_avx2: 43.8
planar_gbrap10be_to_a_512_c: 601.8
planar_gbrap10be_to_a_512_sse2: 86.3
planar_gbrap10be_to_a_512_avx2: 34.8
planar_gbrap10le_to_a_512_c: 602.3
planar_gbrap10le_to_a_512_sse2: 48.8
planar_gbrap10le_to_a_512_avx2: 31.3
planar_gbrap12be_to_a_512_c: 601.8
planar_gbrap12be_to_a_512_sse2: 111.8
planar_gbrap12be_to_a_512_avx2: 41.3
planar_gbrap12le_to_a_512_c: 385.8
planar_gbrap12le_to_a_512_sse2: 75.3
planar_gbrap12le_to_a_512_avx2: 39.8
planar_gbrap16be_to_a_512_c: 386.8
planar_gbrap16be_to_a_512_sse2: 79.8
planar_gbrap16be_to_a_512_avx2: 31.3
planar_gbrap16le_to_a_512_c: 600.3
planar_gbrap16le_to_a_512_sse2: 40.3
planar_gbrap16le_to_a_512_avx2: 30.3
planar_gbrapf32be_to_a_512_c: 1148.8
planar_gbrapf32be_to_a_512_sse2: 611.3
planar_gbrapf32be_to_a_512_sse4: 234.8
planar_gbrapf32be_to_a_512_avx2: 183.3
planar_gbrapf32le_to_a_512_c: 851.3
planar_gbrapf32le_to_a_512_sse2: 263.3
planar_gbrapf32le_to_a_512_sse4: 199.3
planar_gbrapf32le_to_a_512_avx2: 156.8
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2022-01-11 16:34:33 -03:00 
						 
				 
			
				
					
						
							
							
								Mark Reid 
							
						 
					 
					
						
						
						
						
							
						
						
							9e445a5be2 
							
						 
					 
					
						
						
							
							swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions  
						
						
						
						
						changes since v2:
 * fixed label
changes since v1:
 * remove vex instruction on sse4 path
 * some load/pack macros use fewer instructions
 * fixed some typos
yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2022-01-11 16:33:17 -03:00 
						 
				 
			
				
					
						
							
							
								rcombs 
							
						 
					 
					
						
						
						
						
							
						
						
							3e00b9e395 
							
						 
					 
					
						
						
							
							swscale/x86/init: use isSemiPlanarYUV  
						
						
						
						
						Fixes P210/P410 cases introduced (and broken) in 88d804b7ffa20caab2e8e2809da974c41f7fd8fc 
						
						
					 
					
						2021-12-23 01:41:03 -06:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							eebe406c80 
							
						 
					 
					
						
						
							
							libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.  
						
						
						
						
						This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster. 
						
						
					 
					
						2021-12-21 17:44:53 -03:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							86663963e6 
							
						 
					 
					
						
						
							
							x86/swscale: fix minor coding style issues  
						
						
						
						
						Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2021-12-16 13:16:04 -03:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							f900a19fa9 
							
						 
					 
					
						
						
							
							libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.  
						
						
						
						
						Fixes so that fate under 64 bit Windows passes.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2021-12-15 20:04:59 -03:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							dc57762cb4 
							
						 
					 
					
						
						
							
							libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0  
						
						
						
						
						Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
						
						
					 
					
						2021-04-01 20:47:52 +02:00 
						 
				 
			
				
					
						
							
							
								Andreas Rheinhardt 
							
						 
					 
					
						
						
						
						
							
						
						
							c23a5523b5 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Remove unused ASM constants  
						
						
						
						
						The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled
in 77a416e8aab77058b542030870fd7178b62d2a62 and finally removed in
36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were
apparently always unused (except for in_asm_used_var_warning_killer,
a function that only existed to make the compiler not optimize ASM
constants away).
w10 is unused since d604bab901f6dfaaad672ef2164e42b1f350474c, w02
since ef423a661818f3c0d8206a2abbc65ff555cc0c67.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> 
						
						
					 
					
						2021-02-24 09:47:54 +01:00 
						 
				 
			
				
					
						
							
							
								James Almer 
							
						 
					 
					
						
						
						
						
							
						
						
							c00567647e 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: fix mix of inline and external function definitions  
						
						
						
						
						This includes removing pointless static function forward declarations.
Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2021-02-18 18:47:42 -03:00 
						 
				 
			
				
					
						
							
							
								James Almer 
							
						 
					 
					
						
						
						
						
							
						
						
							c2bf1dcace 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: fix compilation with old yasm  
						
						
						
						
						Where AVX2 may not be supported.
Signed-off-by: James Almer <jamrial@gmail.com> 
						
						
					 
					
						2021-02-17 21:09:36 -03:00 
						 
				 
			
				
					
						
							
							
								Alan Kelly 
							
						 
					 
					
						
						
						
						
							
						
						
							554c2bc708 
							
						 
					 
					
						
						
							
							swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop  
						
						
						
						
						And other small optimizations for ~20% speedup. 
						
						
					 
					
						2021-02-17 21:21:03 +01:00 
						 
				 
			
				
					
						
							
							
								Anton Khirnov 
							
						 
					 
					
						
						
						
						
							
						
						
							e15371061d 
							
						 
					 
					
						
						
							
							lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump  
						
						
						
						
						They are not properly namespaced and not intended for public use. 
						
						
					 
					
						2021-01-01 14:14:57 +01:00 
						 
				 
			
				
					
						
							
							
								Nelson Gomez 
							
						 
					 
					
						
						
						
						
							
						
						
							bc01337db4 
							
						 
					 
					
						
						
							
							swscale/x86/output: add AVX2 version of yuv2nv12cX  
						
						
						
						
						256 bits is just wide enough to fit all the operands needed to vectorize
the software implementation, but AVX2 is needed for a couple of
instructions like cross-lane permutation.
Output is bit-for-bit identical to C.
Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com> 
						
						
					 
					
						2020-06-14 16:34:07 +01:00 
						 
				 
			
				
					
						
							
							
								Ruiling Song 
							
						 
					 
					
						
						
						
						
							
						
						
							4700f7d6fc 
							
						 
					 
					
						
						
							
							swscale/swscale: remove useless code  
						
						
						
						
						Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
						
						
					 
					
						2020-04-03 00:58:07 +02:00 
						 
				 
			
				
					
						
							
							
								Ting Fu 
							
						 
					 
					
						
						
						
						
							
						
						
							e934194b6a 
							
						 
					 
					
						
						
							
							libswscale/x86/yuv2rgb: Change inline assembly into nasm code  
						
						
						
						
						The original inline assembly and the NASM code give the same fps when run from the command line.
The NASM code has almost no impact on performance.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
						
						
					 
					
						2020-02-05 17:41:59 +01:00 
						 
				 
			
				
					
						
							
							
								Andreas Rheinhardt 
							
						 
					 
					
						
						
						
						
							
						
						
							736c7c20e7 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Fix undefined left shifts of negative numbers  
						
						
						
						
						This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
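
For readers unfamiliar with the issue named in the subject line: in C, left-shifting a negative signed value is undefined behaviour. A common way to avoid it is to multiply by the power of two instead. The sketch below shows that pattern under the assumption of a 32-bit int; it is not necessarily the exact change this commit made, and scale_by_pow2 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scales v by 2^bits without shifting a possibly
 * negative value. The multiplication is well defined as long as the
 * product fits in int32_t. */
static int32_t scale_by_pow2(int32_t v, unsigned bits)
{
    assert(bits < 31);                 /* keeps 1 << bits representable */
    return v * (int32_t)(1 << bits);   /* instead of: v << bits */
}
```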
						
						
					 
					
						2019-09-28 17:24:32 +02:00 
						 
				 
			
				
					
						
							
							
								Thomas Köppe 
							
						 
					 
					
						
						
						
						
							
						
						
							43171a2a73 
							
						 
					 
					
						
						
							
							Fix missing used attribute for inline assembly variables  
						
						
						
						
						Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.
This change makes FFMPEG work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
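
A minimal sketch of the pattern this message describes, assuming a GCC- or Clang-compatible compiler. The macro ASM_USED and the constant example_mask are hypothetical names for this illustration; they are not the FFmpeg definitions of DECLARE_ASM_CONST or DECLARE_ASM_ALIGNED.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper around the compiler attribute. On compilers without
 * it, the macro expands to nothing and offers no protection. */
#if defined(__GNUC__) || defined(__clang__)
#   define ASM_USED __attribute__((used))
#else
#   define ASM_USED
#endif

/* A constant that only an inline-asm string would name looks unreferenced
 * to the optimizer. Marking it as used keeps it in the object file even
 * when no C code refers to it. */
static const uint64_t ASM_USED example_mask = 0x00FF00FF00FF00FFULL;
```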
						
						
					 
					
						2017-11-13 03:58:34 +01:00 
						 
				 
			
				
					
						
							
							
								Timo Rothenpieler 
							
						 
					 
					
						
						
						
						
							
						
						
							99882d05a6 
							
						 
					 
					
						
						
							
							swscale: add support for P010LE/BE output  
						
						
						
						
					 
					
						2016-08-31 13:19:46 +02:00 
						 
				 
			
				
					
						
							
							
								Matthieu Bouron 
							
						 
					 
					
						
						
						
						
							
						
						
							9eb3da2f99 
							
						 
					 
					
						
						
							
							asm: FF_-prefix internal macros used in inline assembly  
						
						
						
						
						See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'. 
						
						
					 
					
						2016-06-27 17:21:18 +02:00 
						 
				 
			
				
					
						
							
							
								Hendrik Leppkes 
							
						 
					 
					
						
						
						
						
							
						
						
							c142dc203e 
							
						 
					 
					
						
						
							
							Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50'  
						
						
						
						
						* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
  Drop unnecessary libavutil/x86/asm.h #includes
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com> 
						
						
					 
					
						2016-06-26 15:53:00 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Biurrun 
							
						 
					 
					
						
						
						
						
							
						
						
							dc40a70c57 
							
						 
					 
					
						
						
							
							Drop unnecessary libavutil/x86/asm.h #includes  
						
						
						
						
					 
					
						2016-05-28 19:18:26 +02:00 
						 
				 
			
				
					
						
							
							
								Pedro Arthur 
							
						 
					 
					
						
						
						
						
							
						
						
							6de58b4903 
							
						 
					 
					
						
						
							
							swscale: cleanup unused code  
						
						
						
						
						Removed previous swscale code under '#ifndef NEW_FILTER'
and removed unused fields of SwsContext 
						
						
					 
					
						2016-03-31 16:36:16 -03:00 
						 
				 
			
				
					
						
							
							
								Hendrik Leppkes 
							
						 
					 
					
						
						
						
						
							
						
						
							5d8e836d0e 
							
						 
					 
					
						
						
							
							Replace all remaining occurances of step/depth_minus1 and offset_plus1  
						
						
						
						
					 
					
						2015-09-08 17:10:48 +02:00 
						 
				 
			
				
					
						
							
							
								Pedro Arthur 
							
						 
					 
					
						
						
						
						
							
						
						
							62d176de12 
							
						 
					 
					
						
						
							
							swscale: refactor vertical scaler  
						
						
						
						
					 
					
						2015-08-19 10:43:52 -03:00 
						 
				 
			
				
					
						
							
							
								Pedro Arthur 
							
						 
					 
					
						
						
						
						
							
						
						
							ed80dec621 
							
						 
					 
					
						
						
							
							swscale: fixed compiler warnings  
						
						
						
						
						Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
						
						
					 
					
						2015-08-18 22:56:50 +02:00 
						 
				 
			
				
					
						
							
							
								Pedro Arthur 
							
						 
					 
					
						
						
						
						
							
						
						
							e0a3173a94 
							
						 
					 
					
						
						
							
							swscale: refactor horizontal scaling  
						
						
						
						
						+ split color conversion from scaling
- disabled gamma correction, until it's refactored too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> 
						
						
					 
					
						2015-08-18 01:33:32 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							54e64eaf68 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: Fix warning about loosing significant bits in cast  
						
						
						
						
						Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2015-05-10 15:09:04 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							ae20682f6b 
							
						 
					 
					
						
						
							
							swscale: Add prefix to updateMMXDitherTables()  
						
						
						
						
						Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2015-02-27 19:23:46 +01:00 
						 
				 
			
				
					
						
							
							
								Kieran Kunhya 
							
						 
					 
					
						
						
						
						
							
						
						
							b546023b93 
							
						 
					 
					
						
						
							
							swscale: fix yuv2yuvX_8 assembly on x86  
						
						
						
						
						use_mmx_vfilter check/fix by committer
Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-11-09 23:52:17 +01:00 
						 
				 
			
				
					
						
							
							
								Vitor Sessak 
							
						 
					 
					
						
						
						
						
							
						
						
							55d11d277b 
							
						 
					 
					
						
						
							
							swscale/x86: do not expect registers to be preserved across inline ASM blocks  
						
						
						
						
						Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-09-18 00:03:29 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							4c8bc6fdee 
							
						 
					 
					
						
						
							
							Merge commit 'e96c3b81cadd0ba84d43b1f3a54980df3785d9a5'  
						
						
						
						
						* commit 'e96c3b81cadd0ba84d43b1f3a54980df3785d9a5':
  avutil: rename AV_PIX_FMT_Y400A to AV_PIX_FMT_YA8
Conflicts:
	libavcodec/libopenjpegdec.c
	libavcodec/libopenjpegenc.c
	libavcodec/raw.c
	libavutil/pixdesc.c
	libavutil/pixfmt.h
	libavutil/version.h
	libswscale/swscale_internal.h
	libswscale/swscale_unscaled.c
Merged-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-08-04 21:48:00 +02:00 
						 
				 
			
				
					
						
							
							
								Vittorio Giovara 
							
						 
					 
					
						
						
						
						
							
						
						
							e96c3b81ca 
							
						 
					 
					
						
						
							
							avutil: rename AV_PIX_FMT_Y400A to AV_PIX_FMT_YA8  
						
						... 
						
						
						
						The rationale is that you have a packed format in the form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one think about Greenscale instead.
An alias pixel format and color space name are provided for compatibility. 
						
						
					 
					
						2014-08-04 12:55:08 +01:00 
						 
				 
			
				
					
						
							
							
								Carl Eugen Hoyos 
							
						 
					 
					
						
						
						
						
							
						
						
							891307b4d1 
							
						 
					 
					
						
						
							
							s86/scale: Do not return the result of a (void) function from a void function.  
						
						... 
						
						
						
						Fixes compilation with Sun C 5.12.
Reported by Bradley Mitchell in ticket #3649. 
						
						
					 
					
						2014-06-19 18:45:13 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							2f955d572b 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: remove unused constants  
						
						... 
						
						
						
						Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-03-17 00:06:45 +01:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							6c47a4e972 
							
						 
					 
					
						
						
							
							swscale/x86/swscale: fix missing xmm clobbers in yuv2yuvX_sse3()  
						
						... 
						
						
						
						Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-03-15 22:52:22 +01:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							b148a39d55 
							
						 
					 
					
						
						
							
							Merge commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7'  
						
						... 
						
						
						
						* commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7':
  x86: Consistently use cpu flag detection macros in places that still miss it
Merged-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2014-01-14 14:44:59 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Biurrun 
							
						 
					 
					
						
						
						
						
							
						
						
							46bacb5cc6 
							
						 
					 
					
						
						
							
							x86: Consistently use cpu flag detection macros in places that still miss it  
						
						
						
						
					 
					
						2014-01-14 00:04:58 +01:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							8733b363ac 
							
						 
					 
					
						
						
							
							Merge commit 'c16bfb147df8a9d350e8a0dbc01937b78faf5949'  
						
						... 
						
						
						
						* commit 'c16bfb147df8a9d350e8a0dbc01937b78faf5949':
  swscale: x86: Consistently use lowercase function name suffixes
Conflicts:
	libswscale/x86/rgb2rgb.c
	libswscale/x86/swscale.c
See: 1de064e21e7f1bbdd2347ba8967089a18669fcf8
Merged-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2013-11-23 12:10:40 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Biurrun 
							
						 
					 
					
						
						
						
						
							
						
						
							c16bfb147d 
							
						 
					 
					
						
						
							
							swscale: x86: Consistently use lowercase function name suffixes  
						
						
						
						
					 
					
						2013-11-22 23:01:51 +01:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							db6b389c7f 
							
						 
					 
					
						
						
							
							Merge commit 'a519583991c38d38503ab08357716513facc5725'  
						
						... 
						
						
						
						* commit 'a519583991c38d38503ab08357716513facc5725':
  swscale: x86: Hide arch-specific initialization details
Conflicts:
	libswscale/x86/Makefile
	libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2013-08-29 14:42:34 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Biurrun 
							
						 
					 
					
						
						
						
						
							
						
						
							a519583991 
							
						 
					 
					
						
						
							
							swscale: x86: Hide arch-specific initialization details  
						
						... 
						
						
						
						Also give consistent names to init functions. 
						
						
					 
					
						2013-08-28 23:59:24 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							920dd84bf1 
							
						 
					 
					
						
						
							
							sws/x86: remove 8bit rgb2yuv coefficient case for rgb24toyv12 special converter  
						
						... 
						
						
						
						This simplifies the code and improves quality at the expense of a slight
slowdown of a rarely used function (no fate test uses it).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2013-04-15 03:19:52 +02:00 
						 
				 
			
				
					
						
							
							
								Michael Niedermayer 
							
						 
					 
					
						
						
						
						
							
						
						
							63a97d5674 
							
						 
					 
					
						
						
							
							Merge commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa'  
						
						... 
						
						
						
						* commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa':
  cosmetics: Remove unnecessary extern keywords from function declarations
Conflicts:
	libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at> 
						
						
					 
					
						2013-03-28 11:20:41 +01:00