Y, U, V data is loaded at the end of the current iteration for the next
iteration.
It results in memory access past the frame data on the last iteration
(that data is never used after the loading).
So load data at the start of the iteration, so that only useful data is
loaded.
Signed-off-by: Vardan Margaryan <v.t.margaryan@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Tested using this command:
/ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \
-vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null
The fps increase from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Signed-off-by: Ting Fu <ting.fu@intel.com>
The original inline assembly and nasm code have the same fps when called by command.
NASM code almost has no impact on the perfromance.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>