Nathan E. Egge
8280ec7a32
lavu/riscv: Revert d808070, removing AV_READ_TIME
...
The RISC-V implementation of ff_read_time() uses rdtime, whose precision
on existing hardware is too low (!) for benchmarking purposes.
Deleting this implementation falls back on clock_gettime(), which was
added as the default ff_read_time() implementation in 33e4cc9.
Below are metrics gathered on SpacemiT K1, before and after this commit:
Before:
$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 0.0
checkasm: using random seed 3473665261
checkasm: bench runs 1024 (1 << 10)
RVI:
- pixblockdsp.get_pixels [OK]
- vc1dsp.mspel_pixels [OK]
RVF:
- audiodsp.audiodsp [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 1388.7
audiodsp.vector_clipf_rvf: 261.5
get_pixels_c: 2.0
get_pixels_rvi: 1.5
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5
After:
$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 56.4
checkasm: using random seed 1021411603
checkasm: bench runs 1024 (1 << 10)
RVI:
- pixblockdsp.get_pixels [OK]
- vc1dsp.mspel_pixels [OK]
RVF:
- audiodsp.audiodsp [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 23236.4
audiodsp.vector_clipf_rvf: 11038.4
get_pixels_c: 79.6
get_pixels_rvi: 48.4
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-07-31 17:48:50 +03:00
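For illustration, a timer built on clock_gettime(), as in the fallback described in the commit above, might look like the following minimal sketch. The name read_time_ns and the choice of CLOCK_MONOTONIC are assumptions made for this example; this is not the actual ff_read_time() code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic timestamp in nanoseconds. */
static uint64_t read_time_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

    /* CLOCK_MONOTONIC is assumed to be available on the target. */
    assert(ret == 0);
    (void)ret;

    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

A benchmark would call the helper before and after the code under test and subtract the two values.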
James Almer
ab5c612137
avcodec/Makefile: use the correct path for aacdec_fixed.o when setting its dependencies
...
Fixes ticket #11112
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-31 11:32:56 -03:00
Anton Khirnov
43f702a253
lavfi/framesync: avoid forcing frame writability unnecessarily
...
Callers of ff_framesync_get_frame() generally do not expect the result
to be writable; those that do (e.g. ff_framesync_dualinput_get_writable())
ensure writability themselves.
Significantly reduces memory consumption in complex graphs with
framesync-based filters (e.g. scale, ssim).
Reported-By: Mark Shwartzman
2024-07-31 11:12:45 +02:00
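The memory saving in the framesync commit above comes from not copying shared buffers unless a caller really needs to write. The toy model below sketches that copy-on-write idea with a reference-counted buffer. All types and functions (ToyBuf, ToyFrame, toy_frame_*) are hypothetical and stand in for the real frame API only loosely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer and a frame pointing at it. */
typedef struct ToyBuf   { int refs; size_t size; unsigned char *data; } ToyBuf;
typedef struct ToyFrame { ToyBuf *buf; } ToyFrame;

static int toy_frame_alloc(ToyFrame *f, size_t size)
{
    ToyBuf *b = malloc(sizeof(*b));
    if (!b)
        return -1;
    b->data = calloc(1, size);
    if (!b->data) {
        free(b);
        return -1;
    }
    b->refs = 1;
    b->size = size;
    f->buf  = b;
    return 0;
}

/* Share the buffer: no pixel data is copied. */
static void toy_frame_ref(ToyFrame *dst, const ToyFrame *src)
{
    dst->buf = src->buf;
    dst->buf->refs++;
}

static void toy_frame_unref(ToyFrame *f)
{
    if (f->buf && --f->buf->refs == 0) {
        free(f->buf->data);
        free(f->buf);
    }
    f->buf = NULL;
}

/* Copy only if somebody else still holds a reference. */
static int toy_frame_make_writable(ToyFrame *f)
{
    ToyBuf *old = f->buf;
    ToyFrame copy;

    if (old->refs == 1)
        return 0; /* sole owner: already writable */
    if (toy_frame_alloc(&copy, old->size) < 0)
        return -1;
    memcpy(copy.buf->data, old->data, old->size);
    old->refs--;
    f->buf = copy.buf;
    return 0;
}
```

In this model, forcing writability on every returned frame would duplicate each shared buffer, while leaving the call to the few writers keeps a single copy alive.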
Rémi Denis-Courmont
262168b04e
lavc/videodsp: RISC-V zicbop prefetch
...
There is currently no way to detect this CPU capability at run time, so
we take it for granted (in the worst case, it will execute NOPs).
2024-07-30 18:41:51 +03:00
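As a rough C-level picture of what a prefetch routine like the one in the commit above does, the sketch below issues one prefetch hint per row using the GCC/Clang __builtin_prefetch builtin. The function name and signature are assumptions for this example, and whether the builtin is lowered to a Zicbop instruction depends on the compiler and target flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: hint that the start of each of h rows will be read. */
static void prefetch_rows(const uint8_t *buf, ptrdiff_t stride, int h)
{
    for (int i = 0; i < h; i++) {
        /* Arguments: address, 0 = read access, 3 = high temporal locality. */
        __builtin_prefetch(buf, 0, 3);
        buf += stride;
    }
}
```

A prefetch is only a hint: it never changes program results, which is why executing it as a NOP on hardware without the extension is harmless.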
Rémi Denis-Courmont
4570b9f3c4
configure: check if assembler supports RV zicbop
...
zicbop is the Cache Block Operation, Prefetch extension to RVI.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
324eba69f7
lavc/vc1dsp: use saturating arithmetic for RVV inv_trans_dc
...
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_dc_c: 113.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 46.5 (before)
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 45.5 (after)
vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.7 (before)
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 52.5 (after)
vc1dsp.vc1_inv_trans_8x4_dc_c: 246.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 56.7 (before)
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 45.5 (after)
vc1dsp.vc1_inv_trans_8x8_dc_c: 419.7
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 81.2 (before)
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 53.5 (after)
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
784a72a116
lavc/vc1dsp: unify R-V V DC bypass functions
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
bd0c3edb13
lavu/riscv: count bytes rather than words for bswap32
...
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
5171baa228
lavc/ac3dsp: fix R-V CPU requirements
...
It probably will not matter on any real hardware, but the Zbb optimisations
do not require Zba. And then, we need HAVE_RVV to build the RVV stuff.
2024-07-30 18:41:51 +03:00
Peter Ross
0e09f6d690
avcodec/adpcm: only process right samples when decoding stereo
...
Fixes Coverity issue #1610760.
2024-07-30 19:55:31 +10:00
Leo Izen
7bb5626fa7
avcodec/pngenc: fix sBIT writing for indexed-color PNGs
...
We currently write invalid sBIT entries for indexed PNGs, which per the
PNG specification[1] must be 3 bytes long. The values are also capped at
8 for indexed-color PNGs, not at the palette depth. This patch fixes both
of these issues, which were previously fixed in the decoder but not in
the encoder.
[1]: https://www.w3.org/TR/png-3/#11sBIT
Regression since: c125860892e931d9b10f88ace73c91484815c3a8.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
2024-07-30 05:43:36 -04:00
Leo Izen
825606641b
avcodec/pngdec: use 8-bit sBIT cap for indexed PNGs per spec
...
The PNG specification[1] says that sBIT entries must be at most the bit
depth specified in IHDR, unless the PNG is indexed-color, in which case
sBIT must be between 1 and 8. We should not reject valid sBITs on PNGs
with indexed color.
[1]: https://www.w3.org/TR/png-3/#11sBIT
Regression since 84b454935fae2633a8a5dd075e22393f3e8f932f.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
2024-07-30 05:43:31 -04:00
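The sBIT rule quoted in the two PNG commits above can be written down as a small validity check. This is a sketch with hypothetical function names, not the pngdec/pngenc code. It assumes the standard PNG colour type numbers (0, 2, 3, 4, 6).

```c
#include <assert.h>
#include <stdint.h>

/* Expected sBIT payload length per PNG colour type, 0 if the type is invalid. */
static int sbit_length(int color_type)
{
    switch (color_type) {
    case 0:  return 1; /* greyscale */
    case 2:  return 3; /* truecolour */
    case 3:  return 3; /* indexed-colour: R, G, B of the palette entries */
    case 4:  return 2; /* greyscale with alpha */
    case 6:  return 4; /* truecolour with alpha */
    default: return 0;
    }
}

/* Returns 1 if the sBIT payload is acceptable, 0 otherwise. */
static int sbit_is_valid(int color_type, int bit_depth,
                         const uint8_t *sbit, int len)
{
    /* Indexed-colour samples are always 8 bits deep, whatever the index depth. */
    int cap = color_type == 3 ? 8 : bit_depth;

    if (len == 0 || len != sbit_length(color_type))
        return 0;
    for (int i = 0; i < len; i++)
        if (sbit[i] < 1 || sbit[i] > cap)
            return 0;
    return 1;
}
```

With this check, a 4-bit indexed image carrying sBIT values of 8 is accepted, whereas comparing against the IHDR bit depth would reject it.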
Marth64
e2105b2800
avcodec/aacenc: Correct option description for aac_coder fast
...
The description advertises fast as "Default fast search", but
this has not been the default for a long time (current default
is twoloop).
Signed-off-by: Marth64 <marth64@proxyid.net>
2024-07-30 05:42:50 -04:00
Fei Wang
79b4869959
lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
...
Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
-filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv \
-i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Tested-by: Tong Wu <wutong1208@outlook.com>
2024-07-30 13:41:15 +08:00
Fei Wang
d30a9fdc80
lavc/qsvdec: Add VVC decoder
...
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-07-30 13:40:21 +08:00
Fei Wang
cf9c398fc1
configure: Alphabetical order for av1 codecs
...
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-07-30 13:32:44 +08:00
James Almer
9e7a93c6fd
x86/intreadwrite: add SSE2 optimized AV_COPY128U
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 23:17:52 -03:00
James Almer
92b317245c
avformat/mov: use AV_WL*A
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
James Almer
f1fcc3ca5f
avformat/matroskadec: use AV_WL32A
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
James Almer
09de979ff6
avcodec/amfenc_av1: use AV_WL32A
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
James Almer
753f2aeed7
avutil/intreadwrite: add missing aligned read/write macros
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
Rémi Denis-Courmont
7b24f96c87
lavc/vp9dsp: remove R-V I intra functions
...
At this point, they are identical to the C code, except for instruction
ordering. In fact, they are typically slower or no faster than the C code.
2024-07-29 21:16:41 +03:00
Rémi Denis-Courmont
7aa6510fe1
lavc/vp9dsp: copy 8 pixels at once
...
In the 8-bit case, we can actually read/write 8 aligned pixel values per
load/store, which unsurprisingly tends to be faster on 64-bit systems (and
makes no difference on 32-bit systems). This requires ifdef'ing though.
2024-07-29 21:16:41 +03:00
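The idea of moving 8 pixels per load/store, as described in the commit above, can be sketched as follows. The function name is hypothetical, and memcpy() with a fixed size is used here as a portable stand-in for a 64-bit access; the real code may use dedicated aligned-access macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an 8-pixel-wide, 8-bit block row by row. */
static void copy8_rows(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride, int h)
{
    for (int y = 0; y < h; y++) {
        uint64_t v;

        /* Fixed-size copies that compilers commonly turn into one load/store. */
        memcpy(&v, src, sizeof(v));
        memcpy(dst, &v, sizeof(v));
        dst += dst_stride;
        src += src_stride;
    }
}
```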
Rémi Denis-Courmont
c98127c00e
lavc/vp9dsp: use restrict qualifier for copy/avg MC
...
Same as previous commit.
2024-07-29 21:16:41 +03:00
Rémi Denis-Courmont
56fc5fc6ce
lavc/vp9dsp: restrict vertical intra pointers
...
This lets the compiler unroll ever so slightly better (at least in the
16x16 case for RISC-V GCC).
2024-07-29 21:16:41 +03:00
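To show where the restrict qualifier from the commit above helps, here is a sketch of a 16x16 vertical intra predictor. The function name and signature are assumptions for this example. With restrict, the compiler may assume that writes through dst never modify top, so the top row can be loaded once and reused for every output row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertical predictor: every output row is a copy of the top row. */
static void vert_pred_16x16(uint8_t *restrict dst, ptrdiff_t stride,
                            const uint8_t *restrict top)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = top[x];
        dst += stride;
    }
}
```

The caller must guarantee that dst and top do not overlap; passing overlapping buffers to a restrict-qualified function is undefined behaviour.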
James Almer
afb06aef7e
avcodec/decode: remove unused argument from ff_frame_new_side_data_from_buf()
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 14:00:48 -03:00
James Almer
e7d3ff8dcd
avformat/mov: check that child boxes of trak are only present inside it
...
Based on the check done for the stco box.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-28 17:28:19 -03:00
James Almer
2aa63784b5
avformat/mov: check that sample and chunk count is 1 for HEIF
...
Fixes NULL pointer dereference in broken/fuzzed streams.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-28 17:28:19 -03:00
Rémi Denis-Courmont
39ced529b0
lavu/riscv: implement floating point clips
...
Unlike x86, fmin/fmax are single instructions, not function calls. They
are much much faster than doing a comparison, then branching based on its
results. With this, audiodsp.vector_clipf gets almost twice as fast, and
a properly unrolled version of it gets 4-5x faster, on SiFive-U74.
This is only the low-hanging fruit: FFMIN and FFMAX are presumably
affected as well.
This likely applies to other instruction sets with native IEEE floats,
especially those lacking a conditional select instruction.
2024-07-28 21:24:58 +03:00
Rémi Denis-Courmont
b0b3bea10b
lavc/h264dsp: use saturating add/sub for R-V V 8-bit DC add
...
T-Head C908 (cycles):
h264_idct4_dc_add_8bpp_c: 109.2
h264_idct4_dc_add_8bpp_rvv_i32: 34.5 (before)
h264_idct4_dc_add_8bpp_rvv_i32: 25.5 (after)
h264_idct8_dc_add_8bpp_c: 418.7
h264_idct8_dc_add_8bpp_rvv_i64: 69.5 (before)
h264_idct8_dc_add_8bpp_rvv_i64: 33.5 (after)
2024-07-28 21:24:12 +03:00
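The commit above does not spell out the instruction sequence, but the general trick behind a saturating DC add can be modelled in scalar C: instead of widening each pixel, adding the DC and clipping to 0..255, the DC is split into a non-negative and a non-positive part that are applied with unsigned saturating add and subtract. All names below are hypothetical, and this is a sketch of the idea, not the RVV code.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Reference: widen, add, clip to 8 bits. */
static uint8_t dc_add_clip(uint8_t p, int dc)
{
    int v = p + dc;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Same result using only 8-bit saturating operations. */
static uint8_t dc_add_sat(uint8_t p, int dc)
{
    /* At most one of pos/neg is non-zero; both are clamped to 8 bits. */
    uint8_t pos = dc > 0 ? (dc > 255 ? 255 : (uint8_t)dc) : 0;
    uint8_t neg = dc < 0 ? (dc < -255 ? 255 : (uint8_t)-dc) : 0;

    return sat_sub_u8(sat_add_u8(p, pos), neg);
}
```

Staying in 8-bit elements avoids the widening and narrowing steps, which is one plausible reason for the cycle counts improving.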
Shiyou Yin
4713a5cc24
swscale: [loongarch] Fix checkasm-sw_yuv2rgb failure.
...
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-28 19:02:16 +02:00
Michael Niedermayer
3b9c6c7fbb
avcodec/adpcm: Remove setting min_channel to value it is already set to
...
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-28 19:02:15 +02:00
Tong Wu
b1d410716b
lavc/d3d12va_encode: trim header alignment at output
...
It is d3d12va's requirement that the FrameStartOffset must be aligned as
per hardware limitation. However, we could trim this alignment at output
to reduce coded size. An aligned_header_size is added to
D3D12VAEncodePicture.
Signed-off-by: Tong Wu <wutong1208@outlook.com>
2024-07-28 17:50:30 +02:00
Rémi Denis-Courmont
9b4655c3a1
lavc/vp8dsp: use saturating add/sub for R-V V DC add
...
T-Head C908 (cycles):
vp7_idct_dc_add_c: 108.5
vp7_idct_dc_add_rvv_i32: 56.2 (before)
vp7_idct_dc_add_rvv_i32: 47.2 (after)
vp8_idct_dc_add_c: 96.2
vp8_idct_dc_add_rvv_i32: 43.0 (before)
vp8_idct_dc_add_rvv_i32: 34.0 (after)
2024-07-28 17:37:21 +03:00
Rémi Denis-Courmont
bbfc0ac9ca
lavc/riscv: don't set vxrm if unnecessary
...
While narrowing clip is nominally a rounding operation, the rounding mode
has no arithmetic consequence if the right shift is by zero bits.
2024-07-28 17:37:21 +03:00
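The claim in the commit above, that the rounding mode is irrelevant for a shift by zero bits, can be checked with a scalar model of a rounding right shift. The sketch below implements the four RVV fixed-point rounding modes (rnu, rne, rdn, rod) for an unsigned value. The function and enum names are hypothetical, and the saturation and narrowing parts of a narrowing clip are not modelled.

```c
#include <assert.h>
#include <stdint.h>

enum { VXRM_RNU, VXRM_RNE, VXRM_RDN, VXRM_ROD };

/* Rounding right shift of v by d bits (d < 32) under the given mode. */
static uint32_t round_shift(uint32_t v, unsigned d, int mode)
{
    uint32_t r = 0; /* rounding increment; stays 0 when d == 0 */

    if (d > 0) {
        uint32_t half = (v >> (d - 1)) & 1;                 /* v[d-1]   */
        uint32_t low  = v & ((UINT32_C(1) << (d - 1)) - 1); /* v[d-2:0] */
        uint32_t lsb  = (v >> d) & 1;                       /* v[d]     */

        switch (mode) {
        case VXRM_RNU: r = half;                       break; /* nearest, ties up   */
        case VXRM_RNE: r = half & ((low != 0) | lsb);  break; /* nearest, ties even */
        case VXRM_RDN: r = 0;                          break; /* truncate           */
        case VXRM_ROD: r = !lsb & (half | (low != 0)); break; /* round to odd       */
        }
    }
    return (v >> d) + r;
}
```

With d equal to zero no bits are discarded, so every mode returns v unchanged and there is nothing for the rounding mode to decide.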
Niklas Haas
e42a0763b7
avcodec/dovi_rpudec: clarify semantics
...
ff_dovi_rpu_parse() and ff_dovi_rpu_generate() are a bit inconsistent in
that they expect different levels of encapsulation, due to the nature of
how this is handled in the context of different APIs. Clarify the status
quo. (And fix an incorrect reference to the RPU payload bytes as 'RBSP')
2024-07-28 12:20:07 +02:00
Niklas Haas
6b66df74b8
avcodec/dovi_rpu: correctly copy num_ext_blocks
2024-07-28 12:20:07 +02:00
Niklas Haas
b5aeafc00a
fftools/ffprobe: implement dv_md_compression
2024-07-28 12:20:07 +02:00
Niklas Haas
3d5d60d041
avformat/dump: implement dv_md_compression
2024-07-28 12:20:07 +02:00
Niklas Haas
ce8166a19c
avformat/mpegts: implement dv_md_compression
2024-07-28 12:20:07 +02:00
Niklas Haas
b3a9fab9da
avformat/dovi_isom: implement dv_md_compression
2024-07-28 12:20:07 +02:00
Niklas Haas
cbea92c84d
avutil/dovi_meta: add dv_md_compression to cfg record
...
This field is used to signal the compression method in use.
2024-07-28 12:20:07 +02:00
Zhao Zhili
719e46f54c
avcodec/videotoolboxenc: Fix variable type of AV_OPT_TYPE_BOOL
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-26 19:54:56 +08:00
Zhao Zhili
f4e0f40230
avcodec/videotoolboxenc: Set default bitrate to zero
...
Zero is auto mode. From the doc of videotoolbox:
The default bit rate is zero, which indicates that the video
encoder should determine the size of compressed data.
Before this patch, the default bitrate was 200000, set by
avcodec/options_table, which does not work in most cases.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-26 19:54:20 +08:00
Zhao Zhili
d07da7539d
avcodec/videotoolboxenc: Fix bitrate not working as expected
...
Commit 4ef5e7d4722 added qmin/qmax support to the videotoolbox encoder.
The default value of (qmin, qmax) is (2, 31), which prevents bitrate
control from working as users expect.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-26 19:54:20 +08:00
Rémi Denis-Courmont
8030876d1c
checkasm/riscv: align the landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
7dde8be29f
checkasm/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
4f2472909e
sws/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
b5c111272b
lavfi/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
f2c30fe15a
lavc/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00