Rémi Denis-Courmont
29b9d616c2
lavu/float_dsp: rework RISC-V V scalar product
...
1) Take the reductive sum out of the loop,
leaving a regular vector addition in the loop.
2) Merge the addition and the multiplication.
3) Unroll.
Before:
scalarproduct_float_rvv_f32: 832.5
After:
scalarproduct_float_rvv_f32: 275.2
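
For illustration only, steps 1 and 2 above can be sketched in plain C. This is not the RISC-V V assembly from the commit: LANES is a hypothetical stand-in for the hardware vector length, both function names are made up for this sketch, and the unrolling of step 3 is not modelled.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical vector length, an assumption for this sketch */

/* Old shape: the products of each chunk are reduced to a scalar
 * inside the loop, so every iteration pays for a reduction. */
float scalarproduct_reduce_in_loop(const float *a, const float *b, size_t len)
{
    float sum = 0.0f;

    for (size_t i = 0; i < len; i += LANES) {
        float partial = 0.0f;

        for (size_t j = 0; j < LANES && i + j < len; j++)
            partial += a[i + j] * b[i + j]; /* multiply, then reduce */
        sum += partial;
    }
    return sum;
}

/* New shape: one accumulator per lane, updated with a combined
 * multiply-add in the loop; the reduction happens once, after the loop. */
float scalarproduct_reduce_at_end(const float *a, const float *b, size_t len)
{
    float acc[LANES] = { 0.0f };
    float sum = 0.0f;
    size_t i = 0;

    for (; i + LANES <= len; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            acc[j] += a[i + j] * b[i + j]; /* per-lane multiply-add */

    for (size_t j = 0; i < len; i++, j++) /* tail shorter than LANES */
        acc[j] += a[i] * b[i];

    for (size_t j = 0; j < LANES; j++) /* single reduction at the end */
        sum += acc[j];
    return sum;
}
```

Both functions compute the same dot product; with general floating-point inputs the results may differ slightly because the order of the additions changes.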
2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont
b710f881ce
lavu/float_dsp: unroll RISC-V V loops
...
butterflies_float_c: 1057.0
butterflies_float_rvv_f32: 351.0 (before)
butterflies_float_rvv_f32: 329.5 (after)
vector_dmac_scalar_c: 819.0
vector_dmac_scalar_rvv_f64: 670.5 (before)
vector_dmac_scalar_rvv_f64: 431.0 (after)
vector_dmul_c: 800.2
vector_dmul_rvv_f64: 541.5 (before)
vector_dmul_rvv_f64: 426.0 (after)
vector_dmul_scalar_c: 545.7
vector_dmul_scalar_rvv_f64: 670.7 (before)
vector_dmul_scalar_rvv_f64: 324.7 (after)
vector_fmac_scalar_c: 804.5
vector_fmac_scalar_rvv_f32: 412.7 (before)
vector_fmac_scalar_rvv_f32: 214.5 (after)
vector_fmul_c: 811.2
vector_fmul_rvv_f32: 285.7 (before)
vector_fmul_rvv_f32: 214.2 (after)
vector_fmul_add_c: 1313.0
vector_fmul_add_rvv_f32: 349.0 (before)
vector_fmul_add_rvv_f32: 290.2 (after)
vector_fmul_reverse_c: 815.7
vector_fmul_reverse_rvv_f32: 529.2 (before)
vector_fmul_reverse_rvv_f32: 515.7 (after)
vector_fmul_scalar_c: 546.0
vector_fmul_scalar_rvv_f32: 350.2 (before)
vector_fmul_scalar_rvv_f32: 169.5 (after)
2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont
96a83ceea4
riscv: fix scalar product initialisation
...
'VSETVLI xd, x0, ...' has rather nonobvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.
Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.
So the current code fails to initialise the accumulator if we
are unlucky enough to have a selected vector length of zero on entry.
Fix it by forcing the vector length to one.
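
The failure mode described above can be sketched with a toy C model of the vector state. This is only a model of the behaviour as stated in this message, not the actual assembly: struct vstate, the model_* function names and the VLMAX value used with them are all hypothetical, and model_set_vl() stands in for whichever instruction the fix uses to force the vector length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy vector state, hypothetical and for illustration only. */
struct vstate {
    size_t vl;    /* currently selected vector length */
    size_t vlmax; /* supported maximum vector length */
    long elem0;   /* element 0 of the accumulator vector register */
};

/* Models 'VSETVLI xd, x0, ...': keep vl if xd is x0, else select the maximum. */
void model_vsetvli_x0(struct vstate *s, bool xd_is_x0)
{
    if (!xd_is_x0)
        s->vl = s->vlmax;
    /* xd == x0: the current vector length is preserved */
}

/* Models forcing an explicit vector length (assumed to be capped at vlmax). */
void model_set_vl(struct vstate *s, size_t avl)
{
    s->vl = avl < s->vlmax ? avl : s->vlmax;
}

/* Models VMV.S.X: writes element 0, but does nothing when vl is zero. */
void model_vmv_s_x(struct vstate *s, long value)
{
    if (s->vl > 0)
        s->elem0 = value;
}

/* Models VMV.X.S: reads element 0 regardless of vl. */
long model_vmv_x_s(const struct vstate *s)
{
    return s->elem0;
}
```

In this model, entering with vl == 0, preserving it via the x0 form and then calling model_vmv_s_x() leaves stale data in elem0, whereas calling model_set_vl(&s, 1) first makes the initialisation take effect.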
2022-10-13 10:17:38 +02:00
Rémi Denis-Courmont
3ba5579e55
riscv: remove unnecessary #include's
...
Pointed out by Andreas Rheinhardt.
2022-10-05 06:54:56 +02:00
Rémi Denis-Courmont
cd77662953
lavu/floatdsp: RISC-V V scalarproduct_float
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
b493370662
lavu/floatdsp: RISC-V V vector_fmul_window
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
9aeb6aca3a
lavu/floatdsp: RISC-V V vector_fmul_reverse
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
47ce9735cc
lavu/floatdsp: RISC-V V butterflies_float
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
f4ea45040f
lavu/floatdsp: RISC-V V vector_fmul_add
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
d120ab5b91
lavu/floatdsp: RISC-V V vector_dmac_scalar
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
c3db27ba95
lavu/floatdsp: RISC-V V vector_fmac_scalar
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
da169a210d
lavu/floatdsp: RISC-V V vector_dmul
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
7058af9969
lavu/floatdsp: RISC-V V vector_fmul
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
89b7ec65a8
lavu/floatdsp: RISC-V V vector_dmul_scalar
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
a6c10d05fe
lavu/floatdsp: RISC-V V vector_fmul_scalar
...
This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
2022-09-27 13:19:52 +02:00