On modern x86 systems it's around 2x faster. For systems without
FPUs it'll be slower, but our policy is to prefer floating-point
implementations and let users decide what's best (or simply not
compile them on systems without FPUs).
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>