Traditionally, computers work with integer types of different sizes. Scientific applications, media, gaming and other applications need floating-point numbers. In old computers floating-point numbers were handled in software by special libraries, which made them much slower than integers. Nowadays most CPUs have an FPU that can perform fast floating-point calculations.
Until recently I was under the impression that integers were still faster than floats, and that floats have precision/rounding issues, making the integer datatype the natural and only sane choice for representing mathematical integers. Then I learned two things:
- OpenSSL uses the double datatype instead of int in some situations (big numbers) for performance reasons.
Both these applications exploit the fact that if 64-bit float operations cost roughly the same as 32-bit integer operations (thanks to the FPU), then a double can be a more powerful representation of big integers than an int. A double holds every integer up to 2^53 in magnitude exactly, far beyond the roughly 2.1 billion limit of a signed 32-bit int. It is also important to understand that (double) floating-point numbers have precision problems only with fractional values (e.g. 0.1) and very large numbers, but handle real-world integers just fine.
Apart from this, there could be other advantages of using float instead of int:
- If the FPU can execute instructions somewhat in parallel with the ALU/CPU, using floats where possible could benefit performance.
- If there are dedicated floating point registers, making use of them could free up integer registers.
Well, I decided to run a test. I have a real-world application:
- written in C
- that does calculations on integers (mostly in the range 0-1000000)
- that has automated tests, so I can modify the program and confirm that it still works
- that has built-in performance/time measurement
Since I had used int to represent a real-world measurement (length in mm), I decided nothing is really lost if I use float or double instead. The values were small enough that a 32-bit float would probably be precise enough (otherwise my automated tests would complain). While the program is rather computation-heavy, it is not extremely calculation-intensive, and the only mathematical operations I use are +, -, >, = and <. In other words, even if floating-point math were free, the program would still be heavy, just faster.
In all cases gcc is used with -O2 -ffast-math. The int column shows speed relative to the first line (Celeron 630MHz is my reference/baseline). The float/double columns show speed relative to the int speed of the same machine. Higher is better.
| Machine / OS | int | float | double | Notes |
|---|---|---|---|---|
| Eee701 Celeron 630MHz / Lubuntu | 1.0 | 0.93 | 0.93 | |
| AMD Athlon II 3GHz / Xubuntu | 5.93 | 1.02 | 0.97 | |
| PowerBook G4 PPC 867MHz / Debian | 1.0 | 0.94 | 0.93 | |
| Linksys WDR4900 PPC 800MHz / OpenWRT | 1.12 | 0.96 (0.87) | 0.41 (0.89) | Values in parentheses using -mcpu=8548 |
| Raspberry Pi ARMv6 700MHz / Raspbian | 0.52 | 0.94 | 0.93 | |
| QNAP TS-109 ARMv5 500MHz / Debian | 0.27 | 0.61 | 0.52 | |
| WRT54GL Mips 200MHz / OpenWRT | 0.17 | 0.20 | 0.17 | |
A few notes on this:
I combined many measurements and runs to eliminate outliers and reduce variance when producing the figures above.
There was something strange about the results from the PowerBook G4: its performance is not what I would expect. I don't know if my machine underperforms or if something is wrong with the time measurements. Nevertheless, I believe the int vs float comparison is still valid.
The Athlon is much faster than the other machines, giving shorter execution times, and the variance between runs was bigger than on the other machines. The 1.02/0.97 figures could well be within the error margin of 1.0.
The ARM CPU in the QNAP TS-109 does not have an FPU, which explains the lower float/double performance. The other machines showed similar float/double performance when compiled with -msoft-float.
The Linksys WDR4900 has an FPU capable of both single and double precision. But with the OpenWRT BB RC3 toolchain, gcc defaults to -mcpu=8540, which falls back to software floating point for doubles. With -mcpu=8548 the FPU is also used for doubles, but for some reason this lowers single-precision performance.
The situation could change if the division operator were used, but division should be avoided anyway when optimizing.
All tests were done on Linux with GCC. It would surprise me if the results were very different on other platforms.
More tests could be made on more modern hardware, but the precision advantage of double over int is lost on 64-bit machines with native 64-bit long int support.
As a rule of thumb, integers are faster than floats, and replacing integers with floats does not improve performance. Use the datatype that describes your data the best!
Exploiting the 53-bit integer capacity of a double (52 stored fraction bits plus an implicit leading bit) should be considered an advanced, platform-dependent optimization, and not a good idea in the general case.