Float or double?


In scientific computation we use floating point numbers a lot. This article is a guide to picking the right floating point representation for you. In most programming languages there are two built-in precisions to pick from: 32-bit (single-precision) and 64-bit (double-precision). In the C family of languages these are known as float and double, and those are the names I will use in this article. There are other precisions: half, quad etc. I won’t cover these here, but a lot of the discussion makes sense for half vs float or double vs quad too. So to be clear: I will only talk about 32-bit and 64-bit IEEE 754 here. Continue reading

Part 28. C++11 and 64-bit Issues

Time goes by, and programmers are already using the updated version of the C++ language, called C++11. At this point, most compilers support many of the innovations described in the C++11 standard. Can these innovations help a programmer avoid 64-bit errors? You may find the answer in the article “C++11 and 64-bit Issues”. Continue reading

Part 26. Peculiarities of creating installers for the 64-bit environment

When developing the 64-bit version of an application, you should also be very attentive to the issue of program distribution – you might encounter some peculiar problems when installing the program on a 64-bit operating system, and if you forget about them, you will get a non-working installation package. First of all you should understand that the program installer itself (the exe-file that launches the installation process) can technically be either a 32-bit application or a 64-bit one. Continue reading

Part 25. Optimization of 64-bit programs

Reducing amounts of memory being consumed

When a program is compiled in 64-bit mode, it consumes more memory than its 32-bit version. This increase often goes unnoticed, but sometimes memory consumption may double. The growth of memory consumption is determined by the following factors: Continue reading

Part 24. Phantom errors

We have finished studying the patterns of 64-bit errors. The last thing to discuss about these errors is how they may show up in programs.

The point is that it is not easy to demonstrate with an example, such as the following code sample, that 64-bit code will fail when “N” takes large values: Continue reading

Part 22. Pattern 14. Overloaded functions

When porting a 32-bit program to a 64-bit platform, you may encounter changes in its logic related to the use of overloaded functions. If a function is overloaded for 32-bit and 64-bit values, a call to it with an argument of a memsize-type will resolve to different overloads on different systems. This technique can be useful, for example, in this code: Continue reading