Float or double?

In scientific computing we use floating-point numbers a lot. This article is a guide to picking the right floating-point representation for your work. Most programming languages offer two built-in precisions: 32-bit (single precision) and 64-bit (double precision). In the C family of languages these are called float and double, and those are the names I will use in this article. Other precisions exist, such as half and quad. I won't cover them here, but much of the discussion also applies to half vs float or double vs quad. So to be clear: I will only talk about the 32-bit and 64-bit IEEE 754 formats here.
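To make the difference between the two precisions concrete, here is a minimal Python sketch. It assumes that Python's built-in float is a 64-bit IEEE 754 double, which holds on virtually all current platforms. The helper `to_float32` is a hypothetical name introduced only for this illustration. It uses the standard `struct` module to round a value to 32 bits and back.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit double to the nearest 32-bit float and widen it back.

    Hypothetical helper for illustration: packing with the "f" format
    stores the value in 4 bytes, so precision beyond 32 bits is lost.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage size in bytes of each format.
FLOAT_BYTES = struct.calcsize("<f")
DOUBLE_BYTES = struct.calcsize("<d")

if __name__ == "__main__":
    x = 0.1  # not exactly representable in binary at either precision
    print(f"float  uses {FLOAT_BYTES} bytes: {to_float32(x):.20f}")
    print(f"double uses {DOUBLE_BYTES} bytes: {x:.20f}")
    # The gap between the two shows the rounding error added by float.
    print(f"difference: {abs(to_float32(x) - x):.3e}")
```

Values such as 0.5 that are exact in binary survive the round trip unchanged, while 0.1 picks up a small extra error when squeezed into 32 bits.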