Whenever someone asks how to do floating point math in a shell script, the answer is typically bc
:
$ echo "scale=9; 22/7" | bc 3.142857142
However, this is technically wrong: bc
does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.
The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!
Fixed vs floating point
There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.
In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56
. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.
Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.
We can see this effect in practice in any language that supports floating point, such as Haskell:
> truncate (16777216 - 1 :: Float) 16777215 > truncate (16777216 + 1 :: Float) 16777216
Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!
Floating point in bc
The problem with the bc
solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float
and 52 for double
. Here is some bc code for that:
scale=30 define trunc(x) { auto old, tmp old=scale; scale=0; tmp=x/1; scale=old return tmp } define fp(bits, x) { auto i if (x < 0) return -fp(bits, -x); if (x == 0) return 0; i=bits while (x < 1) { x*=2; i+=1; } while (x >= 2) { x/=2; i-=1; } return trunc(x * 2^bits + 0.5) / 2^(i) } define float(x) { return fp(24, x); } define double(x) { return fp(52, x); } define test(x) { print "Float: ", float(x), "\n" print "Double: ", double(x), "\n" }
With this file named fp
, we can try it out:
$ bc -ql fp <<< "22/7" 3.142857142857142857142857142857 $ bc -ql fp <<< "float(22/7)" 3.142857193946838378906250000000
The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!
Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:
{-# LANGUAGE RankNTypes #-} import Control.Monad import Data.Number.CReal import System.Environment main = do input <- liftM head getArgs putStrLn . ("Float: " ++) $ showNumber (read input :: Float) putStrLn . ("Double: " ++) $ showNumber (read input :: Double) where showNumber :: forall a. Real a => a -> String showNumber = showCReal 30 . realToFrac
Here's a comparison of the two:
$ bc -ql fp <<< "x=test(1000000001.3)" Float: 1000000000.000000000000000000000000000000 Double: 1000000001.299999952316284179687500000000 $ ./fptest 1000000001.3 Float: 1000000000.0 Double: 1000000001.2999999523162841796875
Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.
Now we can finally start doing floating point math in bc!