{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computer arithmetics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Units of computer storage\n",
"\n",
"* `bit` = `binary` + `digit` (coined by statistician [John Tukey](https://en.wikipedia.org/wiki/Bit#History)). \n",
"* `byte` = 8 bits. \n",
"* KB = kilobyte = $10^3$ bytes. \n",
"* MB = megabytes = $10^6$ bytes. \n",
"* GB = gigabytes = $10^9$ bytes. Typical RAM size. \n",
"* TB = terabytes = $10^{12}$ bytes. Typical hard drive size. Size of NYSE each trading session. \n",
"* PB = petabytes = $10^{15}$ bytes. \n",
"* EB = exabytes = $10^{18}$ bytes. Size of all healthcare data in 2011 is ~150 EB. \n",
"* ZB = zetabytes = $10^{21}$ bytes. \n",
"\n",
"Julia function `Base.summarysize` shows the amount of memory (in bytes) used by an object."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"80000"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = rand(100, 100)\n",
"Base.summarysize(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`whos()` function prints all variables in workspace and their sizes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Base Module\n",
" Compat 19628 KB Module\n",
" Core Module\n",
" IJulia 19702 KB Module\n",
" JSON 19539 KB Module\n",
" Main Module\n",
" MbedTLS 19564 KB Module\n",
" Nullables 1120 bytes Module\n",
" ZMQ 19510 KB Module\n",
" x 78 KB 100×100 Array{Float64,2}\n"
]
}
],
"source": [
"whos()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Storage of Characters\n",
"\n",
"* Plain text files are stored in the form of characters: `.jl`, `.r`, `.c`, `.cpp`, `.ipynb`, `.html`, `.tex`, ... \n",
"* ASCII (American Code for Information Interchange): 7 bits, only $2^7=128$ characters. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"128×2 Array{Any,2}:\n",
" 0 '\\0' \n",
" 1 '\\x01'\n",
" 2 '\\x02'\n",
" 3 '\\x03'\n",
" 4 '\\x04'\n",
" 5 '\\x05'\n",
" 6 '\\x06'\n",
" 7 '\\a' \n",
" 8 '\\b' \n",
" 9 '\\t' \n",
" 10 '\\n' \n",
" 11 '\\v' \n",
" 12 '\\f' \n",
" ⋮ \n",
" 116 't' \n",
" 117 'u' \n",
" 118 'v' \n",
" 119 'w' \n",
" 120 'x' \n",
" 121 'y' \n",
" 122 'z' \n",
" 123 '{' \n",
" 124 '|' \n",
" 125 '}' \n",
" 126 '~' \n",
" 127 '\\x7f'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# integers 0, 1, ..., 127 and corresponding ascii character\n",
"# show(STDOUT, \"text/plain\", [0:127 Char.(0:127)]) \n",
"[0:127 Char.(0:127)] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Extended ASCII: 8 bits, $2^8=256$ characters. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"128×2 Array{Any,2}:\n",
" 128 '\\u80'\n",
" 129 '\\u81'\n",
" 130 '\\u82'\n",
" 131 '\\u83'\n",
" 132 '\\u84'\n",
" 133 '\\u85'\n",
" 134 '\\u86'\n",
" 135 '\\u87'\n",
" 136 '\\u88'\n",
" 137 '\\u89'\n",
" 138 '\\u8a'\n",
" 139 '\\u8b'\n",
" 140 '\\u8c'\n",
" ⋮ \n",
" 244 'ô' \n",
" 245 'õ' \n",
" 246 'ö' \n",
" 247 '÷' \n",
" 248 'ø' \n",
" 249 'ù' \n",
" 250 'ú' \n",
" 251 'û' \n",
" 252 'ü' \n",
" 253 'ý' \n",
" 254 'þ' \n",
" 255 'ÿ' "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# integers 128, 129, ..., 255 and corresponding extended ascii character\n",
"# show(STDOUT, \"text/plain\", [128:255 Char.(128:255)])\n",
"[128:255 Char.(128:255)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Unicode: UTF-8, UTF-16 and UTF-32 support many more characters including foreign characters; last 7 digits conform to ASCII. \n",
"\n",
"* [UTF-8](https://en.wikipedia.org/wiki/UTF-8) is the current dominant character encoding on internet. \n",
"\n",
"\n",
"\n",
"* Julia supports the full range of UTF-8 characters. You can type many Unicode math symbols by typing the backslashed LaTeX symbol name followed by tab. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# \\beta-\n",
"β = 0.0\n",
"# \\beta--\\hat-\n",
"β̂ = 0.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fixed-point number system\n",
"\n",
"* Fixed-point number system is a computer model for integers $\\mathbb{Z}$. \n",
"\n",
"* The number of bits and method of representing negative numbers vary from system to system. \n",
" - The `integer` type in R has $M=32$ or 64 bits. \n",
" - Matlab has `(u)int8`, `(u)int16`, `(u)int32`, `(u)int64`. \n",
" - Julia has even more integer types. Using Tom Breloff's `Plots.jl` and `PlotRecipes.jl` packages, we can [visualize the type tree](http://www.breloff.com/Graphs/) under `Integer`\n",
" - Storage for a `Signed` or `Unsigned` integer can be $M = 8, 16, 32, 64$ or 128 bits."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[33mWARNING: \u001b[39m\u001b[22m\u001b[33mInstall LightGraphs for the best layout calculations.\u001b[39m\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# make a list of a type T and it's supertypes\n",
"T = Integer\n",
"sups = [T]\n",
"sup = T\n",
"while sup != Any\n",
" sup = supertype(sup)\n",
" unshift!(sups, sup)\n",
"end\n",
"\n",
"# recursively build a graph of subtypes of T\n",
"n = length(sups)\n",
"nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)\n",
"function add_subs!(T, supidx)\n",
" for sub in subtypes(T)\n",
" push!(nodes, sub)\n",
" subidx = length(nodes)\n",
" push!(source, supidx)\n",
" push!(destiny, subidx)\n",
" add_subs!(sub, subidx)\n",
" end\n",
"end\n",
"add_subs!(T, n)\n",
"names = map(string, nodes)\n",
"\n",
"using PlotRecipes\n",
"#pyplot(alpha=0.5, size=(800, 500))\n",
"gr(alpha=0.5, size=(800, 500))\n",
"graphplot(source, destiny, names=names, method=:tree)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Signed integers\n",
"\n",
"* First bit indicates sign. \n",
" - `0` for nonnegative numbers\n",
" - `1` for negative numbers \n",
" \n",
"* **Two's complement representation** for negative numbers. \n",
" - Sign bit is set to 1 \n",
" - remaining bits are set to opposite values \n",
" - 1 is added to the result \n",
" - Two's complement representation of a negative integer `x` is same as the unsigned integer `2^m + x`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"typeof(18) = Int64\n",
"bits(18) = \"0000000000000000000000000000000000000000000000000000000000010010\"\n",
"bits(-18) = \"1111111111111111111111111111111111111111111111111111111111101110\"\n",
"bits(UInt64(Int128(2) ^ 64 - 18)) == bits(-18) = true\n",
"bits(2 * 18) = \"0000000000000000000000000000000000000000000000000000000000100100\"\n",
"bits(2 * -18) = \"1111111111111111111111111111111111111111111111111111111111011100\"\n"
]
}
],
"source": [
"@show typeof(18)\n",
"@show bits(18)\n",
"@show bits(-18)\n",
"@show bits(UInt64(Int128(2)^64 - 18)) == bits(-18)\n",
"@show bits(2 * 18) # shift bits of 18\n",
"@show bits(2 * -18); # shift bits of -18"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Two's complement representation respects modular arithmetic nicely. \n",
" Addition of any two signed integers are just bitwise addition, possibly modulo $2^M$\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **Range** of representable integers by $M$-bit **signed integer** is $[-2^{M-1},2^{M-1}-1]$.\n",
" - Julia functions `typemin(T)` and `typemax(T)` give the lowest and highest representable number of a type `T` respectively"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Int8\t-128\t127\n",
"Int16\t-32768\t32767\n",
"Int32\t-2147483648\t2147483647\n",
"Int64\t-9223372036854775808\t9223372036854775807\n",
"Int128\t-170141183460469231731687303715884105728\t170141183460469231731687303715884105727\n"
]
}
],
"source": [
"for T in [Int8, Int16, Int32, Int64, Int128]\n",
" println(T, '\\t', typemin(T), '\\t', typemax(T))\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Unsigned integers**\n",
"\n",
"* For unsigned integers, the range is $[0,2^M-1]$."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"UInt8\t0\t255\n",
"UInt16\t0\t65535\n",
"UInt32\t0\t4294967295\n",
"UInt64\t0\t18446744073709551615\n",
"UInt128\t0\t340282366920938463463374607431768211455\n"
]
}
],
"source": [
"for t in [UInt8, UInt16, UInt32, UInt64, UInt128]\n",
" println(t, '\\t', typemin(t), '\\t', typemax(t))\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `BigInt`\n",
"\n",
"Julia `BigInt` type is arbitrary precision."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"typemax(Int128) = 170141183460469231731687303715884105727\n",
"typemax(Int128) + 1 = -170141183460469231731687303715884105728\n",
"BigInt(typemax(Int128)) + 1 = 170141183460469231731687303715884105728\n"
]
}
],
"source": [
"@show typemax(Int128)\n",
"@show typemax(Int128) + 1 # modular arithmetic!\n",
"@show BigInt(typemax(Int128)) + 1;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overflow and underflow for integer arithmetic\n",
"\n",
"R reports `NA` for integer overflow and underflow. \n",
"**Julia outputs the result according to modular arithmetic.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"typemax(Int32) = 2147483647\n",
"typemax(Int32) + Int32(1) = -2147483648\n"
]
}
],
"source": [
"@show typemax(Int32)\n",
"@show typemax(Int32) + Int32(1); # modular arithmetics!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RCall.RObject{RCall.IntSxp}\n",
"[1] 2147483647\n"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using RCall\n",
"\n",
"R\"\"\"\n",
".Machine$integer.max\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RCall.RObject{RCall.IntSxp}\n",
"[1] 2147483647\n"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"R\"\"\"\n",
"M <- 32\n",
"big <- 2^(M-1) - 1\n",
"as.integer(big)\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[33mWARNING: \u001b[39m\u001b[22m\u001b[33mRCall.jl: Warning: NAs introduced by coercion to integer range\u001b[39m\n"
]
},
{
"data": {
"text/plain": [
"RCall.RObject{RCall.IntSxp}\n",
"[1] NA\n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"R\"\"\"\n",
"as.integer(big+1)\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Floating-number system\n",
"\n",
"Floating-point number system is a computer model for real numbers.\n",
"\n",
"* Most computer systems adopt the [IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_floating_point), established in 1985, for floating-point arithmetics. \n",
"For the history, see an [interview with William Kahan](http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html).\n",
"\n",
"* In the scientific notation, a real number is represented as\n",
"$$\\pm d_0.d_1d_2 \\cdots d_p \\times b^e.$$\n",
"In computer, the _base_ is $b=2$ and the digits $d_i$ are 0 or 1.\n",
"\n",
"* **Normalized** vs **denormalized** numbers. For example, decimal number 18 is\n",
"$$ +1.0010 \\times 2^4 \\quad (\\text{normalized})$$\n",
"or, equivalently,\n",
"$$ +0.1001 \\times 2^5 \\quad (\\text{denormalized}).$$\n",
"\n",
"* In the floating-number system, computer stores \n",
" - sign bit \n",
" - the _fraction_ (or _mantissa_, or _significand_) of the **normalized** representation\n",
" - the actual exponent _plus_ a bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **Double precision** (64 bit = 8 bytes) \n",
" \n",
" - In Julia, `Float64` is the type for double precision numbers \n",
" - First bit is sign bit \n",
" - $p=52$ significant bits \n",
" - 11 exponent bits: $e_{\\max}=1023$, $e_{\\min}=-1022$, bias=1023 \n",
" - $e_{\\text{min}}-1$ and $e_{\\text{max}}+1$ are reserved for special numbers \n",
" - range of **magnitude**: $10^{\\pm 308}$ in decimal because $\\log_{10} (2^{1023}) \\approx 308$ \n",
" - **precision** to the $\\log_{10}(2^{-52}) \\approx 16$ decimal point \n",
" \n",
"\n",
"* **Single precision** (32 bit = 4 bytes) \n",
" \n",
" - In Julia, `Float32` is the type for single precision numbers\n",
" - First bit is sign bit \n",
" - $p=23$ significant bits \n",
" - 8 exponent bits: $e_{\\max}=127$, $e_{\\min}=-126$, bias=127 \n",
" - $e_{\\text{min}}-1$ and $e_{\\text{max}}+1$ are reserved for special numbers \n",
" - range of **magnitude**: $10^{\\pm 38}$ in decimal because $\\log_{10} (2^{127}) \\approx 38$ \n",
" - **precision**: $\\log_{10}(2^{23}) \\approx 7$ decimal point \n",
" \n",
"* **Half precision** (16 bit = 2 bytes)\n",
" \n",
" - In Julia, `Float16` is the type for half precision numbers\n",
" - First bit is sign bit \n",
" - $p=10$ significant bits \n",
" - 5 exponent bits: $e_{\\max}=15$, $e_{\\min}=-14$, bias=15 \n",
" - $e_{\\text{min}}-1$ and $e_{\\text{max}}+1$ are reserved for special numbers \n",
" - range of **magnitude**: $10^{\\pm 4}$ in decimal because $\\log_{10} (2^{15}) \\approx 4$ \n",
" - **precision**: $\\log_{10}(2^{10}) \\approx 3$ decimal point "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Half precision:\n",
"bits(Float16(18)) = \"0100110010000000\"\n",
"bits(Float16(-18)) = \"1100110010000000\"\n",
"Single precision:\n",
"bits(Float32(18)) = \"01000001100100000000000000000000\"\n",
"bits(Float32(-18)) = \"11000001100100000000000000000000\"\n",
"Double precision:\n",
"bits(Float64(18)) = \"0100000000110010000000000000000000000000000000000000000000000000\"\n",
"bits(Float64(-18)) = \"1100000000110010000000000000000000000000000000000000000000000000\"\n",
"Float32(π) = 3.1415927f0\n",
"Float64(π) = 3.141592653589793\n"
]
},
{
"data": {
"text/plain": [
"3.141592653589793"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"println(\"Half precision:\")\n",
"@show bits(Float16(18)) # 18 in half precision\n",
"@show bits(Float16(-18)) # -18 in half precision\n",
"println(\"Single precision:\")\n",
"@show bits(Float32(18)) # 18 in single precision\n",
"@show bits(Float32(-18)) # -18 in single precision\n",
"println(\"Double precision:\")\n",
"@show bits(Float64(18)) # 18 in double precision\n",
"@show bits(Float64(-18)) # -18 in double precision\n",
"@show Float32(π) # SP number displays 7 decimal digits\n",
"@show Float64(π) # DP number displays 15 decimal digits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Special floating-point numbers. \n",
" - Exponent $e_{\\max}+1$ plus a zero mantissa means $\\pm \\infty$ \n",
" - Exponent $e_{\\max}+1$ plus a nonzero mantissa means `NaN`. `NaN` could be produced from `0 / 0`, `0 * Inf`, ... \n",
" In general `NaN ≠ NaN` bitwise \n",
" - Exponent $e_{\\min}-1$ with a zero mantissa represents the real number 0 \n",
" - Exponent $e_{\\min}-1$ with a nonzero mantissa are for numbers less than $b^{e_{\\min}}$ \n",
" Numbers are _denormalized_ in the range $(0,b^{e_{\\min}})$ -- **graceful underflow** "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bits(Inf) = \"0111111111110000000000000000000000000000000000000000000000000000\"\n",
"bits(-Inf) = \"1111111111110000000000000000000000000000000000000000000000000000\"\n",
"bits(0 / 0) = \"1111111111111000000000000000000000000000000000000000000000000000\"\n",
"bits(0Inf) = \"1111111111111000000000000000000000000000000000000000000000000000\"\n",
"bits(0.0) = \"0000000000000000000000000000000000000000000000000000000000000000\"\n",
"nextfloat(0.0) = 5.0e-324\n",
"bits(nextfloat(0.0)) = \"0000000000000000000000000000000000000000000000000000000000000001\"\n"
]
}
],
"source": [
"@show bits(Inf) # Inf in double precision\n",
"@show bits(-Inf) # -Inf in double precision\n",
"@show bits(0 / 0) # NaN\n",
"@show bits(0 * Inf) # NaN\n",
"@show bits(0.0) # 0 in double precision \n",
"@show nextfloat(0.0) # next representable number \n",
"@show bits(nextfloat(0.0)); # denormalized"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **Rounding** is necessary whenever a number has more than $p$ significand bits. Most computer systems use the default IEEE 754 _round to nearest_ mode (also called _ties to even_ mode). Julia offers several [rounding modes](https://docs.julialang.org/en/stable/stdlib/math/#Base.Rounding.RoundingMode), the default being [`RoundNearest`](https://docs.julialang.org/en/stable/stdlib/math/#Base.Rounding.RoundNearest). For example, the number 0.1 in decimal system cannot be represented accurately as a floating point number:\n",
"$$ 0.1 = 1.10011001... \\times 2^{-4} $$"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bits(0.1f0) = \"00111101110011001100110011001101\"\n",
"bits(0.1) = \"0011111110111001100110011001100110011001100110011001100110011010\"\n"
]
}
],
"source": [
"@show bits(0.1f0) # single precision Float32\n",
"@show bits(0.1); # double precision Float64"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* In summary\n",
" - Single precision: range $\\pm 10^{\\pm 38}$ with precision up to 7 decimal digits\n",
" - Double precision: range $\\pm 10^{\\pm 308}$ with precision up to 16 decimal digits \n",
" - The floating-point numbers do not occur uniformly over the real number line\n",
" \n",
" Each magnitude has same number of representible numbers\n",
" - **Machine epsilons** are the spacings of numbers around 1: \n",
" $$\\epsilon_{\\min}=b^{-p}, \\quad \\epsilon_{\\max} = b^{1-p}.$$\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"eps(Float32) = 1.1920929f-7\n",
"eps(Float64) = 2.220446049250313e-16\n",
"eps(100.0) = 1.4210854715202004e-14\n",
"eps(0.0) = 5.0e-324\n",
"x = 1.25f0 = 1.25f0\n",
"(prevfloat(x), x, nextfloat(x)) = (1.2499999f0, 1.25f0, 1.2500001f0)\n",
"(bits(prevfloat(x)), bits(x), bits(nextfloat(x))) = (\"00111111100111111111111111111111\", \"00111111101000000000000000000000\", \"00111111101000000000000000000001\")\n"
]
}
],
"source": [
"@show eps(Float32) # machine epsilon for a floating point type\n",
"@show eps(Float64) # same as eps()\n",
"# eps(x) is the spacing after x\n",
"@show eps(100.0)\n",
"@show eps(0.0)\n",
"# nextfloat(x) and prevfloat(x) give the neighbors of x\n",
"@show x = 1.25f0\n",
"@show prevfloat(x), x, nextfloat(x)\n",
"@show bits(prevfloat(x)), bits(x), bits(nextfloat(x));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* In R, the variable `.Machine` contains numerical characteristics of the machine."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RCall.RObject{RCall.VecSxp}\n",
"$double.eps\n",
"[1] 2.220446e-16\n",
"\n",
"$double.neg.eps\n",
"[1] 1.110223e-16\n",
"\n",
"$double.xmin\n",
"[1] 2.225074e-308\n",
"\n",
"$double.xmax\n",
"[1] 1.797693e+308\n",
"\n",
"$double.base\n",
"[1] 2\n",
"\n",
"$double.digits\n",
"[1] 53\n",
"\n",
"$double.rounding\n",
"[1] 5\n",
"\n",
"$double.guard\n",
"[1] 0\n",
"\n",
"$double.ulp.digits\n",
"[1] -52\n",
"\n",
"$double.neg.ulp.digits\n",
"[1] -53\n",
"\n",
"$double.exponent\n",
"[1] 11\n",
"\n",
"$double.min.exp\n",
"[1] -1022\n",
"\n",
"$double.max.exp\n",
"[1] 1024\n",
"\n",
"$integer.max\n",
"[1] 2147483647\n",
"\n",
"$sizeof.long\n",
"[1] 8\n",
"\n",
"$sizeof.longlong\n",
"[1] 8\n",
"\n",
"$sizeof.longdouble\n",
"[1] 16\n",
"\n",
"$sizeof.pointer\n",
"[1] 8\n",
"\n"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"R\"\"\"\n",
".Machine\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Julia provides `Float16` (half precision), `Float32` (single precision), `Float64` (double precision), and `BigFloat` (arbitrary precision)."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1m\u001b[33mWARNING: \u001b[39m\u001b[22m\u001b[33mInstall LightGraphs for the best layout calculations.\u001b[39m\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# make a list of a type T and it's supertypes\n",
"T = AbstractFloat\n",
"sups = [T]\n",
"sup = T\n",
"while sup != Any\n",
" sup = supertype(sup)\n",
" unshift!(sups,sup)\n",
"end\n",
"n = length(sups)\n",
"nodes, source, destiny = copy(sups), collect(1:n-1), collect(2:n)\n",
"add_subs!(T, n)\n",
"names = map(string, nodes)\n",
"\n",
"graphplot(source, destiny, names=names, method=:tree)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overflow and underflow of floating-point number\n",
"\n",
"* For double precision, the range is $\\pm 10^{\\pm 308}$. In most situations, underflow is preferred over overflow. Overflow causes crashes. Underflow yields zeros or denormalized numbers. \n",
"\n",
"* E.g., the logit link function is\n",
"$$p = \\frac{\\exp (x^T \\beta)}{1 + \\exp (x^T \\beta)} = \\frac{1}{1+\\exp(- x^T \\beta)}.$$\n",
"The former expression can easily lead to `Inf / Inf = NaN`, while the latter expression leads to graceful underflow."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Float16\t-Inf\tInf\t6.104e-5\t6.55e4\t0.000977\n",
"Float32\t-Inf\tInf\t1.1754944e-38\t3.4028235e38\t1.1920929e-7\n",
"Float64\t-Inf\tInf\t2.2250738585072014e-308\t1.7976931348623157e308\t2.220446049250313e-16\n"
]
}
],
"source": [
"for T in [Float16, Float32, Float64,]\n",
" println(T, '\\t', typemin(T), '\\t', typemax(T), '\\t', realmin(T), \n",
" '\\t', realmax(T), '\\t', eps(T))\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `BigFloat` in Julia offers arbitrary precision."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(precision(BigFloat), realmin(BigFloat), realmax(BigFloat)) = (256, 8.50969131174083613912978790962048280567755996982969624908264897850135431080301e-1388255822130839284, 5.875653789111587590936911998878442589938516392745498308333779606469323584389875e+1388255822130839282)\n"
]
}
],
"source": [
"@show precision(BigFloat), realmin(BigFloat), realmax(BigFloat);"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"BigFloat(π) = 3.141592653589793238462643383279502884197169399375105820974944592307816406286198\n",
"BigFloat(π) = 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724586997\n"
]
}
],
"source": [
"@show BigFloat(π); # default precision for BigFloat is 256 bits\n",
"# set precision to 1024 bits\n",
"setprecision(BigFloat, 1024) do \n",
" @show BigFloat(π)\n",
"end;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Catastrophic cancellation\n",
"\n",
"* **Scenario 1**: Addition or subtraction of two numbers of widely different magnitudes: $a+b$ or $a-b$ where $a \\gg b$ or $a \\ll b$. We loose the precision in the number of smaller magnitude. Consider \n",
"$$\\begin{eqnarray*}\n",
" a &=& x.xxx ... \\times 2^{30} \\\\ \n",
" b &=& y.yyy... \\times 2^{-30}\n",
"\\end{eqnarray*}$$\n",
"What happens when computer calculates $a+b$? We get $a+b=a$!"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = 1.0 * 2.0^30\n",
"b = 1.0 * 2.0^-30\n",
"a + b == a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **Scenario 2**: Subtraction of two nearly equal numbers eliminates significant digits. $a-b$ where $a \\approx b$. Consider \n",
"$$\\begin{eqnarray*}\n",
" a &=& x.xxxxxxxxxx1ssss \\\\\n",
" b &=& x.xxxxxxxxxx0tttt\n",
"\\end{eqnarray*}$$\n",
"The result is $1.vvvvu...u$ where $u$ are unassigned digits."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bits(a) = \"00111111100111100000011001010001\"\n",
"bits(b) = \"00111111100111100000011001010000\"\n",
"a - b = 1.1920929f-7\n",
"bits(a - b) = \"00110100000000000000000000000000\"\n"
]
}
],
"source": [
"a = 1.2345678f0 # rounding\n",
"@show bits(a) # rounding\n",
"b = 1.2345677f0\n",
"@show bits(b)\n",
"@show a - b # correct result should be 1e-7\n",
"@show bits(a - b);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Implications for numerical computation\n",
" - Rule 1: add small numbers together before adding larger ones \n",
" - Rule 2: add numbers of like magnitude together (paring). When all numbers are of same sign and similar magnitude, add in pairs so each stage the summands are of similar magnitude \n",
" - Rule 3: avoid substraction of two numbers that are nearly equal"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Algebraic laws\n",
"\n",
"Floating-point numbers may violate many algebraic laws we are familiar with, such associative and distributive laws. See Homework 1 problems."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Further reading\n",
"\n",
"* Textbook treatment, e.g., Chapter II.2 of [Computational Statistics](http://ucla.worldcat.org/title/computational-statistics/oclc/437345409&referer=brief_results) by James Gentle (2010).\n",
"\n",
"* [What every computer scientist should know about floating-point arithmetic](http://hua-zhou.github.io/teaching/biostatm280-2017spring/readings/Goldberg91FloatingPoint.pdf) by David Goldberg (1991)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 0.6.2",
"language": "julia",
"name": "julia-0.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.6.2"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "250px",
"width": "252px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false,
"widenNotebook": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}