---
title: Introduction to Julia
subtitle: Biostat/Biomath M257
author: Dr. Hua Zhou @ UCLA
date: today
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
jupyter:
  jupytext:
    formats: 'ipynb,qmd'
    text_representation:
      extension: .qmd
      format_name: quarto
      format_version: '1.0'
    jupytext_version: 1.14.5
  kernelspec:
    display_name: Julia 1.8.5
    language: julia
    name: julia-1.8
---
System information (for reproducibility)
```{julia}
versioninfo()
```
Load packages:
```{julia}
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
```
```{julia}
using BenchmarkTools, Distributions, RCall
using LinearAlgebra, Profile, SparseArrays
```
## What's Julia?
> Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments
* History:
- Project started in 2009. First public release in 2012
- Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
- First major release v1.0 was released on Aug 8, 2018
- Current stable release v1.8.5 (as of Apr 6, 2023)
* Aims to solve the notorious **two-language problem**: prototype code goes into high-level languages like R/Python, production code goes into low-level languages like C/C++.
Julia aims to:
> Walks like Python. Runs like C.
See the benchmarks page on Julia's website for details.
* Write high-level, abstract code that closely resembles mathematical formulas
- yet produces fast, low-level machine code that has traditionally only been generated by static languages.
* Julia is more than just "Fast R" or "Fast Matlab"
- Performance comes from features that work well together.
- You can't just take the magic dust that makes Julia fast and sprinkle it on [language of choice]
## Learning resources
1. The (free) online course [Introduction to Julia](https://juliaacademy.com/p/intro-to-julia), by Jane Herriman.
2. Cheat sheet: [The Fast Track to Julia](https://juliadocs.github.io/Julia-Cheat-Sheet/).
3. Browse the Julia [documentation](https://docs.julialang.org/en).
4. For Matlab users, read [Noteworthy Differences From Matlab](https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-MATLAB-1).
For R users, read [Noteworthy Differences From R](https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-R-1).
For Python users, read [Noteworthy Differences From Python](https://docs.julialang.org/en/v1/manual/noteworthy-differences/?highlight=matlab#Noteworthy-differences-from-Python-1).
5. The [Learning page](http://julialang.org/learning/) on Julia's website has pointers to many other learning resources.
## Julia REPL (Read-Evaluation-Print-Loop)
The `Julia` REPL, or `Julia` shell, has several modes.
1. **Default mode** is the Julian prompt `julia>`. Type backspace in other modes to enter default mode.
2. **Help mode** `help?>`. Type `?` to enter help mode. `?search_term` does a fuzzy search for `search_term`.
3. **Shell mode** `shell>`. Type `;` to enter shell mode.
4. **Package mode** `(@v1.8) pkg>`. Type `]` to enter package mode for managing Julia packages (install, uninstall, update, ...).
5. **Search mode** `(reverse-i-search)`. Press `ctrl+R` to enter search mode.
6. With the `RCall.jl` package installed, we can enter **R mode** by typing `$` (shift+4) at the Julia REPL.
Some survival commands in Julia REPL:
1. `exit()` or `Ctrl+D`: exit Julia.
2. `Ctrl+C`: interrupt execution.
3. `Ctrl+L`: clear screen.
0. Append `;` (semi-colon) to suppress displaying output from a command. Same as Matlab.
0. `include("filename.jl")` to source a Julia code file.
## Seek help
* Online help from REPL: `?function_name`.
* Google.
* Julia [documentation](https://docs.julialang.org/en).
* Look up source code: `@edit sin(π)`.
* The Julia Discourse forum: <https://discourse.julialang.org>.
* Friends.
## Which IDE?
* Julia homepage lists many choices: Juno, VS Code, Vim, ...
* Unfortunately, at the moment there is no mature RStudio- or Matlab-like IDE for Julia.
* For dynamic documents, e.g., homework, I recommend [Jupyter Notebook](https://jupyter.org/install.html) or [JupyterLab](http://jupyterlab.readthedocs.io/en/stable/).
* For extensive Julia coding, I have been happily using the editor [VS Code](https://code.visualstudio.com) with the extensions `Julia` and `VS Code Jupyter Notebook Previewer` installed.
## Julia package system
* Each Julia package is a Git repository, and each Julia package name ends with `.jl`. E.g., the `Distributions.jl` package lives at <https://github.com/JuliaStats/Distributions.jl>.
A Google search for `PackageName.jl` usually leads to the package repository on github.com.
* The package ecosystem is rapidly maturing; a complete list of **registered** packages (which are required to have a certain level of testing and documentation) is at [http://pkg.julialang.org/](http://pkg.julialang.org/).
* For example, the package called `Distributions.jl` is added with
```julia
# in Pkg mode
(@v1.8) pkg> add Distributions
```
and "removed" (although not completely deleted) with
```julia
# in Pkg mode
(@v1.8) pkg> rm Distributions
```
* The package manager provides a dependency solver that determines which packages are actually required to be installed.
* **Non-registered** packages are added by cloning the relevant Git repository. E.g.,
```julia
# in Pkg mode
(@v1.8) pkg> add https://github.com/OpenMendel/SnpArrays.jl
```
* A package needs to be added only once, at which point it is downloaded into the `.julia/packages` directory in your home directory.
```{julia}
readdir(Sys.islinux() ? ENV["JULIA_PATH"] * "/pkg/packages" : ENV["HOME"] * "/.julia/packages")
```
* Directory of a specific package can be queried by `pathof()`:
```{julia}
pathof(Distributions)
```
* If you start having seemingly unsolvable problems with packages, you may try deleting your `.julia` directory and reinstalling all your packages.
* Periodically, one should run `update` in Pkg mode, which checks for, downloads and installs updated versions of all the packages you currently have installed.
* `status` lists the status of all installed packages.
* Using functions in package.
```julia
using Distributions
```
This pulls all of the *exported* functions in the module into your local namespace, as you can check using the `varinfo()` command. An alternative is
```julia
import Distributions
```
Now, the functions from the Distributions package are available only when qualified with the module name, e.g.,
```julia
Distributions.Normal()
```
All functions, not only exported functions, are always available this way.
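The distinction is easy to see with a standard library module (using `LinearAlgebra` here purely for illustration):

```julia
import LinearAlgebra             # only the module name comes into scope

LinearAlgebra.norm([3.0, 4.0])   # qualified call works: 5.0
# norm([3.0, 4.0])               # error: `norm` is not in scope yet

using LinearAlgebra              # exported names enter the namespace

norm([3.0, 4.0])                 # unqualified call now works: 5.0
```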
## Calling R from Julia
* The [`RCall.jl`](https://github.com/JuliaInterop/RCall.jl) package allows us to embed R code inside of Julia.
* There are also `PyCall.jl`, `MATLAB.jl`, `JavaCall.jl`, `CxxWrap.jl` packages for interfacing with other languages.
```{julia}
# x is in Julia workspace
x = randn(1000)
# $ is the interpolation operator
R"""
hist($x, main = "I'm plotting a Julia vector")
""";
```
```{julia}
R"""
library(ggplot2)
qplot($x)
"""
```
```{julia}
x = R"""
rnorm(10)
"""
```
```{julia}
# collect R variable into Julia workspace
y = collect(x)
```
* Access Julia variables in R REPL mode:
```julia
julia> x = rand(5) # Julia variable
R> y <- $x
```
* Pass Julia expression in R REPL mode:
```julia
R> y <- $(rand(5))
```
* Put Julia variable into R environment:
```julia
julia> @rput x
R> x
```
* Get R variable into Julia environment:
```julia
R> r <- 2
julia> @rget r
```
* If you want to call Julia within R, check out the [`JuliaCall`](https://non-contradiction.github.io/JuliaCall//index.html) package.
## Some basic Julia code
```{julia}
# an integer, same as int in R
y = 1
```
```{julia}
# query type of a Julia object
typeof(y)
```
```{julia}
# a Float64 number, same as double in R
y = 1.0
```
```{julia}
typeof(y)
```
```{julia}
# Greek letters: `\pi`
π
```
```{julia}
typeof(π)
```
```{julia}
# Greek letters: `\theta`
θ = y + π
```
```{julia}
# emoji! `\:kissing_cat:`
😽 = 5.0
😽 + 1
```
```{julia}
# `\alpha\hat`
α̂ = π
```
For a list of unicode symbols that can be tab-completed, see the [Unicode Input](https://docs.julialang.org/en/v1/manual/unicode-input/) section of the Julia manual. Or, in help mode, type `?` followed by the unicode symbol you want to look up.
```{julia}
# vector of Float64 0s
x = zeros(5)
```
```{julia}
# vector of Int64 0s
x = zeros(Int, 5)
```
```{julia}
# matrix of Float64 0s
x = zeros(5, 3)
```
```{julia}
# matrix of Float64 1s
x = ones(5, 3)
```
```{julia}
# define array without initialization
x = Matrix{Float64}(undef, 5, 3)
```
```{julia}
# fill a matrix by 0s
fill!(x, 0)
```
```{julia}
# initialize an array to be constant 2.5
fill(2.5, (5, 3))
```
```{julia}
# rational number
a = 3//5
```
```{julia}
typeof(a)
```
```{julia}
b = 3//7
```
```{julia}
a + b
```
```{julia}
# uniform [0, 1) random numbers
x = rand(5, 3)
```
```{julia}
# uniform random numbers (in half precision)
x = rand(Float16, 5, 3)
```
```{julia}
# random numbers from {1,...,5}
x = rand(1:5, 5, 3)
```
```{julia}
# standard normal random numbers
x = randn(5, 3)
```
```{julia}
# range
1:10
```
```{julia}
typeof(1:10)
```
```{julia}
1:2:10
```
```{julia}
typeof(1:2:10)
```
```{julia}
# integers 1-10
x = collect(1:10)
```
```{julia}
# or equivalently
[1:10...]
```
```{julia}
# Float64 numbers 1-10
x = collect(1.0:10)
```
```{julia}
# convert to a specific type
convert(Vector{Float64}, 1:10)
```
## Matrices and vectors
### Dimensions
```{julia}
x = randn(5, 3)
```
```{julia}
size(x)
```
```{julia}
size(x, 1) # nrow() in R
```
```{julia}
size(x, 2) # ncol() in R
```
```{julia}
# total number of elements
length(x)
```
### Indexing
```{julia}
# 5 × 5 matrix of random Normal(0, 1)
x = randn(5, 5)
```
```{julia}
# first column
x[:, 1]
```
```{julia}
# first row
x[1, :]
```
```{julia}
# sub-array
x[1:2, 2:3]
```
```{julia}
# getting a subset of a matrix creates a copy, but you can also create "views"
z = view(x, 1:2, 2:3)
```
```{julia}
# same as
@views z = x[1:2, 2:3]
```
```{julia}
# change in z (view) changes x as well
z[2, 2] = 0.0
x
```
```{julia}
# y points to same data as x
y = x
```
```{julia}
# x and y point to same data
pointer(x), pointer(y)
```
```{julia}
# changing y also changes x
y[:, 1] .= 0
x
```
```{julia}
# create a new copy of data
z = copy(x)
```
```{julia}
pointer(x), pointer(z)
```
### Concatenate matrices
```{julia}
# 3-by-1 vector
[1, 2, 3]
```
```{julia}
# 1-by-3 array
[1 2 3]
```
```{julia}
# multiple assignment by tuple
x, y, z = randn(5, 3), randn(5, 2), randn(3, 5)
```
```{julia}
[x y] # 5-by-5 matrix
```
```{julia}
[x y; z] # 8-by-5 matrix
```
### Dot operation (broadcasting)
A dot operation in Julia is an elementwise operation, similar to Matlab.
```{julia}
x = randn(5, 3)
```
```{julia}
y = ones(5, 3)
```
```{julia}
x .* y # same as x * y in R
```
```{julia}
x .^ (-2)
```
```{julia}
sin.(x)
```
### Basic linear algebra
```{julia}
x = randn(5)
```
```{julia}
# vector L2 norm
norm(x)
```
```{julia}
# same as
sqrt(sum(abs2, x))
```
```{julia}
y = randn(5) # another vector
# dot product
dot(x, y) # x' * y
```
```{julia}
# same as
x'y
```
```{julia}
x, y = randn(5, 3), randn(3, 2)
# matrix multiplication, same as %*% in R
x * y
```
```{julia}
x = randn(3, 3)
```
```{julia}
# conjugate transpose
x'
```
```{julia}
b = rand(3)
x'b # same as x' * b
```
```{julia}
# trace
tr(x)
```
```{julia}
det(x)
```
```{julia}
rank(x)
```
### Sparse matrices
```{julia}
# 10-by-10 sparse matrix with density 0.1
X = sprandn(10, 10, .1)
```
```{julia}
# dump() in Julia is like str() in R
dump(X)
```
```{julia}
# convert to dense matrix; be cautious when dealing with big data
Xfull = convert(Matrix{Float64}, X)
```
```{julia}
# convert a dense matrix to sparse matrix
sparse(Xfull)
```
```{julia}
# syntax for sparse linear algebra is same as dense linear algebra
β = ones(10)
X * β
```
```{julia}
# many functions apply to sparse matrices as well
sum(X)
```
## Control flow and loops
* if-elseif-else-end
```julia
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
```
* `for` loop
```julia
for i in 1:10
    println(i)
end
```
* Nested `for` loop:
```julia
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end
```
Same as
```julia
for i in 1:10, j in 1:5
    println(i * j)
end
```
* Exit a loop early:
```julia
for i in 1:10
    # do something
    if condition1
        break # exit the loop, skipping remaining iterations
    end
end
```
* Skip to the next iteration:
```julia
for i in 1:10
    # do something
    if condition1
        continue # skip to the next iteration
    end
    # do something
end
```
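These pieces combine naturally; a small self-contained example (the function and numbers are made up for illustration):

```julia
# collect the first n even squares, using both continue and break
function first_even_squares(n)
    out = Int[]
    for i in 1:100
        s = i^2
        isodd(s) && continue      # skip odd squares
        push!(out, s)
        length(out) == n && break # stop once we have enough
    end
    return out
end

first_even_squares(3)             # [4, 16, 36]
```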
## Functions
* In Julia, function arguments are **passed by sharing**: references are passed without copying, so a function can mutate its arguments. This is in contrast to R and Matlab, which copy on modification.
* A function name ending with `!` indicates that the function mutates at least one of its arguments, typically the first.
```julia
sort!(x) # vs sort(x)
```
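The convention is easy to see by comparing `sort` with `sort!`:

```julia
x = [3, 1, 2]

y = sort(x)   # returns a sorted copy
x             # still [3, 1, 2] -- unchanged

sort!(x)      # sorts in place
x             # now [1, 2, 3] -- mutated
```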
* Function definition
```julia
function func(req1, req2; key1 = dflt1, key2 = dflt2)
    # do stuff
    return out1, out2, out3
end
```
**Required arguments** are positional and separated by commas.
**Optional (keyword) arguments** need a default value in the signature.
The **semicolon** separates keyword arguments from positional arguments in the signature; it is not required in a function call.
The **return** statement is optional; by default, the last evaluated expression is returned.
Multiple outputs can be returned as a **tuple**, e.g., `return out1, out2, out3`.
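A small concrete definition illustrating these points (`sumpow` is a made-up name):

```julia
# `x` is required; `p` is a keyword argument with default 2
function sumpow(x; p = 2)
    s = sum(t -> t^p, x)
    return s, length(x)             # multiple outputs, returned as a tuple
end

total, n = sumpow([1, 2, 3])        # tuple is destructured: total = 14, n = 3
cubes, _ = sumpow([1, 2, 3], p = 3) # no semicolon needed in the call: cubes = 36
```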
* Anonymous functions, e.g., `x -> x^2`, are commonly used with collection functions and in list comprehensions.
```julia
map(x -> x^2, y) # square each element in y
```
* Functions can be nested:
```julia
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
```
* Functions can be vectorized using the Dot syntax:
```{julia}
# defined for scalar
function myfunc(x)
    return sin(x^2)
end
x = randn(5, 3)
myfunc.(x)
```
* **Collection functions** (think of these as the family of `apply` functions in R).
Apply a function to each element of a collection:
```julia
map(f, coll) # or, using do-block syntax
map(coll) do elem
    # do stuff with elem
    # the last expression is the value returned for each element
end
```
```{julia}
map(x -> sin(x^2), x)
```
```{julia}
map(x) do elem
    elem = elem^2
    return sin(elem)
end
```
```{julia}
# Mapreduce
mapreduce(x -> sin(x^2), +, x)
```
```{julia}
# same as
sum(x -> sin(x^2), x)
```
* List **comprehension**
```{julia}
[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python
```
## Type system
* Every variable in Julia has a type.
* When thinking about types, think about sets.
* Everything is a subtype of the abstract type `Any`.
* An abstract type defines a set of types.
    - For example, the abstract type `Number` encompasses all numeric types, such as integers and floating-point numbers.
* We can explore type hierarchy with `typeof()`, `supertype()`, and `subtypes()`.
```{julia}
# 1.0: double precision, 1: 64-bit integer
typeof(1.0), typeof(1)
```
```{julia}
supertype(Float64)
```
```{julia}
subtypes(AbstractFloat)
```
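Combining `subtypes()` with recursion prints a whole subtree at once. A small helper sketch (`subtypes` lives in the standard library `InteractiveUtils`, which the REPL loads automatically):

```julia
using InteractiveUtils  # provides subtypes()

# recursively print the subtree rooted at a type
function print_type_tree(T, depth = 0)
    println("    "^depth, T)
    for S in subtypes(T)
        print_type_tree(S, depth + 1)
    end
end

print_type_tree(Number)
```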
```{julia}
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
```
```{julia}
# On a 64-bit machine, Int == Int64
Int == Int64
```
```{julia}
# convert to Float64
convert(Float64, 1)
```
```{julia}
# same as casting
Float64(1)
```
```{julia}
# Float32 vector
x = randn(Float32, 5)
```
```{julia}
# convert to Float64
convert(Vector{Float64}, x)
```
```{julia}
# same as broadcasting (dot operation)
Float64.(x)
```
```{julia}
# convert Float64 to Int64
convert(Int, 1.0)
```
```{julia}
convert(Int, 1.5) # throws an InexactError; use round instead
```
```{julia}
round(Int, 1.5)
```
## Multiple dispatch
* Multiple dispatch lies at the core of Julia's design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.
* Let's consider a simple "doubling" function:
```{julia}
g(x) = x + x
```
```{julia}
g(1.5)
```
This definition is too broad, since some things, e.g., strings, can't be added:
```{julia}
g("hello world")
```
* This definition is correct but too restrictive, since any `Number` can be added.
```{julia}
g(x::Float64) = x + x
```
* This definition will automatically work on the entire type tree above!
```{julia}
g(x::Number) = x + x
```
This is a lot nicer than
```julia
function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
```
* The `methods(func)` function displays all methods defined for `func`.
```{julia}
methods(g)
```
* When a function with multiple method definitions is called, Julia dispatches to the method with the most specific (narrowest) matching signature.
* The `@which func(x)` macro tells which method is dispatched to for the argument signature of `x`.
```{julia}
# an Int64 input
@which g(1)
```
```{julia}
# a Vector{Float64} input
@which g(randn(5))
```
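The "most specific method wins" rule is easy to verify with a throwaway function (`h` is made up here):

```julia
# three methods of increasing specificity
h(x)         = "fallback"
h(x::Number) = "a number"
h(x::Int)    = "an Int"

h("text"), h(1.0), h(1)   # ("fallback", "a number", "an Int")
```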
## Just-in-time compilation (JIT)
The following figures are taken from Arch D. Robinson's slides [Introduction to Writing High Performance Julia](https://www.youtube.com/watch?v=szE4txAD8mk).
* `Julia`'s efficiency results from its ability to infer the types of **all** variables within a function and then call the LLVM compiler to generate optimized machine code at run time.
Consider the `g` (doubling) function defined earlier. It works on **any** type that has a method for `+`.
```{julia}
g(2), g(2.0)
```
**Step 1**: Parse Julia code into an [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree), which is then lowered.
```{julia}
@code_lowered g(2)
```
**Step 2**: Type inference according to input type.
```{julia}
# type inference for integer input
@code_warntype g(2)
```
```{julia}
# type inference for Float64 input
@code_warntype g(2.0)
```
**Step 3**: Compile into **LLVM bitcode** (equivalent of R bytecode generated by the `compiler` package).
```{julia}
# LLVM bitcode for integer input
@code_llvm g(2)
```
```{julia}
# LLVM bitcode for Float64 input
@code_llvm g(2.0)
```
We didn't provide any type annotation, yet different LLVM bitcode was generated depending on the argument type!
In R or Python, `g(2)` and `g(2.0)` would run the same code.
In Julia, `g(2)` and `g(2.0)` dispatch to code optimized for `Int64` and `Float64`, respectively.
For integer input `x`, the LLVM compiler is smart enough to recognize that `x + x` is simply a left shift of `x` by 1 bit, which is faster than addition.
* **Step 4**: The lowest level is **assembly code**, which is machine dependent.
```{julia}
# Assembly code for integer input
@code_native g(2)
```
```{julia}
# Assembly code for Float64 input
@code_native g(2.0)
```
## Profiling Julia code
Julia has several built-in tools for profiling. The `@time` macro outputs the run time and heap allocations. Note that the first call of a function incurs (substantial) compilation time.
```{julia}
# a simple summation function
function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end
using Random
Random.seed!(257)
a = rand(20_000_000)
@time tally(a) # first run: include compile time
```
```{julia}
@time tally(a)
```
For more robust benchmarking, the [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) package is highly recommended.
```{julia}
@benchmark tally($a)
```
The `Profile` module gives line by line profile results.
```{julia}
Profile.clear()
@profile tally(a)
Profile.print(format=:flat)
```
One can use the [`ProfileView`](https://github.com/timholy/ProfileView.jl) package for better visualization of profile data:
```julia
using ProfileView
ProfileView.view()
```
```{julia}
# check type stability
@code_warntype tally(a)
```
```{julia}
# check LLVM bitcode
@code_llvm tally(a)
```
```{julia}
@code_native tally(a)
```
**Exercise:** Annotate the loop in the `tally` function with `@simd` and look for differences in the LLVM bitcode and machine code.
## Memory profiling
Detailed memory profiling requires a detour. First, let's write a script `bar.jl` that contains the workload function `tally` and a wrapper for profiling.
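The script is displayed by the `cat` command below; a minimal sketch of what such a `bar.jl` might contain (the `Profile.clear_malloc_data()` call discards allocation counts attributable to compilation, so they are not charged to `tally`):

```julia
# bar.jl -- workload function plus a wrapper for memory profiling
using Profile

function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

function wrapper()
    y = rand(10000)
    # run once to force compilation
    println(tally(y))
    # discard allocations attributed to compilation
    Profile.clear_malloc_data()
    # this run is what the .mem file reports
    println(tally(y))
end

wrapper()
```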
```{julia}
;cat bar.jl
```
Next, in the terminal, we run the script with the `--track-allocation=user` option.
```{julia}
#;julia --track-allocation=user bar.jl
```
The profiler outputs a file `bar.jl.51116.mem`.
```{julia}
;cat bar.jl.51116.mem
```