{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Julia\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types of computer languages\n", "\n", "* **Compiled languages**: C/C++, Fortran, ... \n", " - Directly compiled to machine code that is executed by CPU \n", " - Pros: fast, memory efficient\n", " - Cons: longer development time, hard to debug\n", "\n", "* **Interpreted language**: R, Matlab, Python, SAS IML, JavaScript, ... \n", " - Interpreted by interpreter\n", " - Pros: fast prototyping\n", " - Cons: excruciatingly slow for loops\n", "\n", "* Mixed (dynamic) languages: Matlab (JIT), R (`compiler` package), Julia, Cython, JAVA, ...\n", " - Pros and cons: between the compiled and interpreted languages\n", "\n", "* Script languages: Linux shell scripts, Perl, ...\n", " - Extremely useful for some data preprocessing and manipulation\n", "\n", "* Database languages: SQL, Hive (Hadoop). \n", " - Data analysis *never* happens if we do not know how to retrieve data from databases " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Messages\n", "\n", "* To be versatile in the big data era, master at least one language in each category.\n", "\n", "* To improve efficiency of interpreted languages such as R or Matlab, conventional wisdom is to avoid loops as much as possible. Aka, **vectorize code**\n", "> The only loop you are allowed to have is that for an iterative algorithm.\n", "\n", "* When looping is unavoidable, need to code in C, C++, or Fortran. \n", "Success stories: the popular `glmnet` package in R is coded in Fortran; `tidyverse` packages use a lot RCpp/C++.\n", "\n", "* Modern languages such as Julia tries to solve the **two language problem**. 
That is, to achieve efficiency without vectorizing code.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's Julia?\n", "\n", "> Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments\n", "\n", "* History:\n", " - Project started in 2009. First public release in 2012 \n", " - Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah\n", " - First major release, v1.0, came out on Aug 8, 2018\n", " - Current stable release v1.1.0\n", "\n", "* Aims to solve the notorious **two language problem**: prototype code goes into high-level languages like R/Python, production code goes into low-level languages like C/C++. \n", "\n", " In short:\n", "> Walks like Python. Runs like C.\n", "\n", "\n", "\n", "See