{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Packages\n", "\n", "Julia code is organized in packages, and package management is built into the Julia language.\n", "\n", "The assumption is that packages are developed with `git` and Julia will clone the whole repository when installing a package.\n", "\n", "Users can have their packages registered on a special GitHub repository: [METADATA.jl](https://github.com/JuliaLang/METADATA.jl). Dependencies are tracked in the `REQUIRE` file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating METADATA...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating PipeLayout master...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating SCIP master...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mComputing changes...\n", "\u001b[39m" ] }, { "name": "stdout", "output_type": "stream", "text": [ "18 required packages:\n", " - BenchmarkTools 0.0.8\n", " - Cbc 0.3.2\n", " - Clp 0.3.1\n", " - Convex 0.5.0\n", " - DataFrames 0.10.0\n", " - GLPKMathProgInterface 0.3.4\n", " - IJulia 1.5.1\n", " - IndexedTables 0.1.7\n", " - JuMP 0.17.1\n", " - LightGraphs 0.9.0\n", " - Plots 0.12.0\n", " - ProfileView 0.2.1\n", " - PyPlot 2.3.2\n", " - Query 0.6.0\n", " - SCIP 0.3.0+ master\n", " - SCS 0.3.1\n", " - StatPlots 0.4.0\n", " - StaticArrays 0.5.1\n", "80 additional packages:\n", " - AxisAlgorithms 0.1.6\n", " - BaseTestNext 0.2.2\n", " - BinDeps 0.6.0\n", " - Blosc 0.2.1\n", " - Cairo 0.3.0\n", " - Calculus 0.2.2\n", " - ColorTypes 0.5.1\n", " - Colors 0.7.3\n", " - Combinatorics 0.4.1\n", " - Compat 0.26.0\n", " - Conda 0.5.3\n", " - DataArrays 0.5.3\n", " - DataStructures 0.5.3\n", " - DataValues 0.1.1\n", " - DiffBase 0.2.0\n", " - Distances 0.4.1\n", " - Distributions 0.13.0\n", " - 
DocStringExtensions 0.3.3\n", " - Documenter 0.11.1\n", " - DualNumbers 0.3.0\n", " - FileIO 0.4.1\n", " - FixedPointNumbers 0.3.8\n", " - FixedSizeArrays 0.2.5\n", " - ForwardDiff 0.4.2\n", " - GLPK 0.4.2\n", " - GZip 0.3.0\n", " - Graphics 0.2.0\n", " - Gtk 0.13.0\n", " - GtkReactive 0.2.1\n", " - HDF5 0.8.1\n", " - Interpolations 0.6.2\n", " - IntervalSets 0.0.5\n", " - IterTools 0.1.0\n", " - IterableTables 0.3.0\n", " - Iterators 0.3.1\n", " - JLD 0.6.11\n", " - JSON 0.12.0\n", " - KernelDensity 0.3.2\n", " - LaTeXStrings 0.2.1\n", " - Lazy 0.11.7\n", " - LegacyStrings 0.2.2\n", " - LineSearches 0.1.5\n", " - Loess 0.2.0\n", " - MacroTools 0.3.7\n", " - MathProgBase 0.6.4\n", " - MbedTLS 0.4.5\n", " - Measures 0.1.0\n", " - NaNMath 0.2.5\n", " - NamedTuples 4.0.0\n", " - Optim 0.7.8\n", " - PDMats 0.7.0\n", " - PipeLayout 0.0.0- master (unregistered)\n", " - PlotThemes 0.1.4\n", " - PlotUtils 0.4.2\n", " - Polynomials 0.1.5\n", " - PooledArrays 0.1.1\n", " - PositiveFactorizations 0.0.4\n", " - PyCall 1.13.0\n", " - QuadGK 0.1.2\n", " - Ratios 0.1.0\n", " - Reactive 0.5.2\n", " - RecipesBase 0.2.0\n", " - Reexport 0.0.3\n", " - Requires 0.4.3\n", " - ReverseDiffSparse 0.7.3\n", " - Rmath 0.1.7\n", " - RoundingIntegers 0.0.2\n", " - SHA 0.3.3\n", " - SIUnits 0.1.0\n", " - ShowItLikeYouBuildIt 0.0.1\n", " - Showoff 0.1.1\n", " - SimpleTraits 0.5.0\n", " - SortingAlgorithms 0.1.1\n", " - SpecialFunctions 0.1.1\n", " - StatsBase 0.16.0\n", " - StatsFuns 0.5.0\n", " - TexExtensions 0.0.3\n", " - URIParser 0.1.8\n", " - WoodburyMatrices 0.2.2\n", " - ZMQ 0.4.3\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mNo packages to install, update or remove\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage DataFrames is already installed\n", "\u001b[39m" ] } ], "source": [ "# update the local copy of METADATA\n", "Pkg.update()\n", "\n", "# install a registered package\n", 
"Pkg.add(\"DataFrames\")\n", "\n", "# install any other package\n", "#Pkg.clone(\"https://github.com/leethargo/PipeLayout.jl\")\n", "\n", "# checkout a branch of a package (default: master)\n", "#Pkg.checkout(\"PipeLayout\")\n", "\n", "# list installed packages with versions\n", "Pkg.status()" That is, we want to select a few stocks of the index, together with weights, that show similar behavior to the overall index.\n", "\n", "We start with price data of all the Dow Jones stocks from 2016. From the average prices, we define the weights of the stocks to be used" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the price data\n", "\n", "The data is provided as a file of comma-separated values with three columns:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "date,symbol,price\n", "2016-01-04,AAPL,105.349997999999999\n", "2016-01-04,AXP,67.589995999999999\n", "2016-01-04,BA,140.500000000000000\n", "2016-01-04,CAT,67.989998000000000\n", "2016-01-04,CSCO,26.410000000000000\n", "2016-01-04,CVX,88.849997999999999\n", "2016-01-04,DD,63.070000000000000\n", "2016-01-04,DIS,102.980002999999996\n", "2016-01-04,GE,30.709999000000000\n" ] } ], "source": [ ";head dowjones2016.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Julia provides a function to read CSV files into arrays:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "search: \u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22m\u001b[1ms\u001b[22m\u001b[1mv\u001b[22m \u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22mhomp @th\u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22mall\n", "\n" ] }, { "data": { "text/markdown": [ "```\n", "readcsv(source, [T::Type]; 
options...)\n", "```\n", "\n", "Equivalent to [`readdlm`](@ref) with `delim` set to comma, and type optionally defined by `T`.\n" ], "text/plain": [ "```\n", "readcsv(source, [T::Type]; options...)\n", "```\n", "\n", "Equivalent to [`readdlm`](@ref) with `delim` set to comma, and type optionally defined by `T`.\n" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?readcsv" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×3 Array{Any,2}:\n", " \"date\" \"symbol\" \"price\"\n", " \"2016-01-04\" \"AAPL\" 105.35 \n", " \"2016-01-04\" \"AXP\" 67.59 \n", " \"2016-01-04\" \"BA\" 140.5 \n", " \"2016-01-04\" \"CAT\" 67.99 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = readcsv(\"dowjones2016.csv\")\n", "data[1:5,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But we will use the DataFrames package for easier processing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using DataFrames" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
│ Row │ date │ symbol │ price │
├─────┼──────────────┼────────┼────────┤
│ 1 │ "2016-01-04" │ "AAPL" │ 105.35 │
│ 2 │ "2016-01-04" │ "AXP" │ 67.59 │
│ 3 │ "2016-01-04" │ "BA" │ 140.5 │
│ 4 │ "2016-01-04" │ "CAT" │ 67.99 │
│ Row │ symbol │ avgprice │
├─────┼────────┼──────────┤
│ 1 │ "AAPL" │ 104.604 │
│ 2 │ "AXP" │ 63.7933 │
│ 3 │ "BA" │ 133.112 │
│ 4 │ "CAT" │ 78.698 │
│ Row │ symbol │ weight │
├─────┼────────┼───────────┤
│ 1 │ "AAPL" │ 0.0399597 │
│ 2 │ "AXP" │ 0.0243696 │
│ 3 │ "BA" │ 0.0508498 │
│ 4 │ "CAT" │ 0.0300634 │
│ Row │ date │ symbol │ price │
├─────┼──────────────┼────────┼────────┤
│ 1 │ "2016-01-04" │ "AAPL" │ 105.35 │
│ 2 │ "2016-01-04" │ "AXP" │ 67.59 │
│ 3 │ "2016-01-04" │ "BA" │ 140.5 │
│ 4 │ "2016-01-04" │ "CAT" │ 67.99 │
│ Row │ date │ AAPL │ AXP │ BA │
├─────┼──────────────┼────────┼───────┼────────┤
│ 1 │ "2016-01-04" │ 105.35 │ 67.59 │ 140.5 │
│ 2 │ "2016-01-05" │ 102.71 │ 66.55 │ 141.07 │
│ 3 │ "2016-01-06" │ 100.7 │ 64.42 │ 138.83 │
│ 4 │ "2016-01-07" │ 96.45 │ 63.84 │ 133.01 │
│ Row │ date │ symbol │ price │ weight │
├─────┼──────────────┼────────┼────────┼───────────┤
│ 1 │ "2016-01-04" │ "AAPL" │ 105.35 │ 0.0399597 │
│ 2 │ "2016-01-05" │ "AAPL" │ 102.71 │ 0.0399597 │
│ 3 │ "2016-01-06" │ "AAPL" │ 100.7 │ 0.0399597 │
│ 4 │ "2016-01-07" │ "AAPL" │ 96.45 │ 0.0399597 │
│ Row │ date │ symbol │ price │ weight │ contribution │
├─────┼──────────────┼────────┼────────┼───────────┼──────────────┤
│ 1 │ "2016-01-04" │ "AAPL" │ 105.35 │ 0.0399597 │ 4.20975 │
│ 2 │ "2016-01-05" │ "AAPL" │ 102.71 │ 0.0399597 │ 4.10426 │
│ 3 │ "2016-01-06" │ "AAPL" │ 100.7 │ 0.0399597 │ 4.02394 │
│ 4 │ "2016-01-07" │ "AAPL" │ 96.45 │ 0.0399597 │ 3.85411 │
│ Row │ date │ value │
├─────┼──────────────┼─────────┤
│ 1 │ "2016-01-04" │ 100.573 │
│ 2 │ "2016-01-05" │ 100.511 │
│ 3 │ "2016-01-06" │ 99.0142 │
│ 4 │ "2016-01-07" │ 96.606 │
"\\begin{align*}\n", "\\text{minimize} \\quad & \\lVert P w - I \\rVert_1 \\\\\n", "\\text{subject to} \\quad & \\lVert w \\rVert_0 \\le K\n", "\\end{align*}\n", "\n", "For the curious: this can be expressed as a [Mixed-Integer Linear Program](https://en.wikipedia.org/wiki/Integer_programming) in the following form:\n", "\n", "\\begin{align*}\n", "\\text{minimize} \\quad & \\sum_d \\left( \\Delta^+_d + \\Delta^-_d \\right) & \\\\\n", "\\text{subject to} \\quad & \\sum_s P_{d,s} w_s = I_d + \\Delta^+_d - \\Delta^-_d & (\\forall d) \\\\\n", " & w_s \\le p_s & (\\forall s) \\\\\n", " & \\sum_s p_s \\le K & \\\\\n", " & w_s \\ge 0, \\quad p_s \\in \\{0,1\\} & (\\forall s) \\\\\n", " & \\Delta^+_d \\ge 0, \\quad \\Delta^-_d \\ge 0 & (\\forall d)\n", "\\end{align*}\n", "\n", "Several Julia packages are devoted to this kind of optimization, such as [JuMP](https://github.com/JuliaOpt/JuMP.jl) and [Convex](https://github.com/JuliaOpt/Convex.jl) for modeling, solver backends like [Cbc](https://github.com/JuliaOpt/Cbc.jl) or [SCIP](https://github.com/SCIP-Interfaces/SCIP.jl), and [MathProgBase](https://github.com/JuliaOpt/MathProgBase.jl) as glue. See [JuliaOpt](http://www.juliaopt.org/) for an overview." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using JuMP # modeling\n", "using Cbc # solver backend" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "size(syms) = (30,)\n", "size(days) = (252,)\n" ] } ], "source": [ "# preparing data for indexing\n", "syms = [Symbol(s) for s in weights[:symbol]]\n", "days = 1:length(prices[:date])\n", "\n", "@show size(syms) size(days);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will formulate a model that should look quite close to the mathematical notation above.\n", "\n", "Note the heavy use of Julia macros to define variables and constraints. 