{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Packages\n", "\n", "Julia code is organized in packages, and package management is built into the Julia language.\n", "\n", "The assumption is that packages are developed with `git` and Julia will clone the whole repository when installing a package.\n", "\n", "Users can have their packages registered on a special GitHub repository: [METADATA.jl](https://github.com/JuliaLang/METADATA.jl). Dependencies are tracked in the `REQUIRE` file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating METADATA...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating PipeLayout master...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mUpdating SCIP master...\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mComputing changes...\n", "\u001b[39m" ] }, { "name": "stdout", "output_type": "stream", "text": [ "18 required packages:\n", " - BenchmarkTools 0.0.8\n", " - Cbc 0.3.2\n", " - Clp 0.3.1\n", " - Convex 0.5.0\n", " - DataFrames 0.10.0\n", " - GLPKMathProgInterface 0.3.4\n", " - IJulia 1.5.1\n", " - IndexedTables 0.1.7\n", " - JuMP 0.17.1\n", " - LightGraphs 0.9.0\n", " - Plots 0.12.0\n", " - ProfileView 0.2.1\n", " - PyPlot 2.3.2\n", " - Query 0.6.0\n", " - SCIP 0.3.0+ master\n", " - SCS 0.3.1\n", " - StatPlots 0.4.0\n", " - StaticArrays 0.5.1\n", "80 additional packages:\n", " - AxisAlgorithms 0.1.6\n", " - BaseTestNext 0.2.2\n", " - BinDeps 0.6.0\n", " - Blosc 0.2.1\n", " - Cairo 0.3.0\n", " - Calculus 0.2.2\n", " - ColorTypes 0.5.1\n", " - Colors 0.7.3\n", " - Combinatorics 0.4.1\n", " - Compat 0.26.0\n", " - Conda 0.5.3\n", " - DataArrays 0.5.3\n", " - DataStructures 0.5.3\n", " - DataValues 0.1.1\n", " - DiffBase 0.2.0\n", " - Distances 0.4.1\n", " - Distributions 0.13.0\n", " - 
DocStringExtensions 0.3.3\n", " - Documenter 0.11.1\n", " - DualNumbers 0.3.0\n", " - FileIO 0.4.1\n", " - FixedPointNumbers 0.3.8\n", " - FixedSizeArrays 0.2.5\n", " - ForwardDiff 0.4.2\n", " - GLPK 0.4.2\n", " - GZip 0.3.0\n", " - Graphics 0.2.0\n", " - Gtk 0.13.0\n", " - GtkReactive 0.2.1\n", " - HDF5 0.8.1\n", " - Interpolations 0.6.2\n", " - IntervalSets 0.0.5\n", " - IterTools 0.1.0\n", " - IterableTables 0.3.0\n", " - Iterators 0.3.1\n", " - JLD 0.6.11\n", " - JSON 0.12.0\n", " - KernelDensity 0.3.2\n", " - LaTeXStrings 0.2.1\n", " - Lazy 0.11.7\n", " - LegacyStrings 0.2.2\n", " - LineSearches 0.1.5\n", " - Loess 0.2.0\n", " - MacroTools 0.3.7\n", " - MathProgBase 0.6.4\n", " - MbedTLS 0.4.5\n", " - Measures 0.1.0\n", " - NaNMath 0.2.5\n", " - NamedTuples 4.0.0\n", " - Optim 0.7.8\n", " - PDMats 0.7.0\n", " - PipeLayout 0.0.0- master (unregistered)\n", " - PlotThemes 0.1.4\n", " - PlotUtils 0.4.2\n", " - Polynomials 0.1.5\n", " - PooledArrays 0.1.1\n", " - PositiveFactorizations 0.0.4\n", " - PyCall 1.13.0\n", " - QuadGK 0.1.2\n", " - Ratios 0.1.0\n", " - Reactive 0.5.2\n", " - RecipesBase 0.2.0\n", " - Reexport 0.0.3\n", " - Requires 0.4.3\n", " - ReverseDiffSparse 0.7.3\n", " - Rmath 0.1.7\n", " - RoundingIntegers 0.0.2\n", " - SHA 0.3.3\n", " - SIUnits 0.1.0\n", " - ShowItLikeYouBuildIt 0.0.1\n", " - Showoff 0.1.1\n", " - SimpleTraits 0.5.0\n", " - SortingAlgorithms 0.1.1\n", " - SpecialFunctions 0.1.1\n", " - StatsBase 0.16.0\n", " - StatsFuns 0.5.0\n", " - TexExtensions 0.0.3\n", " - URIParser 0.1.8\n", " - WoodburyMatrices 0.2.2\n", " - ZMQ 0.4.3\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mNo packages to install, update or remove\n", "\u001b[39m\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mPackage DataFrames is already installed\n", "\u001b[39m" ] } ], "source": [ "# update the local copy of METADATA\n", "Pkg.update()\n", "\n", "# install a registered package\n", 
"Pkg.add(\"DataFrames\")\n", "\n", "# install any other package\n", "#Pkg.clone(\"https://github.com/leethargo/PipeLayout.jl\")\n", "\n", "# checkout a branch of a package (default: master)\n", "#Pkg.checkout(\"PipeLayout\")\n", "\n", "# list installed packages with versions\n", "Pkg.status()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating an index fund\n", "\n", "The goal of this project is the definition of an index fund, following the Dow Jones. That is, we want to select few stocks of the index, together with weights, that show a similar behavior to the overall index.\n", "\n", "We start with price data of all the Dow Jones stocks from 2016. From the averages prices, we define weights of the stocks to be used" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the price data\n", "\n", "The data is provided in a file using comma-separated values and three columns:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "date,symbol,price\n", "2016-01-04,AAPL,105.349997999999999\n", "2016-01-04,AXP,67.589995999999999\n", "2016-01-04,BA,140.500000000000000\n", "2016-01-04,CAT,67.989998000000000\n", "2016-01-04,CSCO,26.410000000000000\n", "2016-01-04,CVX,88.849997999999999\n", "2016-01-04,DD,63.070000000000000\n", "2016-01-04,DIS,102.980002999999996\n", "2016-01-04,GE,30.709999000000000\n" ] } ], "source": [ ";head dowjones2016.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Julia provides a function to read csv files into arrays:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "search: \u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22m\u001b[1ms\u001b[22m\u001b[1mv\u001b[22m \u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22mhomp 
@th\u001b[1mr\u001b[22m\u001b[1me\u001b[22m\u001b[1ma\u001b[22m\u001b[1md\u001b[22m\u001b[1mc\u001b[22mall\n", "\n" ] }, { "data": { "text/markdown": [ "```\n", "readcsv(source, [T::Type]; options...)\n", "```\n", "\n", "Equivalent to [`readdlm`](@ref) with `delim` set to comma, and type optionally defined by `T`.\n" ], "text/plain": [ "```\n", "readcsv(source, [T::Type]; options...)\n", "```\n", "\n", "Equivalent to [`readdlm`](@ref) with `delim` set to comma, and type optionally defined by `T`.\n" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "?readcsv" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×3 Array{Any,2}:\n", " \"date\" \"symbol\" \"price\"\n", " \"2016-01-04\" \"AAPL\" 105.35 \n", " \"2016-01-04\" \"AXP\" 67.59 \n", " \"2016-01-04\" \"BA\" 140.5 \n", " \"2016-01-04\" \"CAT\" 67.99 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = readcsv(\"dowjones2016.csv\")\n", "data[1:5,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But we will use the DataFrames package for easier processing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using DataFrames" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
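<table><thead><tr><th></th><th>date</th><th>symbol</th><th>price</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>AAPL</td><td>105.349998</td></tr><tr><th>2</th><td>2016-01-04</td><td>AXP</td><td>67.589996</td></tr><tr><th>3</th><td>2016-01-04</td><td>BA</td><td>140.5</td></tr><tr><th>4</th><td>2016-01-04</td><td>CAT</td><td>67.989998</td></tr></tbody></table>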
" ], "text/plain": [ "4×3 DataFrames.DataFrame\n", "│ Row │ date │ symbol │ price │\n", "├─────┼──────────────┼────────┼────────┤\n", "│ 1 │ \"2016-01-04\" │ \"AAPL\" │ 105.35 │\n", "│ 2 │ \"2016-01-04\" │ \"AXP\" │ 67.59 │\n", "│ 3 │ \"2016-01-04\" │ \"BA\" │ 140.5 │\n", "│ 4 │ \"2016-01-04\" │ \"CAT\" │ 67.99 │" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = readtable(\"dowjones2016.csv\")\n", "df[1:4, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now access the columns by name:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7560-element DataArrays.DataArray{Float64,1}:\n", " 105.35\n", " 67.59\n", " 140.5 \n", " 67.99\n", " 26.41\n", " 88.85\n", " 63.07\n", " 102.98\n", " 30.71\n", " 177.14\n", " 131.07\n", " 135.95\n", " 33.99\n", " ⋮ \n", " 58.87\n", " 62.14\n", " 50.83\n", " 32.48\n", " 84.08\n", " 122.42\n", " 160.04\n", " 109.62\n", " 78.02\n", " 53.38\n", " 69.12\n", " 90.26" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:price]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compute mean prices for the stocks, using a groupby-and-aggregate approach." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
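<table><thead><tr><th></th><th>symbol</th><th>avgprice</th></tr></thead><tbody><tr><th>1</th><td>AAPL</td><td>104.6040078690476</td></tr><tr><th>2</th><td>AXP</td><td>63.79333337698412</td></tr><tr><th>3</th><td>BA</td><td>133.11150809920633</td></tr><tr><th>4</th><td>CAT</td><td>78.69801573015873</td></tr></tbody></table>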
" ], "text/plain": [ "4×2 DataFrames.DataFrame\n", "│ Row │ symbol │ avgprice │\n", "├─────┼────────┼──────────┤\n", "│ 1 │ \"AAPL\" │ 104.604 │\n", "│ 2 │ \"AXP\" │ 63.7933 │\n", "│ 3 │ \"BA\" │ 133.112 │\n", "│ 4 │ \"CAT\" │ 78.698 │" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "avg = by(df, :symbol, d -> DataFrame(avgprice = mean(d[:price])))\n", "avg[1:4, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use these averages to compute weights." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
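<table><thead><tr><th></th><th>symbol</th><th>weight</th></tr></thead><tbody><tr><th>1</th><td>AAPL</td><td>0.03995967436333611</td></tr><tr><th>2</th><td>AXP</td><td>0.02436962866171713</td></tr><tr><th>3</th><td>BA</td><td>0.05084979654236352</td></tr><tr><th>4</th><td>CAT</td><td>0.030063351736529197</td></tr></tbody></table>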
" ], "text/plain": [ "4×2 DataFrames.DataFrame\n", "│ Row │ symbol │ weight │\n", "├─────┼────────┼───────────┤\n", "│ 1 │ \"AAPL\" │ 0.0399597 │\n", "│ 2 │ \"AXP\" │ 0.0243696 │\n", "│ 3 │ \"BA\" │ 0.0508498 │\n", "│ 4 │ \"CAT\" │ 0.0300634 │" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = DataFrame(symbol = avg[:symbol],\n", " weight = avg[:avgprice] / sum(avg[:avgprice]))\n", "weights[1:4, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also _pivot_ the table into a two-way format." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
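<table><thead><tr><th></th><th>date</th><th>symbol</th><th>price</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>AAPL</td><td>105.349998</td></tr><tr><th>2</th><td>2016-01-04</td><td>AXP</td><td>67.589996</td></tr><tr><th>3</th><td>2016-01-04</td><td>BA</td><td>140.5</td></tr><tr><th>4</th><td>2016-01-04</td><td>CAT</td><td>67.989998</td></tr></tbody></table>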
" ], "text/plain": [ "4×3 DataFrames.DataFrame\n", "│ Row │ date │ symbol │ price │\n", "├─────┼──────────────┼────────┼────────┤\n", "│ 1 │ \"2016-01-04\" │ \"AAPL\" │ 105.35 │\n", "│ 2 │ \"2016-01-04\" │ \"AXP\" │ 67.59 │\n", "│ 3 │ \"2016-01-04\" │ \"BA\" │ 140.5 │\n", "│ 4 │ \"2016-01-04\" │ \"CAT\" │ 67.99 │" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# original dataframe\n", "df[1:4, :]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
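<table><thead><tr><th></th><th>date</th><th>AAPL</th><th>AXP</th><th>BA</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>105.349998</td><td>67.589996</td><td>140.5</td></tr><tr><th>2</th><td>2016-01-05</td><td>102.709999</td><td>66.550003</td><td>141.070007</td></tr><tr><th>3</th><td>2016-01-06</td><td>100.699997</td><td>64.419998</td><td>138.830002</td></tr><tr><th>4</th><td>2016-01-07</td><td>96.449997</td><td>63.84</td><td>133.009995</td></tr></tbody></table>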
" ], "text/plain": [ "4×4 DataFrames.DataFrame\n", "│ Row │ date │ AAPL │ AXP │ BA │\n", "├─────┼──────────────┼────────┼───────┼────────┤\n", "│ 1 │ \"2016-01-04\" │ 105.35 │ 67.59 │ 140.5 │\n", "│ 2 │ \"2016-01-05\" │ 102.71 │ 66.55 │ 141.07 │\n", "│ 3 │ \"2016-01-06\" │ 100.7 │ 64.42 │ 138.83 │\n", "│ 4 │ \"2016-01-07\" │ 96.45 │ 63.84 │ 133.01 │" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# two-way table with symbols as columns\n", "# rows columns data\n", "prices = unstack(df, :date, :symbol, :price)\n", "prices[1:4, 1:4]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
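<table><thead><tr><th></th><th>date</th><th>symbol</th><th>price</th><th>weight</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>AAPL</td><td>105.349998</td><td>0.03995967436333611</td></tr><tr><th>2</th><td>2016-01-05</td><td>AAPL</td><td>102.709999</td><td>0.03995967436333611</td></tr><tr><th>3</th><td>2016-01-06</td><td>AAPL</td><td>100.699997</td><td>0.03995967436333611</td></tr><tr><th>4</th><td>2016-01-07</td><td>AAPL</td><td>96.449997</td><td>0.03995967436333611</td></tr></tbody></table>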
" ], "text/plain": [ "4×4 DataFrames.DataFrame\n", "│ Row │ date │ symbol │ price │ weight │\n", "├─────┼──────────────┼────────┼────────┼───────────┤\n", "│ 1 │ \"2016-01-04\" │ \"AAPL\" │ 105.35 │ 0.0399597 │\n", "│ 2 │ \"2016-01-05\" │ \"AAPL\" │ 102.71 │ 0.0399597 │\n", "│ 3 │ \"2016-01-06\" │ \"AAPL\" │ 100.7 │ 0.0399597 │\n", "│ 4 │ \"2016-01-07\" │ \"AAPL\" │ 96.45 │ 0.0399597 │" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joined = join(df, weights, on=:symbol)\n", "joined[1:4, :]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
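<table><thead><tr><th></th><th>date</th><th>symbol</th><th>price</th><th>weight</th><th>contribution</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>AAPL</td><td>105.349998</td><td>0.03995967436333611</td><td>4.209751614258111</td></tr><tr><th>2</th><td>2016-01-05</td><td>AAPL</td><td>102.709999</td><td>0.03995967436333611</td><td>4.104258113898577</td></tr><tr><th>3</th><td>2016-01-06</td><td>AAPL</td><td>100.699997</td><td>0.03995967436333611</td><td>4.023939088508923</td></tr><tr><th>4</th><td>2016-01-07</td><td>AAPL</td><td>96.449997</td><td>0.03995967436333611</td><td>3.8541104724647446</td></tr></tbody></table>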
" ], "text/plain": [ "4×5 DataFrames.DataFrame\n", "│ Row │ date │ symbol │ price │ weight │ contribution │\n", "├─────┼──────────────┼────────┼────────┼───────────┼──────────────┤\n", "│ 1 │ \"2016-01-04\" │ \"AAPL\" │ 105.35 │ 0.0399597 │ 4.20975 │\n", "│ 2 │ \"2016-01-05\" │ \"AAPL\" │ 102.71 │ 0.0399597 │ 4.10426 │\n", "│ 3 │ \"2016-01-06\" │ \"AAPL\" │ 100.7 │ 0.0399597 │ 4.02394 │\n", "│ 4 │ \"2016-01-07\" │ \"AAPL\" │ 96.45 │ 0.0399597 │ 3.85411 │" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joined[:contribution] = joined[:weight] .* joined[:price]\n", "joined[1:4, :]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
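<table><thead><tr><th></th><th>date</th><th>value</th></tr></thead><tbody><tr><th>1</th><td>2016-01-04</td><td>100.57292879489896</td></tr><tr><th>2</th><td>2016-01-05</td><td>100.51142239490156</td></tr><tr><th>3</th><td>2016-01-06</td><td>99.01420719993507</td></tr><tr><th>4</th><td>2016-01-07</td><td>96.60603263325876</td></tr></tbody></table>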
" ], "text/plain": [ "4×2 DataFrames.DataFrame\n", "│ Row │ date │ value │\n", "├─────┼──────────────┼─────────┤\n", "│ 1 │ \"2016-01-04\" │ 100.573 │\n", "│ 2 │ \"2016-01-05\" │ 100.511 │\n", "│ 3 │ \"2016-01-06\" │ 99.0142 │\n", "│ 4 │ \"2016-01-07\" │ 96.606 │" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index = by(joined, :date, d -> DataFrame(value = sum(d[:contribution])))\n", "index[1:4, :]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Visualization the time series" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Plots.PyPlotBackend()" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Plots # general plotting\n", "pyplot() # backend, based on Python's matplotlib" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10×3 Array{Float64,2}:\n", " 0.0967222 1.13685 -0.221413\n", " 2.63506 -0.836293 0.568922\n", " 1.97871 0.380401 0.673628\n", " 0.862146 1.3827 -1.16359 \n", " 0.318709 1.72083 0.381898\n", " 2.27183 1.32943 -0.366597\n", " 1.48305 0.657708 1.07946 \n", " 1.82443 0.120829 -0.108935\n", " 1.97516 -0.857797 -0.854739\n", " 1.2446 -0.0925 -1.0925 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = cumsum(randn(10, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plots will interprete the *columns* of the data as *series* to be plotted independently:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot(x)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "plot(x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also add to existing plots, using the call `plot!`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot(x, color=[:red :green])\n", "plot!(x + 3, color=:black, alpha=0.5)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using StatPlots # for DataFrames integration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can set common attributes for several plots using the `with` wrapper:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with(grid=false, legend=false, xticks=false, ylim=(0,300)) do\n", " plot(df, :date, :price, group=:symbol, color=:grey, alpha=0.4)\n", " plot!(index, :date, :value, linewidth=2)\n", "end" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar(weights, :symbol, :weight, xrotation=50, color=:weight, grid=false)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Picking stocks\n", "\n", "We know come to the decision problem, where we want to pick a small subset of the stocks together with some weights, such that this portfolio has a similar behavior to our overall Dow Jones index.\n", "\n", "The model is based on a linear regression over the time series, but we minimize the loss using the L1-norm (absolute value), and allow only a fixed number of weights to take nonzero variable.\n", "\n", "A high-level mathematical model might look like this ($w$: weights, $P$: prices, $I$: 
value of index):\n", "\n", "\\begin{align*}\n", "\\text{minimize} \\quad & \\lVert P w - I \\rVert_1 \\\\\n", "\\text{subject to} \\quad & \\lVert w \\rVert_0 \\le K\n", "\\end{align*}\n", "\n", "For the curious: this can be expressed as a [Mixed-Integer Linear Program](https://en.wikipedia.org/wiki/Integer_programming) in the following form:\n", "\n", "\\begin{align*}\n", "\\text{minimize} \\quad & \\sum_d (\\Delta^+_d + \\Delta^-_d) & \\\\\n", "\\text{subject to} \\quad & \\sum_s P_{d,s} w_s = I_d + \\Delta^+_d - \\Delta^-_d & (\\forall d) \\\\\n", " & w_s \\le p_s & (\\forall s) \\\\\n", " & \\sum_s p_s \\le K & \\\\\n", " & w_s \\ge 0, \\quad p_s \\in \\{0,1\\} & (\\forall s) \\\\\n", " & \\Delta^+_d \\ge 0, \\quad \\Delta^-_d \\ge 0 & (\\forall d)\n", "\\end{align*}\n", "\n", "Several Julia packages are devoted to this kind of optimization, such as [JuMP](https://github.com/JuliaOpt/JuMP.jl) and [Convex](https://github.com/JuliaOpt/Convex.jl) for modeling, solver backends like [Cbc](https://github.com/JuliaOpt/Cbc.jl) or [SCIP](https://github.com/SCIP-Interfaces/SCIP.jl), and [MathProgBase](https://github.com/JuliaOpt/MathProgBase.jl) as glue. See [JuliaOpt](http://www.juliaopt.org/) for an overview." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using JuMP # modeling\n", "using Cbc # solver backend" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "size(syms) = (30,)\n", "size(days) = (252,)\n" ] } ], "source": [ "# preparing data for indexing\n", "syms = [Symbol(s) for s in weights[:symbol]]\n", "days = 1:length(prices[:date])\n", "\n", "@show size(syms) size(days);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will formulate a model that should look quite close to the mathematical notation above.\n", "\n", "Note the heavy use of Julia macros to define variables and constraints. 
The expressions are used as parsed by the Julia language and directly translated to the solver's internal form." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "find_fund (generic function with 1 method)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function find_fund(maxstocks; timelimit=10.0, gaplimit=0.01, lastday=200)\n", " days = 1:lastday\n", "\n", " fund = Model(solver=CbcSolver(seconds=timelimit, ratioGap=gaplimit))\n", "\n", " # decisions\n", " @variable(fund, pick[syms], Bin) # is stock included?\n", " @variable(fund, weight[syms] ≥ 0) # what part of the portfolio\n", "\n", " # auxiliary variables\n", " @variable(fund, Δ⁺[days] ≥ 0) # positive slack\n", " @variable(fund, Δ⁻[days] ≥ 0) # negative slack\n", "\n", " # fit to Dow Jones index\n", " for d in days\n", " @constraint(fund, sum(prices[d,s] * weight[s] for s in syms) == index[d, :value] + Δ⁺[d] - Δ⁻[d])\n", " end\n", "\n", " # can only use stock if picked\n", " for s in syms\n", " @constraint(fund, weight[s] ≤ pick[s])\n", " end\n", " \n", " # few stocks allowed\n", " @constraint(fund, sum(pick[s] for s in syms) ≤ maxstocks)\n", " \n", " # minimize the absolute violation (L1 norm)\n", " @objective(fund, :Min, sum(Δ⁺[d] + Δ⁻[d] for d in days))\n", " \n", " \n", " status = solve(fund)\n", " @show status\n", " \n", " getvalue(weight)\n", "end" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[33mWARNING: \u001b[39m\u001b[22m\u001b[33mNot solved to optimality, status: UserLimit\u001b[39m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "status = :UserLimit\n" ] }, { "data": { "text/plain": [ "weight: 1 dimensions:\n", "[AAPL] = 0.0\n", "[ AXP] = 0.47229506948301675\n", "[ BA] = 0.0\n", "[ CAT] = 0.0\n", "[CSCO] = 0.0\n", "[ CVX] = 0.0\n", "[ DD] = 0.0\n", "[ DIS] = 0.0\n", "[ GE] = 
0.0\n", "[ GS] = 0.0\n", "[ HD] = 0.0\n", "[ IBM] = 0.0\n", "[INTC] = 0.0\n", "[ JNJ] = 0.0\n", "[ JPM] = 0.0\n", "[ KO] = 0.0\n", "[ MCD] = 0.0\n", "[ MMM] = 0.31629226616166206\n", "[ MRK] = 0.0\n", "[MSFT] = 0.4061951417784263\n", "[ NKE] = 0.0\n", "[ PFE] = 0.0\n", "[ PG] = 0.0\n", "[ TRV] = 0.0\n", "[ UNH] = 0.0\n", "[ UTX] = 0.0\n", "[ V] = 0.0\n", "[ VZ] = 0.0\n", "[ WMT] = 0.0\n", "[ XOM] = 0.0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainingdays = 100\n", "sol = find_fund(3, timelimit=6, lastday=trainingdays)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "solfund = sum(sol[s] * prices[:, s] for s in syms);" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with(xticks=[0, trainingdays, length(days)], yticks=[]) do\n", " plot(index, :date, :value, label=\"Dow Jones\")\n", " plot!(solfund, label=\"Index Fund\")\n", "end" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "errors = abs.(index[:value] - solfund)\n", "\n", "with(bins=20) do\n", " histogram(errors[trainingdays+1:end], label=\"later\", color=:red)\n", " histogram!(errors[1:trainingdays], alpha=0.8, label=\"training\", color=:green)\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.0", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }