{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple Regression\n", "\n", "This example shows how multiple regression can be performed using statsample and daru.\n", "\n", "The `lr()` shorthand will call the function Statsample::Regression.multiple. It should be\n", "noted that internally statsample implements multiple regression using either Ruby methods\n", "or GSL methods. This lets statsample run even in the absence of gsl. But ruby implementations\n", "of functions are much much slower than those from GSL, and hence it is recomended that you\n", "install the [rb-gsl](https://github.com/blackwinter/rb-gsl) or [gsl-nmatrix](https://github.com/v0dro/gsl-nmatrix) gems before proceeding (these will work only on MRI).\n", "\n", "Rb-gsl can be installed from rubygems directly with `gem install rb-gsl`. To see how to install\n", "gsl-nmatrix, see [this blog post](http://v0dro.github.io/blog/2015/05/12/making-statsample-work-with-rb-gsl-and-gsl-nmatrix/)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysis 2015-06-04 00:17:46 +0530\n", "= Statsample::Regression::Multiple\n", " == Multiple reggresion of a,b,cc,d on y\n", " Engine: Statsample::Regression::Multiple::RubyEngine\n", " Cases(listwise)=2000(2000)\n", " R=0.987\n", " R^2=0.975\n", " R^2 Adj=0.975\n", " Std.Error R=1.028\n", " Equation=0.015 + 5.021a + 3.005b + 2.049cc + 1.024d\n", " === ANOVA\n", " ANOVA Table\n", "+------------+-----------+------+-----------+-----------+-------+\n", "| source | ss | df | ms | f | p |\n", "+------------+-----------+------+-----------+-----------+-------+\n", "| Regression | 81106.039 | 4 | 20276.510 | 19197.346 | 0.000 |\n", "| Error | 2107.147 | 1995 | 1.056 | | |\n", "| Total | 83213.187 | 1999 | 20277.566 | | |\n", "+------------+-----------+------+-----------+-----------+-------+\n", "\n", " Beta coefficients\n", "+----------+-------+-------+-------+---------+\n", "| coeff | b | beta | se | t |\n", "+----------+-------+-------+-------+---------+\n", "| Constant | 0.015 | - | 0.023 | 0.668 |\n", "| a | 5.021 | 0.779 | 0.023 | 218.235 |\n", "| b | 3.005 | 0.457 | 0.023 | 128.181 |\n", "| cc | 2.049 | 0.318 | 0.023 | 89.209 |\n", "| d | 1.024 | 0.156 | 0.023 | 43.778 |\n", "+----------+-------+-------+-------+---------+\n", "\n", "\n" ] } ], "source": [ "require 'statsample'\n", "\n", "Statsample::Analysis.store(Statsample::Regression::Multiple) do\n", " Daru.lazy_update = true\n", " \n", " samples=2000\n", " ds = Daru::DataFrame.new({\n", " :a => rnorm(samples),\n", " :b => rnorm(samples),\n", " :cc => rnorm(samples),\n", " :d => rnorm(samples)}, clone: false)\n", " attach(ds)\n", " ds[:y] = a*5+b*3+cc*2+d+rnorm(samples)\n", " \n", " # REMEMBER: It is _mandatory_ to call #update after assingnment cycles if your \n", " # operations to be performed as expected.\n", " ds.update\n", " summary lr(ds,:y)\n", " \n", " Daru.lazy_update = false\n", "end\n", "Statsample::Analysis.run_batch" ] } ], "metadata": { "kernelspec": { "display_name": "Ruby 2.2.1", "language": "ruby", "name": "ruby" }, "language_info": { "file_extension": "rb", "mimetype": "application/x-ruby", "name": "ruby", "version": "2.2.1" } }, "nbformat": 4, "nbformat_minor": 0 }