{ "metadata": { "name": "preliminaries" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Learn More and Get Help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation: http://statsmodels.sf.net\n", "\n", "Mailing List: http://groups.google.com/group/pystatsmodels\n", "\n", "Use the source: https://github.com/statsmodels/statsmodels" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Tutorial Import Assumptions" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt\n", "import pandas\n", "from scipy import stats\n", "\n", "np.set_printoptions(precision=4, suppress=True)\n", "# pandas display options are set one at a time with pandas.set_option\n", "pandas.set_option('display.notebook_repr_html', False)\n", "pandas.set_option('display.precision', 4)\n", "pandas.set_option('display.max_columns', 12)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Statsmodels Import Convention" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import statsmodels.api as sm" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import convention for models that can be specified with a formula." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from statsmodels.formula.api import ols, rlm, glm  # etc."
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Package Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Regression models in statsmodels.regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Discrete choice models in statsmodels.discrete" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Robust linear models in statsmodels.robust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generalized linear models in statsmodels.genmod" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time series analysis in statsmodels.tsa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nonparametric models in statsmodels.nonparametric" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotting functions in statsmodels.graphics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Input/output in statsmodels.iolib (foreign data, ASCII, HTML, and $\\LaTeX$ tables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Statistical tests and ANOVA in statsmodels.stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Datasets in statsmodels.datasets (see also the GPL package Rdatasets: https://github.com/vincentarelbundock/Rdatasets)" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Base Classes" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from statsmodels.base import model" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "help(model.Model)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on class Model in module statsmodels.base.model:\n", "\n", "class Model(__builtin__.object)\n", " | A (predictive) statistical model.
The class Model itself is not to be used.\n", " | \n", " | Model lays out the methods expected of any subclass.\n", " | \n", " | Parameters\n", " | ----------\n", " | endog : array-like\n", " | Endogenous response variable.\n", " | exog : array-like\n", " | Exogenous design.\n", " | \n", " | Notes\n", " | -----\n", " | `endog` and `exog` are references to any data provided. So if the data is\n", " | already stored in numpy arrays and it is changed then `endog` and `exog`\n", " | will change as well.\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self, endog, exog=None)\n", " | \n", " | fit(self)\n", " | Fit a model to data.\n", " | \n", " | predict(self, params, exog=None, *args, **kwargs)\n", " | After a model has been fit predict returns the fitted values.\n", " | \n", " | This is a placeholder intended to be overwritten by individual models.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Class methods defined here:\n", " | \n", " | from_formula(cls, formula, df, subset=None, *args, **kwargs) from __builtin__.type\n", " | Create a Model from a formula and dataframe.\n", " | \n", " | Parameters\n", " | ----------\n", " | formula : str or generic Formula object\n", " | The formula specifying the model\n", " | df : array-like\n", " | The data for the model. See Notes.\n", " | subset : array-like\n", " | An array-like object of booleans, integers, or index values that\n", " | indicate the subset of df to use in the model. Assumes df is a\n", " | `pandas.DataFrame`\n", " | args : extra arguments\n", " | These are passed to the model\n", " | kwargs : extra keyword arguments\n", " | These are passed to the model.\n", " | \n", " | Returns\n", " | -------\n", " | model : Model instance\n", " | \n", " | Notes\n", " | ------\n", " | df must define __getitem__ with the keys in the formula terms\n", " | args and kwargs are passed on to the model instantiation. 
E.g.,\n", " | a numpy structured or rec array, a dictionary, or a pandas DataFrame.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | endog_names\n", " | \n", " | exog_names\n", "\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "help(model.LikelihoodModel)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on class LikelihoodModel in module statsmodels.base.model:\n", "\n", "class LikelihoodModel(Model)\n", " | Likelihood model is a subclass of Model.\n", " | \n", " | Method resolution order:\n", " | LikelihoodModel\n", " | Model\n", " | __builtin__.object\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self, endog, exog=None)\n", " | \n", " | fit(self, start_params=None, method='newton', maxiter=100, full_output=True, disp=True, fargs=(), callback=None, retall=False, **kwargs)\n", " | Fit method for likelihood based models\n", " | \n", " | Parameters\n", " | ----------\n", " | start_params : array-like, optional\n", " | Initial guess of the solution for the loglikelihood maximization.\n", " | The default is an array of zeros.\n", " | method : str {'newton','nm','bfgs','powell','cg', or 'ncg'}\n", " | Method can be 'newton' for Newton-Raphson, 'nm' for Nelder-Mead,\n", " | 'bfgs' for Broyden-Fletcher-Goldfarb-Shanno, 'powell' for modified\n", " | Powell's method, 'cg' for conjugate gradient, or 'ncg' for Newton-\n", " | conjugate gradient. `method` determines which solver from\n", " | scipy.optimize is used. The explicit arguments in `fit` are passed\n", " | to the solver. Each solver has several optional arguments that are\n", " | not the same across solvers. 
See the notes section below (or\n", " | scipy.optimize) for the available arguments.\n", " | maxiter : int\n", " | The maximum number of iterations to perform.\n", " | full_output : bool\n", " | Set to True to have all available output in the Results object's\n", " | mle_retvals attribute. The output is dependent on the solver.\n", " | See LikelihoodModelResults notes section for more information.\n", " | disp : bool\n", " | Set to True to print convergence messages.\n", " | fargs : tuple\n", " | Extra arguments passed to the likelihood function, i.e.,\n", " | loglike(x,*args)\n", " | callback : callable callback(xk)\n", " | Called after each iteration, as callback(xk), where xk is the\n", " | current parameter vector.\n", " | retall : bool\n", " | Set to True to return list of solutions at each iteration.\n", " | Available in Results object's mle_retvals attribute.\n", " | \n", " | Notes\n", " | -----\n", " | Optional arguments for the solvers (available in Results.mle_settings):\n", " | \n", " | 'newton'\n", " | tol : float\n", " | Relative error in params acceptable for convergence.\n", " | 'nm' -- Nelder Mead\n", " | xtol : float\n", " | Relative error in params acceptable for convergence\n", " | ftol : float\n", " | Relative error in loglike(params) acceptable for\n", " | convergence\n", " | maxfun : int\n", " | Maximum number of function evaluations to make.\n", " | 'bfgs'\n", " | gtol : float\n", " | Stop when norm of gradient is less than gtol.\n", " | norm : float\n", " | Order of norm (np.Inf is max, -np.Inf is min)\n", " | epsilon\n", " | If fprime is approximated, use this value for the step\n", " | size. Only relevant if LikelihoodModel.score is None.\n", " | 'cg'\n", " | gtol : float\n", " | Stop when norm of gradient is less than gtol.\n", " | norm : float\n", " | Order of norm (np.Inf is max, -np.Inf is min)\n", " | epsilon : float\n", " | If fprime is approximated, use this value for the step\n", " | size. Can be scalar or vector. 
Only relevant if\n", " | Likelihoodmodel.score is None.\n", " | 'ncg'\n", " | fhess_p : callable f'(x,*args)\n", " | Function which computes the Hessian of f times an arbitrary\n", " | vector, p. Should only be supplied if\n", " | LikelihoodModel.hessian is None.\n", " | avextol : float\n", " | Stop when the average relative error in the minimizer\n", " | falls below this amount.\n", " | epsilon : float or ndarray\n", " | If fhess is approximated, use this value for the step size.\n", " | Only relevant if Likelihoodmodel.hessian is None.\n", " | 'powell'\n", " | xtol : float\n", " | Line-search error tolerance\n", " | ftol : float\n", " | Relative error in loglike(params) for acceptable for\n", " | convergence.\n", " | maxfun : int\n", " | Maximum number of function evaluations to make.\n", " | start_direc : ndarray\n", " | Initial direction set.\n", " | \n", " | hessian(self, params)\n", " | The Hessian matrix of the model\n", " | \n", " | information(self, params)\n", " | Fisher information matrix of model\n", " | \n", " | Returns -Hessian of loglike evaluated at params.\n", " | \n", " | initialize(self)\n", " | Initialize (possibly re-initialize) a Model instance. 
For\n", " | instance, the design matrix of a linear model may change\n", " | and some things must be recomputed.\n", " | \n", " | loglike(self, params)\n", " | Log-likelihood of model.\n", " | \n", " | score(self, params)\n", " | Score vector of model.\n", " | \n", " | The gradient of logL with respect to each parameter.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from Model:\n", " | \n", " | predict(self, params, exog=None, *args, **kwargs)\n", " | After a model has been fit predict returns the fitted values.\n", " | \n", " | This is a placeholder intended to be overwritten by individual models.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Class methods inherited from Model:\n", " | \n", " | from_formula(cls, formula, df, subset=None, *args, **kwargs) from __builtin__.type\n", " | Create a Model from a formula and dataframe.\n", " | \n", " | Parameters\n", " | ----------\n", " | formula : str or generic Formula object\n", " | The formula specifying the model\n", " | df : array-like\n", " | The data for the model. See Notes.\n", " | subset : array-like\n", " | An array-like object of booleans, integers, or index values that\n", " | indicate the subset of df to use in the model. Assumes df is a\n", " | `pandas.DataFrame`\n", " | args : extra arguments\n", " | These are passed to the model\n", " | kwargs : extra keyword arguments\n", " | These are passed to the model.\n", " | \n", " | Returns\n", " | -------\n", " | model : Model instance\n", " | \n", " | Notes\n", " | ------\n", " | df must define __getitem__ with the keys in the formula terms\n", " | args and kwargs are passed on to the model instantiation. 
E.g.,\n", " | a numpy structured or rec array, a dictionary, or a pandas DataFrame.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from Model:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | endog_names\n", " | \n", " | exog_names\n", "\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "help(model.LikelihoodModelResults)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on class LikelihoodModelResults in module statsmodels.base.model:\n", "\n", "class LikelihoodModelResults(Results)\n", " | Class to contain results from likelihood models\n", " | \n", " | Parameters\n", " | -----------\n", " | model : LikelihoodModel instance or subclass instance\n", " | LikelihoodModelResults holds a reference to the model that is fit.\n", " | params : 1d array_like\n", " | parameter estimates from estimated model\n", " | normalized_cov_params : 2d array\n", " | Normalized (before scaling) covariance of params. (dot(X.T,X))**-1\n", " | scale : float\n", " | For (some subset of models) scale will typically be the\n", " | mean square error from the estimated model (sigma^2)\n", " | \n", " | Returns\n", " | -------\n", " | **Attributes**\n", " | mle_retvals : dict\n", " | Contains the values returned from the chosen optimization method if\n", " | full_output is True during the fit. Available only if the model\n", " | is fit by maximum likelihood. See notes below for the output from\n", " | the different methods.\n", " | mle_settings : dict\n", " | Contains the arguments passed to the chosen optimization method.\n", " | Available if the model is fit by maximum likelihood. 
See\n", " | LikelihoodModel.fit for more information.\n", " | model : model instance\n", " | LikelihoodResults contains a reference to the model that is fit.\n", " | params : ndarray\n", " | The parameters estimated for the model.\n", " | scale : float\n", " | The scaling factor of the model given during instantiation.\n", " | tvalues : array\n", " | The t-values of the standard errors.\n", " | \n", " | \n", " | Notes\n", " | --------\n", " | The covariance of params is given by scale times normalized_cov_params.\n", " | \n", " | Return values by solver if full_ouput is True during fit:\n", " | \n", " | 'newton'\n", " | fopt : float\n", " | The value of the (negative) loglikelihood at its\n", " | minimum.\n", " | iterations : int\n", " | Number of iterations performed.\n", " | score : ndarray\n", " | The score vector at the optimum.\n", " | Hessian : ndarray\n", " | The Hessian at the optimum.\n", " | warnflag : int\n", " | 1 if maxiter is exceeded. 0 if successful convergence.\n", " | converged : bool\n", " | True: converged. False: did not converge.\n", " | allvecs : list\n", " | List of solutions at each iteration.\n", " | 'nm'\n", " | fopt : float\n", " | The value of the (negative) loglikelihood at its\n", " | minimum.\n", " | iterations : int\n", " | Number of iterations performed.\n", " | warnflag : int\n", " | 1: Maximum number of function evaluations made.\n", " | 2: Maximum number of iterations reached.\n", " | converged : bool\n", " | True: converged. False: did not converge.\n", " | allvecs : list\n", " | List of solutions at each iteration.\n", " | 'bfgs'\n", " | fopt : float\n", " | Value of the (negative) loglikelihood at its minimum.\n", " | gopt : float\n", " | Value of gradient at minimum, which should be near 0.\n", " | Hinv : ndarray\n", " | value of the inverse Hessian matrix at minimum. 
Note\n", " | that this is just an approximation and will often be\n", " | different from the value of the analytic Hessian.\n", " | fcalls : int\n", " | Number of calls to loglike.\n", " | gcalls : int\n", " | Number of calls to gradient/score.\n", " | warnflag : int\n", " | 1: Maximum number of iterations exceeded. 2: Gradient\n", " | and/or function calls are not changing.\n", " | converged : bool\n", " | True: converged. False: did not converge.\n", " | allvecs : list\n", " | Results at each iteration.\n", " | 'powell'\n", " | fopt : float\n", " | Value of the (negative) loglikelihood at its minimum.\n", " | direc : ndarray\n", " | Current direction set.\n", " | iterations : int\n", " | Number of iterations performed.\n", " | fcalls : int\n", " | Number of calls to loglike.\n", " | warnflag : int\n", " | 1: Maximum number of function evaluations. 2: Maximum number\n", " | of iterations.\n", " | converged : bool\n", " | True : converged. False: did not converge.\n", " | allvecs : list\n", " | Results at each iteration.\n", " | 'cg'\n", " | fopt : float\n", " | Value of the (negative) loglikelihood at its minimum.\n", " | fcalls : int\n", " | Number of calls to loglike.\n", " | gcalls : int\n", " | Number of calls to gradient/score.\n", " | warnflag : int\n", " | 1: Maximum number of iterations exceeded. 2: Gradient and/\n", " | or function calls not changing.\n", " | converged : bool\n", " | True: converged. False: did not converge.\n", " | allvecs : list\n", " | Results at each iteration.\n", " | 'ncg'\n", " | fopt : float\n", " | Value of the (negative) loglikelihood at its minimum.\n", " | fcalls : int\n", " | Number of calls to loglike.\n", " | gcalls : int\n", " | Number of calls to gradient/score.\n", " | hcalls : int\n", " | Number of calls to hessian.\n", " | warnflag : int\n", " | 1: Maximum number of iterations exceeded.\n", " | converged : bool\n", " | True: converged. 
False: did not converge.\n", " | allvecs : list\n", " | Results at each iteration.\n", " | \n", " | Method resolution order:\n", " | LikelihoodModelResults\n", " | Results\n", " | __builtin__.object\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self, model, params, normalized_cov_params=None, scale=1.0)\n", " | \n", " | conf_int(self, alpha=0.05, cols=None, method='default')\n", " | Returns the confidence interval of the fitted parameters.\n", " | \n", " | Parameters\n", " | ----------\n", " | alpha : float, optional\n", " | The `alpha` level for the confidence interval.\n", " | ie., The default `alpha` = .05 returns a 95% confidence interval.\n", " | cols : array-like, optional\n", " | `cols` specifies which confidence intervals to return\n", " | method : string\n", " | Not Implemented Yet\n", " | Method to estimate the confidence_interval.\n", " | \"Default\" : uses self.bse which is based on inverse Hessian for MLE\n", " | \"jhj\" :\n", " | \"jac\" :\n", " | \"boot-bse\"\n", " | \"boot_quant\"\n", " | \"profile\"\n", " | \n", " | \n", " | Returns\n", " | --------\n", " | conf_int : array\n", " | Each row contains [lower, upper] confidence interval\n", " | \n", " | Examples\n", " | --------\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> results.conf_int()\n", " | array([[ -1.77029035e+02, 2.07152780e+02],\n", " | [ -1.11581102e-01, 3.99427438e-02],\n", " | [ -3.12506664e+00, -9.15392966e-01],\n", " | [ -1.51794870e+00, -5.48505034e-01],\n", " | [ -5.62517214e-01, 4.60309003e-01],\n", " | [ 7.98787515e+02, 2.85951541e+03],\n", " | [ -5.49652948e+06, -1.46798779e+06]])\n", " | \n", " | >>> results.conf_int(cols=(1,2))\n", " | array([[-0.1115811 , 0.03994274],\n", " | [-3.12506664, -0.91539297]])\n", " | \n", " | Notes\n", " | -----\n", " | The confidence interval is based on the standard 
normal distribution.\n", " | Models wish to use a different distribution should overwrite this\n", " | method.\n", " | \n", " | cov_params(self, r_matrix=None, column=None, scale=None, cov_p=None, other=None)\n", " | Returns the variance/covariance matrix.\n", " | \n", " | The variance/covariance matrix can be of a linear contrast\n", " | of the estimates of params or all params multiplied by scale which\n", " | will usually be an estimate of sigma^2. Scale is assumed to be\n", " | a scalar.\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like\n", " | Can be 1d, or 2d. Can be used alone or with other.\n", " | column : array-like, optional\n", " | Must be used on its own. Can be 0d or 1d see below.\n", " | scale : float, optional\n", " | Can be specified or not. Default is None, which means that\n", " | the scale argument is taken from the model.\n", " | other : array-like, optional\n", " | Can be used when r_matrix is specified.\n", " | \n", " | Returns\n", " | -------\n", " | (The below are assumed to be in matrix notation.)\n", " | \n", " | cov : ndarray\n", " | \n", " | If no argument is specified returns the covariance matrix of a model\n", " | (scale)*(X.T X)^(-1)\n", " | \n", " | If contrast is specified it pre and post-multiplies as follows\n", " | (scale) * r_matrix (X.T X)^(-1) r_matrix.T\n", " | \n", " | If contrast and other are specified returns\n", " | (scale) * r_matrix (X.T X)^(-1) other.T\n", " | \n", " | If column is specified returns\n", " | (scale) * (X.T X)^(-1)[column,column] if column is 0d\n", " | \n", " | OR\n", " | \n", " | (scale) * (X.T X)^(-1)[column][:,column] if column is 1d\n", " | \n", " | f_test(self, r_matrix, q_matrix=None, cov_p=None, scale=1.0, invcov=None)\n", " | Compute an F-test for a joint linear hypothesis.\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like, str, or tuple\n", " | - array : An r x k array where r is the number of restrictions to\n", " | test and k is the 
number of regressors.\n", " | - str : The full hypotheses to test can be given as a string.\n", " | See the examples.\n", " | - tuple : A tuple of arrays in the form (R, q), since q_matrix is\n", " | deprecated.\n", " | q_matrix : array-like\n", " | This is deprecated. See `r_matrix` and the examples for more\n", " | information on new usage. Can be either a scalar or a length p\n", " | row vector. If omitted and r_matrix is an array, `q_matrix` is\n", " | assumed to be a conformable array of zeros.\n", " | cov_p : array-like, optional\n", " | An alternative estimate for the parameter covariance matrix.\n", " | If None is given, self.normalized_cov_params is used.\n", " | scale : float, optional\n", " | Default is 1.0 for no scaling.\n", " | invcov : array-like, optional\n", " | A q x q array to specify an inverse covariance matrix based on a\n", " | restrictions matrix.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import numpy as np\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> A = np.identity(len(results.params))\n", " | >>> A = A[:-1,:]\n", " | \n", " | This tests that each coefficient is jointly statistically\n", " | significantly different from zero.\n", " | \n", " | >>> print results.f_test(A)\n", " | \n", " | \n", " | Compare this to\n", " | \n", " | >>> results.F\n", " | 330.2853392346658\n", " | >>> results.F_p\n", " | 4.98403096572e-10\n", " | \n", " | >>> B = np.array(([0,1,-1,0,0,0,0],[0,0,0,0,1,-1,0]))\n", " | \n", " | This tests that the coefficient on the 2nd and 3rd regressors are\n", " | equal and jointly that the coefficient on the 5th and 6th regressors\n", " | are equal.\n", " | \n", " | >>> print results.f_test(B)\n", " | \n", " | \n", " | Alternatively, you can specify the hypothesis tests using a string\n", " | \n", " | >>> from statsmodels.datasets import longley\n", " | 
>>> from statsmodels.formula.api import ols\n", " | >>> dta = longley.load_pandas().data\n", " | >>> formula = 'TOTEMP ~ GNPDEFL + GNP + UNEMP + ARMED + POP + YEAR'\n", " | >>> results = ols(formula, dta).fit()\n", " | >>> hypotheses = '(GNPDEFL = GNP), (UNEMP = 2), (YEAR/1829 = 1)'\n", " | >>> f_test = results.new_f_test(hypotheses)\n", " | >>> print f_test\n", " | \n", " | See also\n", " | --------\n", " | statsmodels.contrasts\n", " | statsmodels.model.t_test\n", " | patsy.DesignInfo.linear_constraint\n", " | \n", " | Notes\n", " | -----\n", " | The matrix `r_matrix` is assumed to be non-singular. More precisely,\n", " | \n", " | r_matrix (pX pX.T) r_matrix.T\n", " | \n", " | is assumed invertible. Here, pX is the generalized inverse of the\n", " | design matrix of the model. There can be problems in non-OLS models\n", " | where the rank of the covariance of the noise is not full.\n", " | \n", " | normalized_cov_params(self)\n", " | \n", " | remove_data(self)\n", " | remove data arrays, all nobs arrays from result and model\n", " | \n", " | This reduces the size of the instance, so it can be pickled with less\n", " | memory. Currently tested for use with predict from an unpickled\n", " | results and model instance.\n", " | \n", " | .. warning:: Since data and some intermediate results have been removed\n", " | calculating new statistics that require them will raise exceptions.\n", " | The exception will occur the first time an attribute is accessed that\n", " | has been set to None.\n", " | \n", " | Not fully tested for time series models, tsa, and might delete too much\n", " | for prediction or not all that would be possible.\n", " | \n", " | The list of arrays to delete is maintained as an attribute of the\n", " | result and model instance, except for cached values. 
These lists could\n", " | be changed before calling remove_data.\n", " | \n", " | save(self, fname, remove_data=False)\n", " | save a pickle of this instance\n", " | \n", " | Parameters\n", " | ----------\n", " | fname : string or filehandle\n", " | fname can be a string to a file path or filename, or a filehandle.\n", " | remove_data : bool\n", " | If False (default), then the instance is pickled without changes.\n", " | If True, then all arrays with length nobs are set to None before\n", " | pickling. See the remove_data method.\n", " | In some cases not all arrays will be set to None.\n", " | \n", " | Notes\n", " | -----\n", " | If remove_data is true and the model result does not implement a\n", " | remove_data method then this will raise an exception.\n", " | \n", " | t(self, column=None)\n", " | deprecated: Return the t-statistic for a given parameter estimate.\n", " | \n", " | FutureWarning: use attribute tvalues instead, t will be removed\n", " | in the next release\n", " | \n", " | Parameters\n", " | ----------\n", " | column : array-like\n", " | The columns for which you would like the t-value.\n", " | Note that this uses Python's indexing conventions.\n", " | \n", " | See also\n", " | ---------\n", " | Use t_test for more complicated t-statistics.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> results.tvalues\n", " | array([ 0.17737603, -1.06951632, -4.13642736, -4.82198531, -0.22605114,\n", " | 4.01588981, -3.91080292])\n", " | >>> results.tvalues[[1,2,4]]\n", " | array([-1.06951632, -4.13642736, -0.22605114])\n", " | >>> import numpy as np\n", " | >>> results.tvalues[np.array([1,2,4]]\n", " | array([-1.06951632, -4.13642736, -0.22605114])\n", " | \n", " | t_test(self, r_matrix, q_matrix=None, cov_p=None, scale=None)\n", " | Compute a t-test for a joint 
linear hypothesis of the form Rb = q\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like, str, tuple\n", " | - array : If an array is given, a p x k 2d array or length k 1d\n", " | array specifying the linear restrictions.\n", " | - str : The full hypotheses to test can be given as a string.\n", " | See the examples.\n", " | - tuple : A tuple of arrays in the form (R, q), since q_matrix is\n", " | deprecated.\n", " | q_matrix : array-like or scalar, optional\n", " | This is deprecated. See `r_matrix` and the examples for more\n", " | information on new usage. Can be either a scalar or a length p\n", " | row vector. If omitted and r_matrix is an array, `q_matrix` is\n", " | assumed to be a conformable array of zeros.\n", " | cov_p : array-like, optional\n", " | An alternative estimate for the parameter covariance matrix.\n", " | If None is given, self.normalized_cov_params is used.\n", " | scale : float, optional\n", " | An optional `scale` to use. Default is the scale specified\n", " | by the model fit.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import numpy as np\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> r = np.zeros_like(results.params)\n", " | >>> r[4:6] = [1,-1]\n", " | >>> print r\n", " | [ 0. 0. 0. 0. 1. -1. 
0.]\n", " | \n", " | r tests that the coefficients on the 5th and 6th independent\n", " | variable are the same.\n", " | \n", " | >>>T_Test = results.t_test(r)\n", " | >>>print T_test\n", " | \n", " | >>> T_test.effect\n", " | -1829.2025687192481\n", " | >>> T_test.sd\n", " | 455.39079425193762\n", " | >>> T_test.t\n", " | -4.0167754636411717\n", " | >>> T_test.p\n", " | 0.0015163772380899498\n", " | \n", " | Alternatively, you can specify the hypothesis tests using a string\n", " | \n", " | >>> dta = sm.datasets.longley.load_pandas().data\n", " | >>> formula = 'TOTEMP ~ GNPDEFL + GNP + UNEMP + ARMED + POP + YEAR'\n", " | >>> results = ols(formula, dta).fit()\n", " | >>> hypotheses = 'GNPDEFL = GNP, UNEMP = 2, YEAR/1829 = 1'\n", " | >>> t_test = results.new_t_test(hypotheses)\n", " | >>> print t_test\n", " | \n", " | See also\n", " | ---------\n", " | tvalues : individual t statistics\n", " | f_test : for F tests\n", " | patsy.DesignInfo.linear_constraint\n", " | \n", " | ----------------------------------------------------------------------\n", " | Class methods defined here:\n", " | \n", " | load(cls, fname) from __builtin__.type\n", " | load a pickle, (class method)\n", " | \n", " | Parameters\n", " | ----------\n", " | fname : string or filehandle\n", " | fname can be a string to a file path or filename, or a filehandle.\n", " | \n", " | Returns\n", " | -------\n", " | unpickled instance\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | bse\n", " | \n", " | llf\n", " | \n", " | pvalues\n", " | \n", " | tvalues\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from Results:\n", " | \n", " | initialize(self, model, params, **kwd)\n", " | \n", " | predict(self, exog=None, transform=True, *args, **kwargs)\n", " | Call self.model.predict with self.params as the first argument.\n", " | \n", " | Parameters\n", " 
| ----------\n", " | exog : array-like, optional\n", " | The values for which you want to predict.\n", " | transform : bool, optional\n", " | If the model was fit via a formula, do you want to pass\n", " | exog through the formula. Default is True. E.g., if you fit\n", " | a model y ~ log(x1) + log(x2), and transform is True, then\n", " | you can pass a data structure that contains x1 and x2 in\n", " | their original form. Otherwise, you'd need to log the data\n", " | first.\n", " | \n", " | Returns\n", " | -------\n", " | See self.model.predict\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from Results:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", "\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "from statsmodels.regression.linear_model import RegressionResults" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "help(RegressionResults)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on class RegressionResults in module statsmodels.regression.linear_model:\n", "\n", "class RegressionResults(statsmodels.base.model.LikelihoodModelResults)\n", " | This class summarizes the fit of a linear regression model.\n", " | \n", " | It handles the output of contrasts, estimates of covariance, etc.\n", " | \n", " | Returns\n", " | -------\n", " | **Attributes**\n", " | \n", " | aic\n", " | Aikake's information criteria :math:`-2llf + 2(df_model+1)`\n", " | bic\n", " | Bayes' information criteria :math:`-2llf + \\log(n)(df_model+1)`\n", " | bse\n", " | The standard errors of the parameter estimates.\n", " | pinv_wexog\n", " | See specific model class docstring\n", " | centered_tss\n", " | 
The total sum of squares centered about the mean\n", " | cov_HC0\n", " | See HC0_se below. Only available after calling HC0_se.\n", " | cov_HC1\n", " | See HC1_se below. Only available after calling HC1_se.\n", " | cov_HC2\n", " | See HC2_se below. Only available after calling HC2_se.\n", " | cov_HC3\n", " | See HC3_se below. Only available after calling HC3_se.\n", " | df_model\n", " | Model degrees of freedom: the number of regressors p, minus 1 for the\n", " | constant. Note that df_model does not include the constant even though\n", " | the design does. The design is always assumed to have a constant\n", " | in calculating results for now.\n", " | df_resid\n", " | Residual degrees of freedom, n - p. Note that the constant *is*\n", " | included in calculating the residual degrees of freedom.\n", " | ess\n", " | Explained sum of squares: the centered total sum of squares minus\n", " | the sum of squared residuals.\n", " | fvalue\n", " | F-statistic of the fully specified model. Calculated as the mean\n", " | squared error of the model divided by the mean squared error of the\n", " | residuals.\n", " | f_pvalue\n", " | p-value of the F-statistic\n", " | fittedvalues\n", " | The predicted values for the original (unwhitened) design.\n", " | het_scale\n", " | Only available if HC#_se is called. See HC#_se for more information.\n", " | HC0_se\n", " | White's (1980) heteroskedasticity robust standard errors.\n", " | Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)) X (X.T X)^(-1)))\n", " | where e_i = resid[i]\n", " | HC0_se is a property. It is not evaluated until it is called.\n", " | When it is called the RegressionResults instance will then have\n", " | another attribute cov_HC0, which is the full heteroskedasticity\n", " | consistent covariance matrix and also `het_scale`, which is in\n", " | this case just resid**2.
HCCM matrices are only appropriate for OLS.\n", " | HC1_se\n", " | MacKinnon and White's (1985) alternative heteroskedasticity robust\n", " | standard errors.\n", " | Defined as sqrt(diag(n/(n-p)*HC_0))\n", " | HC1_se is a property. It is not evaluated until it is called.\n", " | When it is called the RegressionResults instance will then have\n", " | another attribute cov_HC1, which is the full HCCM and also `het_scale`,\n", " | which is in this case n/(n-p)*resid**2. HCCM matrices are only\n", " | appropriate for OLS.\n", " | HC2_se\n", " | MacKinnon and White's (1985) alternative heteroskedasticity robust\n", " | standard errors.\n", " | Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)/(1-h_ii)) X (X.T X)^(-1)))\n", " | where h_ii = x_i(X.T X)^(-1)x_i.T\n", " | HC2_se is a property. It is not evaluated until it is called.\n", " | When it is called the RegressionResults instance will then have\n", " | another attribute cov_HC2, which is the full HCCM and also `het_scale`,\n", " | which in this case is resid^(2)/(1-h_ii). HCCM matrices are only\n", " | appropriate for OLS.\n", " | HC3_se\n", " | MacKinnon and White's (1985) alternative heteroskedasticity robust\n", " | standard errors.\n", " | Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)/(1-h_ii)^(2)) X (X.T X)^(-1)))\n", " | where h_ii = x_i(X.T X)^(-1)x_i.T\n", " | HC3_se is a property. It is not evaluated until it is called.\n", " | When it is called the RegressionResults instance will then have\n", " | another attribute cov_HC3, which is the full HCCM and also `het_scale`,\n", " | which in this case is resid^(2)/(1-h_ii)^(2). HCCM matrices are\n", " | only appropriate for OLS.\n", " | model\n", " | A pointer to the model instance that called fit() to produce these results.\n", " | mse_model\n", " | Mean squared error of the model. This is the explained sum of squares\n", " | divided by the model degrees of freedom.\n", " | mse_resid\n", " | Mean squared error of the residuals.
The sum of squared residuals\n", " | divided by the residual degrees of freedom.\n", " | mse_total\n", " | Total mean squared error. Defined as the uncentered total sum of\n", " | squares divided by n, the number of observations.\n", " | nobs\n", " | Number of observations n.\n", " | normalized_cov_params\n", " | See specific model class docstring\n", " | params\n", " | The linear coefficients that minimize the least squares criterion. This\n", " | is usually called Beta for the classical linear model.\n", " | pvalues\n", " | The two-tailed p-values for the t-statistics of the params.\n", " | resid\n", " | The residuals of the model.\n", " | rsquared\n", " | R-squared of a model with an intercept. This is defined here as\n", " | 1 - `ssr`/`centered_tss`\n", " | rsquared_adj\n", " | Adjusted R-squared. This is defined here as\n", " | 1 - (n-1)/(n-p)*(1-`rsquared`)\n", " | scale\n", " | A scale factor for the covariance matrix.\n", " | Default value is ssr/(n-p). Note that the square root of `scale` is\n", " | often called the standard error of the regression.\n", " | ssr\n", " | Sum of squared (whitened) residuals.\n", " | uncentered_tss\n", " | Uncentered sum of squares. Sum of the squared values of the\n", " | (whitened) endogenous response variable.\n", " | wresid\n", " | The residuals of the transformed/whitened regressand and regressor(s)\n", " | \n", " | Method resolution order:\n", " | RegressionResults\n", " | statsmodels.base.model.LikelihoodModelResults\n", " | statsmodels.base.model.Results\n", " | __builtin__.object\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self, model, params, normalized_cov_params=None, scale=1.0)\n", " | \n", " | __str__(self)\n", " | \n", " | compare_f_test(self, restricted)\n", " | Use an F test to test whether the restricted model is correct.\n", " | \n", " | Parameters\n", " | ----------\n", " | restricted : Result instance\n", " | The restricted model is assumed to be nested in the current\n", " | model.
The result instance of the restricted model is required to\n", " | have two attributes: the residual sum of squares, `ssr`, and the\n", " | residual degrees of freedom, `df_resid`.\n", " | \n", " | Returns\n", " | -------\n", " | f_value : float\n", " | test statistic, F distributed\n", " | p_value : float\n", " | p-value of the test statistic\n", " | df_diff : int\n", " | degrees of freedom of the restriction, i.e. difference in df between\n", " | models\n", " | \n", " | Notes\n", " | -----\n", " | See mailing list discussion October 17,\n", " | \n", " | compare_lr_test(self, restricted)\n", " | Likelihood ratio test of whether the restricted model is correct.\n", " | \n", " | Parameters\n", " | ----------\n", " | restricted : Result instance\n", " | The restricted model is assumed to be nested in the current model.\n", " | The result instance of the restricted model is required to have two\n", " | attributes: the residual sum of squares, `ssr`, and the residual\n", " | degrees of freedom, `df_resid`.\n", " | \n", " | Returns\n", " | -------\n", " | lr_stat : float\n", " | likelihood ratio, chi-square distributed with df_diff degrees of\n", " | freedom\n", " | p_value : float\n", " | p-value of the test statistic\n", " | df_diff : int\n", " | degrees of freedom of the restriction, i.e. difference in df between\n", " | models\n", " | \n", " | Notes\n", " | -----\n", " | \n", " | .. math:: D=-2\\log\\left(\\frac{\\mathcal{L}_{null}}\n", " | {\\mathcal{L}_{alternative}}\\right)\n", " | \n", " | where :math:`\\mathcal{L}` is the likelihood of the model.
Then :math:`D` is\n", " | distributed as chi-square with df equal to the difference in the number of\n", " | parameters, or equivalently the difference in residual degrees of freedom.\n", " | \n", " | TODO: put into separate function, needs tests\n", " | \n", " | conf_int(self, alpha=0.05, cols=None)\n", " | Returns the confidence interval of the fitted parameters.\n", " | \n", " | Parameters\n", " | ----------\n", " | alpha : float, optional\n", " | The `alpha` level for the confidence interval.\n", " | For example, the default `alpha` = .05 returns a 95% confidence interval.\n", " | cols : array-like, optional\n", " | `cols` specifies which confidence intervals to return\n", " | \n", " | Notes\n", " | -----\n", " | The confidence interval is based on Student's t-distribution.\n", " | \n", " | norm_resid(self)\n", " | Residuals, normalized to have unit variance.\n", " | \n", " | Returns\n", " | -------\n", " | An array wresid/sqrt(scale)\n", " | \n", " | Notes\n", " | -----\n", " | This method is untested\n", " | \n", " | summary(self, yname=None, xname=None, title=None, alpha=0.05)\n", " | Summarize the Regression Results\n", " | \n", " | Parameters\n", " | ----------\n", " | yname : string, optional\n", " | Default is `y`\n", " | xname : list of strings, optional\n", " | Default is `var_##` for ## in p the number of regressors\n", " | title : string, optional\n", " | Title for the top table.
If not None, then this replaces the\n", " | default title\n", " | alpha : float\n", " | significance level for the confidence intervals\n", " | \n", " | Returns\n", " | -------\n", " | smry : Summary instance\n", " | this holds the summary tables and text, which can be printed or\n", " | converted to various output formats.\n", " | \n", " | See Also\n", " | --------\n", " | statsmodels.iolib.summary.Summary : class to hold summary\n", " | results\n", " | \n", " | summary_old(self, yname=None, xname=None, returns='text')\n", " | returns a string that summarizes the regression results\n", " | \n", " | Parameters\n", " | -----------\n", " | yname : string, optional\n", " | Default is `Y`\n", " | xname : list of strings, optional\n", " | Default is `X.#` for # in p the number of regressors\n", " | \n", " | Returns\n", " | -------\n", " | String summarizing the fit of a linear model.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> ols_results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> print ols_results.summary()\n", " | ...\n", " | \n", " | Notes\n", " | -----\n", " | All residual statistics are calculated on whitened residuals.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | HC0_se\n", " | See statsmodels.RegressionResults\n", " | \n", " | HC1_se\n", " | See statsmodels.RegressionResults\n", " | \n", " | HC2_se\n", " | See statsmodels.RegressionResults\n", " | \n", " | HC3_se\n", " | See statsmodels.RegressionResults\n", " | \n", " | aic\n", " | \n", " | bic\n", " | \n", " | bse\n", " | \n", " | centered_tss\n", " | \n", " | df_model\n", " | \n", " | df_resid\n", " | \n", " | ess\n", " | \n", " | f_pvalue\n", " | \n", " | fittedvalues\n", " | \n", " | fvalue\n", " | \n", " | mse_model\n", " | \n", " | mse_resid\n", " | \n", " 
| mse_total\n", " | \n", " | nobs\n", " | \n", " | pvalues\n", " | \n", " | resid\n", " | \n", " | rsquared\n", " | \n", " | rsquared_adj\n", " | \n", " | scale\n", " | \n", " | ssr\n", " | \n", " | uncentered_tss\n", " | \n", " | wresid\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from statsmodels.base.model.LikelihoodModelResults:\n", " | \n", " | cov_params(self, r_matrix=None, column=None, scale=None, cov_p=None, other=None)\n", " | Returns the variance/covariance matrix.\n", " | \n", " | The variance/covariance matrix can be of a linear contrast\n", " | of the estimates of params or all params multiplied by scale, which\n", " | will usually be an estimate of sigma^2. Scale is assumed to be\n", " | a scalar.\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like\n", " | Can be 1d or 2d. Can be used alone or with other.\n", " | column : array-like, optional\n", " | Must be used on its own. Can be 0d or 1d; see below.\n", " | scale : float, optional\n", " | Can be specified or not.
Default is None, which means that\n", " | the scale argument is taken from the model.\n", " | other : array-like, optional\n", " | Can be used when r_matrix is specified.\n", " | \n", " | Returns\n", " | -------\n", " | (The below are assumed to be in matrix notation.)\n", " | \n", " | cov : ndarray\n", " | \n", " | If no argument is specified, returns the covariance matrix of a model\n", " | (scale)*(X.T X)^(-1)\n", " | \n", " | If r_matrix is specified, it pre- and post-multiplies as follows\n", " | (scale) * r_matrix (X.T X)^(-1) r_matrix.T\n", " | \n", " | If r_matrix and other are specified, returns\n", " | (scale) * r_matrix (X.T X)^(-1) other.T\n", " | \n", " | If column is specified, returns\n", " | (scale) * (X.T X)^(-1)[column,column] if column is 0d\n", " | \n", " | OR\n", " | \n", " | (scale) * (X.T X)^(-1)[column][:,column] if column is 1d\n", " | \n", " | f_test(self, r_matrix, q_matrix=None, cov_p=None, scale=1.0, invcov=None)\n", " | Compute an F-test for a joint linear hypothesis.\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like, str, or tuple\n", " | - array : An r x k array where r is the number of restrictions to\n", " | test and k is the number of regressors.\n", " | - str : The full hypotheses to test can be given as a string.\n", " | See the examples.\n", " | - tuple : A tuple of arrays in the form (R, q), since q_matrix is\n", " | deprecated.\n", " | q_matrix : array-like\n", " | This is deprecated. See `r_matrix` and the examples for more\n", " | information on new usage. Can be either a scalar or a length p\n", " | row vector.
If omitted and r_matrix is an array, `q_matrix` is\n", " | assumed to be a conformable array of zeros.\n", " | cov_p : array-like, optional\n", " | An alternative estimate for the parameter covariance matrix.\n", " | If None is given, self.normalized_cov_params is used.\n", " | scale : float, optional\n", " | Default is 1.0 for no scaling.\n", " | invcov : array-like, optional\n", " | A q x q array to specify an inverse covariance matrix based on a\n", " | restrictions matrix.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import numpy as np\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> A = np.identity(len(results.params))\n", " | >>> A = A[:-1,:]\n", " | \n", " | This tests whether the coefficients, other than the constant, are\n", " | jointly statistically significantly different from zero.\n", " | \n", " | >>> print results.f_test(A)\n", " | \n", " | \n", " | Compare this to\n", " | \n", " | >>> results.fvalue\n", " | 330.2853392346658\n", " | >>> results.f_pvalue\n", " | 4.98403096572e-10\n", " | \n", " | >>> B = np.array(([0,1,-1,0,0,0,0],[0,0,0,0,1,-1,0]))\n", " | \n", " | This tests that the coefficients on the 2nd and 3rd regressors are\n", " | equal and, jointly, that the coefficients on the 5th and 6th regressors\n", " | are equal.\n", " | \n", " | >>> print results.f_test(B)\n", " | \n", " | \n", " | Alternatively, you can specify the hypothesis tests using a string\n", " | \n", " | >>> from statsmodels.datasets import longley\n", " | >>> from statsmodels.formula.api import ols\n", " | >>> dta = longley.load_pandas().data\n", " | >>> formula = 'TOTEMP ~ GNPDEFL + GNP + UNEMP + ARMED + POP + YEAR'\n", " | >>> results = ols(formula, dta).fit()\n", " | >>> hypotheses = '(GNPDEFL = GNP), (UNEMP = 2), (YEAR/1829 = 1)'\n", " | >>> f_test = results.new_f_test(hypotheses)\n", " | >>> print f_test\n", " | \n", " | See also\n", " |
--------\n", " | statsmodels.contrasts\n", " | statsmodels.model.t_test\n", " | patsy.DesignInfo.linear_constraint\n", " | \n", " | Notes\n", " | -----\n", " | The matrix `r_matrix` is assumed to be non-singular. More precisely,\n", " | \n", " | r_matrix (pX pX.T) r_matrix.T\n", " | \n", " | is assumed invertible. Here, pX is the generalized inverse of the\n", " | design matrix of the model. There can be problems in non-OLS models\n", " | where the rank of the covariance of the noise is not full.\n", " | \n", " | normalized_cov_params(self)\n", " | \n", " | remove_data(self)\n", " | Remove data arrays, i.e. all arrays of length nobs, from the result and model.\n", " | \n", " | This reduces the size of the instance, so it can be pickled with less\n", " | memory. Currently tested for use with predict from an unpickled\n", " | results and model instance.\n", " | \n", " | .. warning:: Since data and some intermediate results have been removed,\n", " | calculating new statistics that require them will raise exceptions.\n", " | The exception will occur the first time an attribute is accessed that\n", " | has been set to None.\n", " | \n", " | Not fully tested for time series models (tsa); it might delete too much\n", " | for prediction, or not everything that could be deleted.\n", " | \n", " | The list of arrays to delete is maintained as an attribute of the\n", " | result and model instance, except for cached values. These lists could\n", " | be changed before calling remove_data.\n", " | \n", " | save(self, fname, remove_data=False)\n", " | Save a pickle of this instance.\n", " | \n", " | Parameters\n", " | ----------\n", " | fname : string or filehandle\n", " | fname can be a string to a file path or filename, or a filehandle.\n", " | remove_data : bool\n", " | If False (default), then the instance is pickled without changes.\n", " | If True, then all arrays with length nobs are set to None before\n", " | pickling.
See the remove_data method.\n", " | In some cases not all arrays will be set to None.\n", " | \n", " | Notes\n", " | -----\n", " | If remove_data is true and the model result does not implement a\n", " | remove_data method, then this will raise an exception.\n", " | \n", " | t(self, column=None)\n", " | deprecated: Return the t-statistic for a given parameter estimate.\n", " | \n", " | FutureWarning: use attribute tvalues instead; t will be removed\n", " | in the next release\n", " | \n", " | Parameters\n", " | ----------\n", " | column : array-like\n", " | The columns for which you would like the t-value.\n", " | Note that this uses Python's indexing conventions.\n", " | \n", " | See also\n", " | --------\n", " | Use t_test for more complicated t-statistics.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> results.tvalues\n", " | array([ 0.17737603, -1.06951632, -4.13642736, -4.82198531, -0.22605114,\n", " | 4.01588981, -3.91080292])\n", " | >>> results.tvalues[[1,2,4]]\n", " | array([-1.06951632, -4.13642736, -0.22605114])\n", " | >>> import numpy as np\n", " | >>> results.tvalues[np.array([1,2,4])]\n", " | array([-1.06951632, -4.13642736, -0.22605114])\n", " | \n", " | t_test(self, r_matrix, q_matrix=None, cov_p=None, scale=None)\n", " | Compute a t-test for a joint linear hypothesis of the form Rb = q\n", " | \n", " | Parameters\n", " | ----------\n", " | r_matrix : array-like, str, tuple\n", " | - array : If an array is given, a p x k 2d array or length k 1d\n", " | array specifying the linear restrictions.\n", " | - str : The full hypotheses to test can be given as a string.\n", " | See the examples.\n", " | - tuple : A tuple of arrays in the form (R, q), since q_matrix is\n", " | deprecated.\n", " | q_matrix : array-like or scalar, optional\n", " | This is
deprecated. See `r_matrix` and the examples for more\n", " | information on new usage. Can be either a scalar or a length p\n", " | row vector. If omitted and r_matrix is an array, `q_matrix` is\n", " | assumed to be a conformable array of zeros.\n", " | cov_p : array-like, optional\n", " | An alternative estimate for the parameter covariance matrix.\n", " | If None is given, self.normalized_cov_params is used.\n", " | scale : float, optional\n", " | An optional `scale` to use. Default is the scale specified\n", " | by the model fit.\n", " | \n", " | Examples\n", " | --------\n", " | >>> import numpy as np\n", " | >>> import statsmodels.api as sm\n", " | >>> data = sm.datasets.longley.load()\n", " | >>> data.exog = sm.add_constant(data.exog)\n", " | >>> results = sm.OLS(data.endog, data.exog).fit()\n", " | >>> r = np.zeros_like(results.params)\n", " | >>> r[4:6] = [1,-1]\n", " | >>> print r\n", " | [ 0. 0. 0. 0. 1. -1. 0.]\n", " | \n", " | r tests that the coefficients on the 5th and 6th independent\n", " | variables are the same.\n", " | \n", " | >>> T_test = results.t_test(r)\n", " | >>> print T_test\n", " | \n", " | >>> T_test.effect\n", " | -1829.2025687192481\n", " | >>> T_test.sd\n", " | 455.39079425193762\n", " | >>> T_test.t\n", " | -4.0167754636411717\n", " | >>> T_test.p\n", " | 0.0015163772380899498\n", " | \n", " | Alternatively, you can specify the hypothesis tests using a string\n", " | \n", " | >>> from statsmodels.formula.api import ols\n", " | >>> dta = sm.datasets.longley.load_pandas().data\n", " | >>> formula = 'TOTEMP ~ GNPDEFL + GNP + UNEMP + ARMED + POP + YEAR'\n", " | >>> results = ols(formula, dta).fit()\n", " | >>> hypotheses = 'GNPDEFL = GNP, UNEMP = 2, YEAR/1829 = 1'\n", " | >>> t_test = results.new_t_test(hypotheses)\n", " | >>> print t_test\n", " | \n", " | See also\n", " | --------\n", " | tvalues : individual t statistics\n", " | f_test : for F tests\n", " | patsy.DesignInfo.linear_constraint\n", " | \n", " | ----------------------------------------------------------------------\n", " |
Class methods inherited from statsmodels.base.model.LikelihoodModelResults:\n", " | \n", " | load(cls, fname) from __builtin__.type\n", " | Load a pickle (class method).\n", " | \n", " | Parameters\n", " | ----------\n", " | fname : string or filehandle\n", " | fname can be a string to a file path or filename, or a filehandle.\n", " | \n", " | Returns\n", " | -------\n", " | unpickled instance\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from statsmodels.base.model.LikelihoodModelResults:\n", " | \n", " | llf\n", " | \n", " | tvalues\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from statsmodels.base.model.Results:\n", " | \n", " | initialize(self, model, params, **kwd)\n", " | \n", " | predict(self, exog=None, transform=True, *args, **kwargs)\n", " | Call self.model.predict with self.params as the first argument.\n", " | \n", " | Parameters\n", " | ----------\n", " | exog : array-like, optional\n", " | The values for which you want to predict.\n", " | transform : bool, optional\n", " | If the model was fit via a formula, whether to pass\n", " | exog through the formula. Default is True. E.g., if you fit\n", " | a model y ~ log(x1) + log(x2), and transform is True, then\n", " | you can pass a data structure that contains x1 and x2 in\n", " | their original form. Otherwise, you'd need to log the data\n", " | first.\n", " | \n", " | Returns\n", " | -------\n", " | See self.model.predict\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from statsmodels.base.model.Results:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", "\n" ] } ], "prompt_number": 10 } ], "metadata": {} } ] }