{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "reasonable-corpus",
   "metadata": {
    "pycharm": {
     "is_executing": true,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from gs_quant.session import Environment, GsSession\n",
    "\n",
    "# external users should substitute their client id and secret; please skip this step if using internal jupyterhub\n",
    "GsSession.use(Environment.PROD, client_id=None, client_secret=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "absent-nitrogen",
   "metadata": {},
   "source": [
    "# Factor Models\n",
    "\n",
    "The GS Quant `FactorRiskModel` class allows users to access vendor factor models such as Barra. The `FactorRiskModel` interface supports date-based querying of the factor risk model outputs such as factor returns, covariance matrix and specific risk for assets.\n",
    "\n",
    "In this tutorial, we’ll look at querying available risk models, their coverage universe, and how to access the returns and volatility of factors in the model. We also show how to query factor exposures (z-scores), specific risk and total risk for a given set of assets in the model's universe.\n",
    "\n",
    "The factor returns represent the regression outputs of the model for each day. The definitions of each factor vary depending on the model. More details can be found in the [Marquee Data Catalog](https://marquee.gs.com/s/discover/data-services/catalog?query=factor+risk+model)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "floral-apollo",
   "metadata": {},
   "source": [
    "### Risk Model Coverage\n",
    "\n",
    "Popular third party risk models that have been onboarded onto Marquee for programmatic access are below.\n",
    "\n",
    "| Risk Model Name                     | Risk Model Id     | Description|\n",
    "|-------------------------------------|-------------------|-------------\n",
    "| Barra USMED Short (BARRA_USMEDS)    | BARRA_USMEDS      |Barra (MSCI) US Total Market Equity Model for medium-term investors. Includes all styles from the long-term model plus additional factors for investment horizons between 1 month and 1 year (responsive variant).|\n",
    "| Barra USSLOW Long (BARRA_USSLOWL)   | BARRA_USSLOWL     |Barra (MSCI) US Total Market Equity Model for long-term investors. Designed with a focus on portfolio construction and reporting for long investment horizons (stable variant).|\n",
    "| Barra GEMLT Long (BARRA_GEMLTL)     | BARRA_GEMLTL      |Barra (MSCI) Global Total Market Equity Model for long-term investors. Designed with a focus on portfolio construction and reporting for global equity investors (stable variant).|\n",
    "| Barra US Fast (BARRA_USFAST)        | BARRA_USFAST      |Barra (MSCI) US Equity Trading Model for short-term investors. Includes all styles from the medium-term model plus additional factors for shorter investment horizons.|\n",
    "| Wolfe Developed Markets All-Cap v1  | WOLFE_QES_DM_AC_1 | Wolfe's Developed Markets All-Cap model is intended for global portfolios with an emphasis on Developed Markets. The model combines next generation factors like short interest and interest rate sensitivity with conventional factors like value and growth.|\n",
    "| Wolfe US TMT v2                     | WOLFE_QES_US_TMT_2| Wolfe's US TMT model is intended for sector portfolios. The model uses sector-specific factors in a TMT estimation universe to explain more systematic risk and return relative to a broad market model.|\n",
    "| Wolfe Europe All-Cap v2             | WOLFE_QES_EU_AC_2 | Wolfe's Europe All-Cap model is intended for European portfolios with a focus on the developed markets. The model combines next generation factors like short interest and interest rate sensitivity with conventional factors like value and growth.|\n",
    "| Wolfe US Healthcare v2              | WOLFE_QES_US_HC_2 | Wolfe's US Healthcare model is intended for sector portfolios. The model uses sector-specific factors in a Healthcare estimation universe to explain more systematic risk and return relative to a broad market model.|\n",
    "\n",
    "After selecting a risk model, we can create an instance of the risk model to pull information on the model coverage such as the available dates, asset coverage universe, available factors and model description. The `RiskModelCoverage` enum of the model indicates whether the scope of the universe is Global, Region or Country and the `Term` enum refers to the horizon of the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "social-attendance",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "# check available history for a factor model to decide start and end dates\n",
    "available_days = factor_model.get_dates()\n",
    "\n",
    "print(f'Data available for {model_id} from {available_days[0]} to {available_days[-1]}')\n",
    "print(f'{model_id}:\\n - Name: {factor_model.name}\\n - Description: {factor_model.description}\\n - Coverage: {factor_model.coverage.value}\\n - Horizon: {factor_model.term.value}')\n",
    "print(f'For all info https://marquee.gs.com/v1/risk/models/{model_id}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "median-individual",
   "metadata": {},
   "source": [
    "### Query Factor Data\n",
    "\n",
    "The following parameters are required for querying factor data:\n",
    "\n",
    "* `start_date` - date or datetime that is a business day\n",
    "* `end_date` - date or datetime that is a business day. If an end date is not specified, it will default to the last available date\n",
    "* `limit_factor` - A boolean to limit output to only exposed factors. Set to False when not querying data for a particular asset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "documentary-trail",
   "metadata": {},
   "source": [
    "##### Get Available Factors\n",
    "\n",
    "For each model, we can retrieve a list of factors available. Each factor has a `name`, `id`, `type` and `factorCategory`.\n",
    "\n",
    "A factor's `factorCategory` can be one of the following:\n",
    "* Style - balance sheet and market metrics\n",
    "* Industry - an asset's line of business (i.e. Barra uses GICS classification)\n",
    "* Country - reference an asset’s exchange country location\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "improving-recycling",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "available_factors = factor_model.get_factor_data(dt.date(2020, 1, 4)).set_index('identifier')\n",
    "available_factors.sort_values(by=['factorCategory']).tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cosmetic-contrary",
   "metadata": {},
   "source": [
    "##### Get All Factor Returns\n",
    "\n",
    "To query factor returns, we can either use `get_factor_returns_by_name` to retrieve the returns with names or `get_factor_returns_by_id` to get the returns with factor ids. We can leverage [the timeseries package](https://developer.gs.com/docs/gsquant/data/data-analytics/timeseries/) to transform and visualize the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "equal-thailand",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import datetime as dt\n",
    "from gs_quant.timeseries import beta\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "factor_returns = factor_model.get_factor_returns_by_name(dt.date(2020, 1, 4))\n",
    "fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))\n",
    "\n",
    "factor_returns[['Growth', 'Momentum', 'Size']].cumsum().plot(title='Factor Performance over Time for Risk Model', ax=ax[0])\n",
    "factor_beta = beta(factor_returns['Growth'], factor_returns['Momentum'], 63, prices = False)\n",
    "factor_beta.plot(title='3m Rolling Beta of Growth to Momentum', ax=ax[1])\n",
    "fig.autofmt_xdate()\n",
    "plt.show()"
   ]
  },
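  {
   "cell_type": "markdown",
   "id": "rolling-beta-note",
   "metadata": {},
   "source": [
    "As a rough guide to reading the right-hand chart above, a rolling beta is conventionally the regression slope of one return series on another over a trailing window. The sketch below assumes that standard definition, with a window of $w = 63$ observations (about three months of trading days), $g$ for the Growth returns and $m$ for the Momentum returns. These symbols are illustrative and are not taken from the library:\n",
    "\n",
    "$$\\beta_t = \\frac{\\text{Cov}_w(g, m)_t}{\\text{Var}_w(m)_t}$$\n",
    "\n",
    "Here $\\text{Cov}_w$ and $\\text{Var}_w$ are computed over the $w$ observations ending at date $t$. Under this reading, a value of $\\beta_t = 0.5$ would mean that, over the trailing window, a 1% Momentum return was associated on average with a 0.5% Growth return."
   ]
  },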
  {
   "cell_type": "markdown",
   "id": "removable-boards",
   "metadata": {},
   "source": [
    "##### Covariance Matrix\n",
    "\n",
    "The covariance matrix represents an N-factor by N-factor matrix with the diagonal representing the variance of each factor for each day. The covariance matrix is in daily variance units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bibliographic-medicare",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import pandas as pd\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "cov_matrix = factor_model.get_covariance_matrix(dt.date(2021, 1, 4), dt.date(2021, 2, 26)) * 100\n",
    "\n",
    "# set display options below--set max_rows and max_columns to None to return full dataframe\n",
    "max_rows = 10\n",
    "max_columns = 7\n",
    "pd.set_option('display.max_rows', max_rows)\n",
    "pd.set_option('display.max_columns', max_columns)\n",
    "\n",
    "# get the last available matrix\n",
    "round(cov_matrix.loc['2021-02-26'], 6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "operating-hearing",
   "metadata": {},
   "source": [
    "##### Factor Correlation and Volatility\n",
    "\n",
    "The `Factor` Class allows for quick analytics for a specified factor to easily support comparing one factor across different models or to another factor.\n",
    "\n",
    "The factor volatility and correlation functions use the covariance matrix for calculations:\n",
    "* Volatility is the square root of the diagonal\n",
    "* Correlation is derived from the covariance matrix by dividing the cov(x,y) by the vol(x) * vol(y)"
   ]
  },
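  {
   "cell_type": "markdown",
   "id": "vol-corr-worked-example",
   "metadata": {},
   "source": [
    "As a small worked illustration of the two rules above, write $\\Sigma$ for the factor covariance matrix on a given day, with $x$ and $y$ indexing two factors. The symbols and the numbers below are made up for illustration and are not model output:\n",
    "\n",
    "$$\\sigma_x = \\sqrt{\\Sigma_{xx}}, \\qquad \\rho_{xy} = \\frac{\\Sigma_{xy}}{\\sigma_x \\, \\sigma_y}$$\n",
    "\n",
    "For example, with $\\Sigma_{xx} = 0.0004$, $\\Sigma_{yy} = 0.0001$ and $\\Sigma_{xy} = 0.0001$:\n",
    "\n",
    "$$\\sigma_x = 0.02, \\qquad \\sigma_y = 0.01, \\qquad \\rho_{xy} = \\frac{0.0001}{0.02 \\times 0.01} = 0.5$$"
   ]
  },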
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "outdoor-security",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import matplotlib.ticker as mtick\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "momentum = factor_model.get_factor('Momentum')\n",
    "growth = factor_model.get_factor('Growth')\n",
    "\n",
    "start_date = dt.date(2020, 1, 6)\n",
    "end_date = dt.date(2021, 2, 26)\n",
    "\n",
    "vol = momentum.volatility(start_date, end_date)\n",
    "corr = momentum.correlation(growth, start_date, end_date)\n",
    "\n",
    "# plot\n",
    "fig, ax1 = plt.subplots()\n",
    "ax2 = ax1.twinx()\n",
    "ax1.plot(vol.index, corr*100, 'g-', label='Momentum vs Growth Correlation (LHS)')\n",
    "ax1.yaxis.set_major_formatter(mtick.PercentFormatter())\n",
    "ax2.yaxis.set_major_formatter(mtick.PercentFormatter())\n",
    "ax2.plot(vol.index, vol*1e4, 'b-', label='Momentum Volatility (RHS)')\n",
    "plt.xticks(vol.index.values[::30])\n",
    "fig.legend(loc=\"lower right\", bbox_to_anchor=(.75, -0.10))\n",
    "fig.autofmt_xdate()\n",
    "plt.title('Momentum vs Growth Historical Factor Analysis')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "identified-terminology",
   "metadata": {},
   "source": [
    "### Query Asset Data\n",
    "\n",
    "The factor risk represents the beta coefficient that can be attributed to the model, whereas the specific (residual) risk refers to the error term that is not explained by the model.\n",
    "\n",
    "| Measure                    | Definition    |\n",
    "|----------------------------|---------------|\n",
    "| `Specific Risk`            | Annualized idiosyncratic risk or error term which is not attributable to factors in percent units |\n",
    "| `Total Risk`               | Annualized risk which is the sum of specific and factor risk in percent units |\n",
    "| `Historical Beta`          | The covariance of the residual returns relative to the model's estimation universe or benchmark (i.e results of a one factor model)  |\n",
    "| `Residual Variance`        | Daily error variance that is not explained by the model which is equal to $$\\frac{({\\frac{\\text{Specific Risk}}{100}})^2}{252}$$ |\n",
    "| `Universe Factor Exposure` | Z-score for each factor relative to the model's estimation universe |\n",
    "\n",
    "We can retrieve an asset universe on a given date by passing in an empty list and a `RiskModelUniverseIdentifierRequest` to the `DataAssetsRequest`."
   ]
  },
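  {
   "cell_type": "markdown",
   "id": "residual-variance-worked-example",
   "metadata": {},
   "source": [
    "To make the `Residual Variance` row in the table above concrete, here is a worked conversion using a hypothetical annualized specific risk of 25% (an illustrative number, not a value from any model):\n",
    "\n",
    "$$\\frac{\\left(\\frac{25}{100}\\right)^2}{252} = \\frac{0.0625}{252} \\approx 0.000248$$\n",
    "\n",
    "Going the other way under the same formula, the annualized specific risk in percent is $100 \\sqrt{252 \\times \\text{Residual Variance}}$, which recovers 25 for the number above."
   ]
  },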
  {
   "cell_type": "markdown",
   "id": "spatial-problem",
   "metadata": {},
   "source": [
    "##### Get Risk Model Universe Coverage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "short-kennedy",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import pandas as pd\n",
    "from gs_quant.models.risk_model import RiskModelUniverseIdentifierRequest as Identifier, DataAssetsRequest\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "asset_universe_for_request = DataAssetsRequest(Identifier.gsid, []) # entire universe\n",
    "universe_on_date = factor_model.get_asset_universe(dt.date(2021, 1, 4), assets=asset_universe_for_request)\n",
    "\n",
    "# set display options below--set max_rows to None to return full list of identifiers\n",
    "max_rows = 10\n",
    "pd.set_option('display.max_rows', max_rows)\n",
    "universe_on_date"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "relevant-brooklyn",
   "metadata": {},
   "source": [
    "##### Query Aggregated Risk\n",
    "\n",
    "For asset data, we can query for a specific measure or pull data for a list of measures over a range of dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "copyrighted-elements",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import matplotlib.pyplot as plt\n",
    "from gs_quant.models.risk_model import FactorRiskModel, DataAssetsRequest, RiskModelUniverseIdentifierRequest as Identifier\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "asset_bbid = 'AAPL UW'\n",
    "\n",
    "# get risk\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "specific_risk = factor_model.get_specific_risk(dt.date(2020, 1, 4), dt.date(2021, 2, 24), universe_for_request)\n",
    "total_risk = factor_model.get_total_risk(dt.date(2020, 1, 4), dt.date(2021, 2, 24), universe_for_request)\n",
    "factor_risk = total_risk - specific_risk\n",
    "\n",
    "plt.stackplot(total_risk.index, specific_risk[asset_bbid], factor_risk[asset_bbid], labels=['Specific Risk','Factor Risk'])\n",
    "plt.title(f'{asset_bbid} Risk')\n",
    "plt.xticks(total_risk.index.values[::50])\n",
    "plt.legend(loc='upper right')\n",
    "plt.gcf().autofmt_xdate()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "frequent-lithuania",
   "metadata": {},
   "source": [
    "##### Query Factor Exposures (z-scores)\n",
    "\n",
    "When querying the asset factor exposures, set the `limit_factor` to True to receive only non zero exposures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "micro-processor",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "import datetime as dt\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from gs_quant.models.risk_model import FactorRiskModel, DataAssetsRequest, RiskModelUniverseIdentifierRequest as Identifier\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "asset_bbid = 'AAPL UW'\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "\n",
    "factor_exposures = factor_model.get_universe_factor_exposure(dt.date(2020, 1,4), dt.date(2021, 2, 24), universe_for_request)\n",
    "\n",
    "available_factors = factor_model.get_factor_data(dt.date(2020, 1, 4)).set_index('identifier')\n",
    "available_factors.sort_values(by=['factorCategory']).tail()\n",
    "\n",
    "factor_exposures.columns = [available_factors.loc[x]['name'] for x in factor_exposures.columns]\n",
    "\n",
    "sns.boxplot(data=factor_exposures[['Beta', 'Momentum', 'Growth', 'Profitability']])\n",
    "plt.title(f'Distribution of {asset_bbid} Factor Exposures since 1/4/20')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "greatest-graham",
   "metadata": {},
   "source": [
    "##### Query Multiple Asset Measures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "subtle-young",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "from gs_quant.models.risk_model import FactorRiskModel, Measure, DataAssetsRequest, RiskModelUniverseIdentifierRequest as Identifier\n",
    "\n",
    "model_id = 'BARRA_USMEDS'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "# get multiple measures across a date range for a universe specified\n",
    "start_date = dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 2, 24)\n",
    "\n",
    "asset_bbid = 'AAPL UW'\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "\n",
    "data_measures = [Measure.Universe_Factor_Exposure, Measure.Asset_Universe, Measure.Historical_Beta, Measure.Specific_Risk]\n",
    "asset_risk_data = factor_model.get_data(data_measures, start_date, end_date, universe_for_request, limit_factors=True)\n",
    "\n",
    "for i in range(len(asset_risk_data.get('results'))):\n",
    "    date =  asset_risk_data.get('results')[i].get('date')\n",
    "    universe = asset_risk_data.get('results')[i].get('assetData').get('universe')\n",
    "    factor_exposure = asset_risk_data.get('results')[i].get('assetData').get('factorExposure')\n",
    "    historical_beta = asset_risk_data.get('results')[i].get('assetData').get('historicalBeta')\n",
    "    specific_risk = asset_risk_data.get('results')[i].get('assetData').get('specificRisk')\n",
    "    print(f'date: {date}')\n",
    "    print(f'universe: {universe}')\n",
    "    print(f'factor id to factor exposure: {factor_exposure}')\n",
    "    print(f'historical beta: {historical_beta}')\n",
    "    print(f'specific risk: {specific_risk}')\n",
    "    print('\\n')\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}