{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll generate a random matrix"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#Number of columns (features)\n",
"K = 5\n",
"\n",
"#Number of records\n",
"N = 1000\n",
"\n",
"#Generate an NxK matrix of uniform random variables\n",
"X = np.random.random([N, K])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's peak at our data to confirm it looks as we expect it"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.9786862 , 0.50933013, 0.47327265, 0.57154566, 0.17725616])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X[100, :]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1000, 5)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This exercise is about designing a scoring function for a logistic regression. As we are not concerned with fitting a model to data, we can just make up a logistic regression.
\n",
"\n",
"For quick intro, the Logistic Regression takes the form of $\\hat{Y} = f(x * \\beta^T)$, where $x$ is the $1xK$ vector of features and $\\beta$ is the $1xK$ vector of weights. The function $f$, called a 'link' function, is the inverse logit:
\n",
"\n",
"