{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Score Evaluation Demo\n","- User Story \n","I am a data scientist and I've built a model to predict the stock price. I have my test data set and trained model, but I need evaluate if this model performance as well as expected. \n","- Solution \n","I installed score-eval package and run evaluation on my data set plus the model. I would like to evaluate in couple of metrics. \n","1. The overall performance. I would like to check the PR chart and WOE/IV chart. Here I put some background on those two chart, please skip if you are familiar with them. \n"," 1. PR chart. The x-axis is 'recall' and the y-axis is 'precision'. When we try to achieve higher precision we usually lose certain recall and vice versa. It is quite difficult to achieve perfect classification so we have to trade off. PR chart shows the balance between the two metrics and we can compare how multiple models performs by this way. \n"," 2. WOE/IV chart. The x-axis is 'score bins' and the y-axis is 'WOE'. It is a statistical method to check how the 1-samples and 0-samples are separated in the bin. In most real-world cases, we adopt a machine learning model only at the highest and lowest score bins. We do not adopt the middle 'grey' part, which is not safe for follow-up decisions. WOE/IV chart is quite intuitive tool letting us understand the performance in score bins. \n","\n","2. The score cutoff performance. If we still need more details to decide which model or which cutoff is a good choice, I suggest to look into the op/cutoff charts.Here I put some backgound on OP and Cutoff, please skip if you are familiar with them. \n"," 1. OP(Operating Point). This is quite straightforward, it is saying how many proportion of the population can be operating. It is usually calculated by percent, from high to low. eg. OP 10 means the 10% samples with the highest score in your data set. \n"," 2. Cut-off. This is not calculated by percent but by the score value itself. Assuming the score is in interval, the cut-off can be 0.1, 0.2, ... 0.9 etc. \n","\n","3. Metrics along time. In real cases, the samples come up from time to time, but the evaluation data set is usually static. The above methods are doing the evaluation as a whole. 
{"cell_type":"markdown","metadata":{},"source":["**Install**"]},{"cell_type":"code","execution_count":35,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["Requirement already satisfied: score-eval in d:\\python\\python37\\lib\\site-packages (0.0.1)\n"]},{"name":"stderr","output_type":"stream","text":["WARNING: You are using pip version 20.1.1; however, version 22.0.4 is available.\n","You should consider upgrading via the 'd:\\python\\python37\\python.exe -m pip install --upgrade pip' command.\n"]}],"source":["!pip install score-eval"]},{"cell_type":"code","execution_count":39,"metadata":{"execution":{"iopub.execute_input":"2022-05-02T09:06:07.171621Z","iopub.status.busy":"2022-05-02T09:06:07.171320Z","iopub.status.idle":"2022-05-02T09:06:07.177369Z","shell.execute_reply":"2022-05-02T09:06:07.176602Z","shell.execute_reply.started":"2022-05-02T09:06:07.171590Z"},"trusted":true},"outputs":[],"source":["from scoreval import scoreval as se\n","\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import tensorflow as tf\n","import os"]},{"cell_type":"code","execution_count":37,"metadata":{},"outputs":[],"source":["# Set path parameters; these apply only to this example, so set them as you wish\n","BASE_DIR = \"E:\\\\workspace\\\\stock\"\n","RUN_DATE = '20220304' # str(datetime.now())[:10].replace('-','')\n","\n","# Test data location\n","DATA_DIR = os.path.join(BASE_DIR, 'data', RUN_DATE) # path where the processed data files are saved\n","\n","# Model file location\n","MODEL_DIR = os.path.join(BASE_DIR, 'python/tf/models') # path where the model spec is located\n"]},{"cell_type":"markdown","metadata":{},"source":["**Load data set** \n","My data set is saved in an npz file. The storage format does not matter to ScoreEval; it is only relevant to the model implementation. "]},{"cell_type":"code","execution_count":15,"metadata":{},"outputs":[],"source":["# Unpack the arrays stored under each key of the npz archive\n","with open(os.path.join(DATA_DIR, 'oot-xl.npz'), 'rb') as fp:\n","    oot_data = np.load(fp)\n","    oot_X_arr, oot_S_arr, oot_P_arr, oot_I_arr, oot_Y_arr, oot_R_arr = oot_data['x'], oot_data['s'], oot_data['p'], oot_data['i'], oot_data['y'], oot_data['r']"]},
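{"cell_type":"markdown","metadata":{},"source":["**Time-step sketch (optional)** \n","Item 3 of the intro asks for metrics per time step. The next cell sketches one way to do that with plain pandas, using a hypothetical frame with 'label', 'score' and 'date' columns rather than the arrays just loaded; score-eval's own time charts are not shown here."]},
{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Hypothetical example of one metric per time step; the frame and its\n","# columns ('label', 'score', 'date') are synthetic placeholders\n","import numpy as np\n","import pandas as pd\n","\n","rng = np.random.default_rng(0)\n","df = pd.DataFrame({'label': rng.integers(0, 2, 300),\n","                   'score': rng.random(300),\n","                   'date': pd.date_range('2021-01-01', periods=300, freq='D')})\n","\n","# Precision at a fixed 0.8 cut-off, recomputed per calendar month:\n","# keep the labels of predicted positives, then average them by month\n","hits = df['label'].where(df['score'] >= 0.8)\n","print(hits.groupby(df['date'].dt.to_period('M')).mean())"]},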
| \n"," | label | \n","symbol | \n","date | \n","
|---|---|---|---|
| 0 | \n","0 | \n","SH603198 | \n","2021-11-02 | \n","
| 1 | \n","0 | \n","SH603198 | \n","2021-06-17 | \n","
| 2 | \n","0 | \n","SH603198 | \n","2021-07-20 | \n","
| 3 | \n","0 | \n","SH600436 | \n","2021-07-01 | \n","
| 4 | \n","0 | \n","SH600436 | \n","2021-03-10 | \n","