{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[{"file_id":"1kwCNzKkWIue2qfwpvAuKCzi7aOslcWET","timestamp":1744812041614}],"toc_visible":true,"authorship_tag":"ABX9TyNRUaixbD4EmzFetd2WH6Sv"},"kernelspec":{"name":"ir","display_name":"R"},"language_info":{"name":"R"}},"cells":[{"cell_type":"markdown","source":["# Abstract\n","\n","The purpose of this notebook is to allow users to become familiar with model fitting in a Google colaboratory (colab) environment using Jupyter notebooks. The programming code is R, not Python. For additional resources:\n","\n","* [Introduction to Google Colab](https://www.tutorialspoint.com/google_colab/index.htm), website\n","* [Introduction to Jupyter Notebooks.ipynb](https://github.com/OpenActTextDev/ActuarialRegression/blob/main/Notebooks/Introduction%20to%20Jupyter%20Notebooks.ipynb), Jupyter notebook available from the course Github repository (repo).\n","\n","\n","\n"," \n","
\n"," \"Open\n","
\n","\n","\n","\n"],"metadata":{"id":"LX_xshTR5dQL"}},{"cell_type":"markdown","source":["# Getting Started\n","\n","Start by going to \"Runtime\", select \"Change Runtime Type\", ensure that \"R\" is selected, and click \"Save\".\n","\n","Next, run the following two boxes. The first checks that we can do basic calculations. The second is an \"R\" command (generate 10 standard normal random variates), to make sure that \"R\" is working. Then, try changing the numbers so you can convince yourself that this is an interactive environment."],"metadata":{"id":"aEwkH6hekFwy"}},{"cell_type":"code","source":["1 + 4"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":34},"id":"-AY0xDhI5jTG","executionInfo":{"status":"ok","timestamp":1746460673324,"user_tz":300,"elapsed":20,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"857ccdc3-81bb-4df5-d0a4-ba774ba0d958"},"execution_count":null,"outputs":[{"output_type":"display_data","data":{"text/html":["5"],"text/markdown":"5","text/latex":"5","text/plain":["[1] 5"]},"metadata":{}}]},{"cell_type":"code","source":["rnorm(10)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":52},"id":"ECDLZ0DoDjBk","executionInfo":{"status":"ok","timestamp":1746460673394,"user_tz":300,"elapsed":65,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"a823c671-024a-4a83-92ea-38363376916a"},"execution_count":null,"outputs":[{"output_type":"display_data","data":{"text/html":["\n","
  1. 0.898123924598654
  2. -0.852576191795118
  3. -0.189245668283116
  4. -0.882185981402365
  5. 1.37011604595435
  6. -1.80415538467972
  7. 0.107060010754886
  8. -0.3054553211401
  9. -0.342670024618139
  10. 0.480125313339421
\n"],"text/markdown":"1. 0.898123924598654\n2. -0.852576191795118\n3. -0.189245668283116\n4. -0.882185981402365\n5. 1.37011604595435\n6. -1.80415538467972\n7. 0.107060010754886\n8. -0.3054553211401\n9. -0.342670024618139\n10. 0.480125313339421\n\n\n","text/latex":"\\begin{enumerate*}\n\\item 0.898123924598654\n\\item -0.852576191795118\n\\item -0.189245668283116\n\\item -0.882185981402365\n\\item 1.37011604595435\n\\item -1.80415538467972\n\\item 0.107060010754886\n\\item -0.3054553211401\n\\item -0.342670024618139\n\\item 0.480125313339421\n\\end{enumerate*}\n","text/plain":[" [1] 0.898 -0.853 -0.189 -0.882 1.370 -1.804 0.107 -0.305 -0.343 0.480"]},"metadata":{}}]},{"cell_type":"markdown","source":["You can make changes to a Colab notebook, and they will persist for as long as you keep your browser tab open. But once you close it, the changes will be lost. To avoid this, make sure you save a copy of the notebook to your Google Drive by selecting File → “Save a copy in Drive”."],"metadata":{"id":"n1bA_o4z7Nh0"}},{"cell_type":"markdown","source":["## Load Libraries\n","\n","As is the custom with Jupyter notebooks, we start by loading some libraries that are going to be helpful. In the following code, we also indicate where th libraries are needed with comments."],"metadata":{"id":"N2hdKHe4JDi4"}},{"cell_type":"code","source":["install.packages(\"Hmisc\")\n","library(Hmisc)\n","install.packages(\"doBy\")\n","library(doBy)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"9x382MrGJSQQ","executionInfo":{"status":"ok","timestamp":1749313858217,"user_tz":300,"elapsed":132087,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"f829676d-c57a-4df9-c640-05e4e4ef3e9f"},"execution_count":17,"outputs":[{"output_type":"stream","name":"stderr","text":["Installing package into ‘/usr/local/lib/R/site-library’\n","(as ‘lib’ is unspecified)\n","\n","also installing the dependencies ‘checkmate’, ‘gridExtra’, ‘htmlTable’, ‘viridis’, ‘colorspace’, ‘Formula’\n","\n","\n","\n","Attaching package: ‘Hmisc’\n","\n","\n","The following objects are masked from ‘package:base’:\n","\n"," format.pval, units\n","\n","\n","Installing package into ‘/usr/local/lib/R/site-library’\n","(as ‘lib’ is unspecified)\n","\n","also installing the dependencies ‘cowplot’, ‘Deriv’, ‘microbenchmark’\n","\n","\n"]}]},{"cell_type":"markdown","source":["# Claim Counts\n","\n","This tutorial presents a study in the context of auto liability insurance focusing on claim counts.\n","\n","The data set is well known in the actuarial literature. The data source is the book edited by Arthur Charpentier entitled *Computational Actuarial Science with R*, (2016), by CRC Press. In this tutorial, we utilize ideas from:\n","\n","* Professor Dani Bauer's tutorials available at his [Github site](https://github.com/danielbauer1979/MSDIA_PredictiveModelingAndMachineLearning).\n","* Dr. Patrick Laub's presentation at his course [Artificial Intelligence and Deep Learning Models for Actuarial Applications](https://laub.au/DeepLearningForActuaries/)\n","* A case study by Noll, Salman, Wütrich (2020), *Case Study: French Motor Third-Party Liability Claims*, available at [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3164764) with [R code](https://github.com/actuarial-data-science/Tutorials/tree/master/1%20-%20French%20Motor%20Third-Party%20Liability%20Claims).\n"," * This case study was the basis for a [tutorial by Professor Mario Wütrich](https://github.com/JSchelldorfer/DeepLearningWithActuarialApplications/blob/master/DeepLearningWithActuarialApplications.pdf).\n","\n","The dataset is fairly large, containing 678007 observations of 14 variables. So, this helps underscore the value of working on remote servers such as through Google colab.\n","\n","\n","As is the custom with Jupyter notebooks, we start by loading the libraries that are going to be helpful."],"metadata":{"id":"XE7lp5pnGA8Y"}},{"cell_type":"markdown","source":["# Import and Describe the Data\n","\n","First, we import the data from Github course repo. At the repo \"https://github.com/OpenActTextDev/ActuarialRegression\", you will see other datasets."],"metadata":{"id":"7jxfSRWT8pU2"}},{"cell_type":"code","source":["dat_frq <- read.csv(\"https://raw.githubusercontent.com/OpenActTextDev/ActuarialRegression/refs/heads/main/CourseCSVData/freMTPL2freq.csv\")"],"metadata":{"id":"tY6CxxEn7Yv9","executionInfo":{"status":"ok","timestamp":1749311790079,"user_tz":300,"elapsed":12918,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}}},"execution_count":1,"outputs":[]},{"cell_type":"markdown","source":["Just to make sure that data was read in properly, let's look at the structure, do a summary, and examine the first few observations."],"metadata":{"id":"geJTceeBFk7e"}},{"cell_type":"code","source":["str(dat_frq)"],"metadata":{"id":"44L0LqSeFjf1","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1749317901468,"user_tz":300,"elapsed":59,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"01c686f6-481a-4781-cfcc-245c093d97bb"},"execution_count":35,"outputs":[{"output_type":"stream","name":"stdout","text":["'data.frame':\t678007 obs. of 14 variables:\n"," $ X : int 1 2 3 4 5 6 7 8 9 10 ...\n"," $ IDpol : num 1 3 5 10 11 13 15 17 18 21 ...\n"," $ Exposure : num 0.1 0.77 0.75 0.09 0.84 0.52 0.45 0.27 0.71 0.15 ...\n"," $ Area : chr \"D\" \"D\" \"B\" \"B\" ...\n"," $ VehPower : int 5 5 6 7 7 6 6 7 7 7 ...\n"," $ VehAge : int 0 0 2 0 0 2 2 0 0 0 ...\n"," $ DrivAge : int 55 55 52 46 46 38 38 33 33 41 ...\n"," $ BonusMalus: int 50 50 50 50 50 50 50 68 68 50 ...\n"," $ VehBrand : chr \"B12\" \"B12\" \"B12\" \"B12\" ...\n"," $ VehGas : chr \"Regular\" \"Regular\" \"Diesel\" \"Diesel\" ...\n"," $ Density : int 1217 1217 54 76 76 3003 3003 137 137 60 ...\n"," $ Region : chr \"R82\" \"R82\" \"R22\" \"R72\" ...\n"," $ ClaimTotal: num 0 0 0 0 0 0 0 0 0 0 ...\n"," $ ClaimNb : int 0 0 0 0 0 0 0 0 0 0 ...\n"]}]},{"cell_type":"markdown","source":["Remove the 'X' variable (row label)"],"metadata":{"id":"RYUDTuIyeQsR"}},{"cell_type":"code","source":["dat_frq$X <- NULL\n","summary(dat_frq)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":512},"id":"B5tNyEVVHgSk","executionInfo":{"status":"ok","timestamp":1749317905258,"user_tz":300,"elapsed":373,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"db013418-67f4-4f9a-e3c7-43973f3044aa"},"execution_count":36,"outputs":[{"output_type":"display_data","data":{"text/plain":[" IDpol Exposure Area VehPower \n"," Min. : 1 Min. :0.002732 Length:678007 Min. : 4.000 \n"," 1st Qu.:1157948 1st Qu.:0.180000 Class :character 1st Qu.: 5.000 \n"," Median :2272153 Median :0.490000 Mode :character Median : 6.000 \n"," Mean :2621857 Mean :0.528547 Mean : 6.455 \n"," 3rd Qu.:4046278 3rd Qu.:0.990000 3rd Qu.: 7.000 \n"," Max. :6114330 Max. :1.000000 Max. :15.000 \n"," VehAge DrivAge BonusMalus VehBrand \n"," Min. : 0.000 Min. : 18.0 Min. : 50.00 Length:678007 \n"," 1st Qu.: 2.000 1st Qu.: 34.0 1st Qu.: 50.00 Class :character \n"," Median : 6.000 Median : 44.0 Median : 50.00 Mode :character \n"," Mean : 7.044 Mean : 45.5 Mean : 59.76 \n"," 3rd Qu.: 11.000 3rd Qu.: 55.0 3rd Qu.: 64.00 \n"," Max. :100.000 Max. :100.0 Max. :230.00 \n"," VehGas Density Region ClaimTotal \n"," Length:678007 Min. : 1 Length:678007 Min. : 0.0 \n"," Class :character 1st Qu.: 92 Class :character 1st Qu.: 0.0 \n"," Mode :character Median : 393 Mode :character Median : 0.0 \n"," Mean : 1792 Mean : 88.2 \n"," 3rd Qu.: 1658 3rd Qu.: 0.0 \n"," Max. :27000 Max. :4075400.6 \n"," ClaimNb \n"," Min. :0.00000 \n"," 1st Qu.:0.00000 \n"," Median :0.00000 \n"," Mean :0.03891 \n"," 3rd Qu.:0.00000 \n"," Max. :5.00000 "]},"metadata":{}}]},{"cell_type":"code","source":["head(dat_frq)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":306},"id":"f_QFkoRXIB5R","executionInfo":{"status":"ok","timestamp":1749317941063,"user_tz":300,"elapsed":44,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"3c3ab43f-2e34-4b21-c879-f35dd48729ef"},"execution_count":37,"outputs":[{"output_type":"display_data","data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A data.frame: 6 × 13
IDpolExposureAreaVehPowerVehAgeDrivAgeBonusMalusVehBrandVehGasDensityRegionClaimTotalClaimNb
<dbl><dbl><chr><int><int><int><int><chr><chr><int><chr><dbl><int>
1 10.10D505550B12Regular1217R8200
2 30.77D505550B12Regular1217R8200
3 50.75B625250B12Diesel 54R2200
4100.09B704650B12Diesel 76R7200
5110.84B704650B12Diesel 76R7200
6130.52E623850B12Regular3003R3100
\n"],"text/markdown":"\nA data.frame: 6 × 13\n\n| | IDpol <dbl> | Exposure <dbl> | Area <chr> | VehPower <int> | VehAge <int> | DrivAge <int> | BonusMalus <int> | VehBrand <chr> | VehGas <chr> | Density <int> | Region <chr> | ClaimTotal <dbl> | ClaimNb <int> |\n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| 1 | 1 | 0.10 | D | 5 | 0 | 55 | 50 | B12 | Regular | 1217 | R82 | 0 | 0 |\n| 2 | 3 | 0.77 | D | 5 | 0 | 55 | 50 | B12 | Regular | 1217 | R82 | 0 | 0 |\n| 3 | 5 | 0.75 | B | 6 | 2 | 52 | 50 | B12 | Diesel | 54 | R22 | 0 | 0 |\n| 4 | 10 | 0.09 | B | 7 | 0 | 46 | 50 | B12 | Diesel | 76 | R72 | 0 | 0 |\n| 5 | 11 | 0.84 | B | 7 | 0 | 46 | 50 | B12 | Diesel | 76 | R72 | 0 | 0 |\n| 6 | 13 | 0.52 | E | 6 | 2 | 38 | 50 | B12 | Regular | 3003 | R31 | 0 | 0 |\n\n","text/latex":"A data.frame: 6 × 13\n\\begin{tabular}{r|lllllllllllll}\n & IDpol & Exposure & Area & VehPower & VehAge & DrivAge & BonusMalus & VehBrand & VehGas & Density & Region & ClaimTotal & ClaimNb\\\\\n & & & & & & & & & & & & & \\\\\n\\hline\n\t1 & 1 & 0.10 & D & 5 & 0 & 55 & 50 & B12 & Regular & 1217 & R82 & 0 & 0\\\\\n\t2 & 3 & 0.77 & D & 5 & 0 & 55 & 50 & B12 & Regular & 1217 & R82 & 0 & 0\\\\\n\t3 & 5 & 0.75 & B & 6 & 2 & 52 & 50 & B12 & Diesel & 54 & R22 & 0 & 0\\\\\n\t4 & 10 & 0.09 & B & 7 & 0 & 46 & 50 & B12 & Diesel & 76 & R72 & 0 & 0\\\\\n\t5 & 11 & 0.84 & B & 7 & 0 & 46 & 50 & B12 & Diesel & 76 & R72 & 0 & 0\\\\\n\t6 & 13 & 0.52 & E & 6 & 2 & 38 & 50 & B12 & Regular & 3003 & R31 & 0 & 0\\\\\n\\end{tabular}\n","text/plain":[" IDpol Exposure Area VehPower VehAge DrivAge BonusMalus VehBrand VehGas \n","1 1 0.10 D 5 0 55 50 B12 Regular\n","2 3 0.77 D 5 0 55 50 B12 Regular\n","3 5 0.75 B 6 2 52 50 B12 Diesel \n","4 10 0.09 B 7 0 46 50 B12 Diesel \n","5 11 0.84 B 7 0 46 50 B12 Diesel \n","6 13 0.52 E 6 2 38 50 B12 Regular\n"," Density Region ClaimTotal ClaimNb\n","1 1217 R82 0 0 \n","2 1217 R82 0 0 \n","3 54 R22 0 0 \n","4 76 R72 0 0 \n","5 76 R72 0 0 \n","6 3003 R31 0 0 "]},"metadata":{}}]},{"cell_type":"markdown","source":["Here are some brief descriptions of the variables:\n","\n","- `IDpol`: policy number (unique identifier)\n","- `ClaimNb`: number of claims on the given policy\n","- `Exposure`: total exposure in yearly units\n","- `Area`: area code (categorical, ordinal)\n","- `VehPower`: power of the car (categorical, ordinal)\n","- `VehAge`: age of the car in years\n","- `DrivAge`: age of the (most common) driver in years\n","- `BonusMalus`: bonus-malus level between 50 and 230 (with reference level 100)\n","- `VehBrand`: car brand (categorical, nominal)\n","- `VehGas`: diesel or regular fuel car (binary)\n","- `Density`: density of inhabitants per km^2^ in the city of the living place of the driver\n","- `Region`: regions in France (prior to 2016)\n","\n","Note that the variable 'Area' is defined as an ordinal variable. In their development of the case, Noll, Salman, Wütrich (2020) carefully describe how some areas are bigger than others, so that the variable can be considered as nominal. This is an interesting extension. Because of the overlap and the categorical variable 'Region', we will exclude the variable 'Area' from our analysis."],"metadata":{"id":"iS_jdZPTH0qf"}},{"cell_type":"code","source":["custom_summary <- function(x) {\n"," c(\n"," count = sum(!is.na(x)),\n"," mean = mean(x, na.rm = TRUE),\n"," sd = sd(x, na.rm = TRUE),\n"," min = min(x, na.rm = TRUE),\n"," `1%` = quantile(x, 0.01, na.rm = TRUE),\n"," `50%` = quantile(x, 0.5, na.rm = TRUE),\n"," `99.99%` = quantile(x, 0.9999, na.rm = TRUE),\n"," max = max(x, na.rm = TRUE)\n"," )\n","}\n","\n","numeric_id <- sapply(dat_frq, is.numeric)\n","integer_id <- sapply(dat_frq, is.integer)\n","char_id <- sapply(dat_frq, is.character)\n","numeric_vars <- dat_frq[,numeric_id]\n","integer_vars <- dat_frq[,integer_id]\n","char_vars <- dat_frq[,char_id]\n","#str(numeric_vars)\n","#str(integer_vars)\n","#str(char_vars)\n","summary_df <- sapply(numeric_vars, custom_summary)\n","round(t(summary_df), digits = 2); # Transpose for easier viewing\n"],"metadata":{"id":"8w05UAFlHBtI","colab":{"base_uri":"https://localhost:8080/","height":526},"executionInfo":{"status":"ok","timestamp":1749318304740,"user_tz":300,"elapsed":775,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"a370f481-72cc-4b71-f18f-4f46e79d95fa"},"execution_count":51,"outputs":[{"output_type":"stream","name":"stdout","text":["'data.frame':\t678007 obs. of 9 variables:\n"," $ IDpol : num 1 3 5 10 11 13 15 17 18 21 ...\n"," $ Exposure : num 0.1 0.77 0.75 0.09 0.84 0.52 0.45 0.27 0.71 0.15 ...\n"," $ VehPower : int 5 5 6 7 7 6 6 7 7 7 ...\n"," $ VehAge : int 0 0 2 0 0 2 2 0 0 0 ...\n"," $ DrivAge : int 55 55 52 46 46 38 38 33 33 41 ...\n"," $ BonusMalus: int 50 50 50 50 50 50 50 68 68 50 ...\n"," $ Density : int 1217 1217 54 76 76 3003 3003 137 137 60 ...\n"," $ ClaimTotal: num 0 0 0 0 0 0 0 0 0 0 ...\n"," $ ClaimNb : int 0 0 0 0 0 0 0 0 0 0 ...\n"]},{"output_type":"display_data","data":{"text/html":["\n","\n","\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A matrix: 9 × 8 of type dbl
countmeansdmin1%.1%50%.50%99.99%.99.99%max
IDpol6780072621857.331641789.39 119934.122272153.006114262.206114330
Exposure678007 0.53 0.36 0 0.01 0.49 1.00 1
VehPower678007 6.45 2.05 4 4.00 6.00 15.00 15
VehAge678007 7.04 5.67 0 0.00 6.00 69.00 100
DrivAge678007 45.50 14.1418 20.00 44.00 99.00 100
BonusMalus678007 59.76 15.6450 50.00 50.00 158.00 230
Density678007 1792.43 3958.66 1 10.00 393.00 27000.00 27000
ClaimTotal678007 88.17 5822.07 0 0.00 0.00 65429.464075401
ClaimNb678007 0.04 0.20 0 0.00 0.00 3.00 5
\n"],"text/markdown":"\nA matrix: 9 × 8 of type dbl\n\n| | count | mean | sd | min | 1%.1% | 50%.50% | 99.99%.99.99% | max |\n|---|---|---|---|---|---|---|---|---|\n| IDpol | 678007 | 2621857.33 | 1641789.39 | 1 | 19934.12 | 2272153.00 | 6114262.20 | 6114330 |\n| Exposure | 678007 | 0.53 | 0.36 | 0 | 0.01 | 0.49 | 1.00 | 1 |\n| VehPower | 678007 | 6.45 | 2.05 | 4 | 4.00 | 6.00 | 15.00 | 15 |\n| VehAge | 678007 | 7.04 | 5.67 | 0 | 0.00 | 6.00 | 69.00 | 100 |\n| DrivAge | 678007 | 45.50 | 14.14 | 18 | 20.00 | 44.00 | 99.00 | 100 |\n| BonusMalus | 678007 | 59.76 | 15.64 | 50 | 50.00 | 50.00 | 158.00 | 230 |\n| Density | 678007 | 1792.43 | 3958.66 | 1 | 10.00 | 393.00 | 27000.00 | 27000 |\n| ClaimTotal | 678007 | 88.17 | 5822.07 | 0 | 0.00 | 0.00 | 65429.46 | 4075401 |\n| ClaimNb | 678007 | 0.04 | 0.20 | 0 | 0.00 | 0.00 | 3.00 | 5 |\n\n","text/latex":"A matrix: 9 × 8 of type dbl\n\\begin{tabular}{r|llllllll}\n & count & mean & sd & min & 1\\%.1\\% & 50\\%.50\\% & 99.99\\%.99.99\\% & max\\\\\n\\hline\n\tIDpol & 678007 & 2621857.33 & 1641789.39 & 1 & 19934.12 & 2272153.00 & 6114262.20 & 6114330\\\\\n\tExposure & 678007 & 0.53 & 0.36 & 0 & 0.01 & 0.49 & 1.00 & 1\\\\\n\tVehPower & 678007 & 6.45 & 2.05 & 4 & 4.00 & 6.00 & 15.00 & 15\\\\\n\tVehAge & 678007 & 7.04 & 5.67 & 0 & 0.00 & 6.00 & 69.00 & 100\\\\\n\tDrivAge & 678007 & 45.50 & 14.14 & 18 & 20.00 & 44.00 & 99.00 & 100\\\\\n\tBonusMalus & 678007 & 59.76 & 15.64 & 50 & 50.00 & 50.00 & 158.00 & 230\\\\\n\tDensity & 678007 & 1792.43 & 3958.66 & 1 & 10.00 & 393.00 & 27000.00 & 27000\\\\\n\tClaimTotal & 678007 & 88.17 & 5822.07 & 0 & 0.00 & 0.00 & 65429.46 & 4075401\\\\\n\tClaimNb & 678007 & 0.04 & 0.20 & 0 & 0.00 & 0.00 & 3.00 & 5\\\\\n\\end{tabular}\n","text/plain":[" count mean sd min 1%.1% 50%.50% 99.99%.99.99%\n","IDpol 678007 2621857.33 1641789.39 1 19934.12 2272153.00 6114262.20 \n","Exposure 678007 0.53 0.36 0 0.01 0.49 1.00 \n","VehPower 678007 6.45 2.05 4 4.00 6.00 15.00 \n","VehAge 678007 7.04 5.67 0 0.00 6.00 69.00 \n","DrivAge 678007 45.50 14.14 18 20.00 44.00 99.00 \n","BonusMalus 678007 59.76 15.64 50 50.00 50.00 158.00 \n","Density 678007 1792.43 3958.66 1 10.00 393.00 27000.00 \n","ClaimTotal 678007 88.17 5822.07 0 0.00 0.00 65429.46 \n","ClaimNb 678007 0.04 0.20 0 0.00 0.00 3.00 \n"," max \n","IDpol 6114330\n","Exposure 1\n","VehPower 15\n","VehAge 100\n","DrivAge 100\n","BonusMalus 230\n","Density 27000\n","ClaimTotal 4075401\n","ClaimNb 5"]},"metadata":{}}]},{"cell_type":"markdown","source":["Some extreme values of 'VehAge' and 'BonusMalus' are unrealistic. Let's censor them at the 99.99th percentile."],"metadata":{"id":"8k7uizW3Lxry"}},{"cell_type":"code","source":["winsorize <- function(x, p = 0.01) {\n"," qnt <- quantile(x, probs = c(p, 1 - p), na.rm = TRUE)\n"," x[x < qnt[1]] <- qnt[1]\n"," x[x > qnt[2]] <- qnt[2]\n"," return(x)\n"," }\n","dat_frqMod <- dat_frq\n","dat_frqMod$VehAge<- winsorize(dat_frqMod$VehAge, p = 0.0001) # like 0.01% and 99.99%\n","dat_frqMod$BonusMalus<- winsorize(dat_frqMod$BonusMalus, p = 0.0001)\n","numeric_vars <- sapply(dat_frqMod, is.numeric)\n","summary_df <- sapply(dat_frqMod[, numeric_vars], custom_summary)\n","round(t(summary_df), digits = 2); # Transpose for easier viewing"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":349},"id":"z68vq-xQLwvs","executionInfo":{"status":"ok","timestamp":1749317964991,"user_tz":300,"elapsed":929,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"5188d3b4-c1b4-426a-f124-cf77f0618385"},"execution_count":39,"outputs":[{"output_type":"display_data","data":{"text/html":["\n","\n","\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A matrix: 9 × 8 of type dbl
countmeansdmin1%.1%50%.50%99.99%.99.99%max
IDpol6780072621857.331641789.39 119934.122272153.006114262.206114330
Exposure678007 0.53 0.36 0 0.01 0.49 1.00 1
VehPower678007 6.45 2.05 4 4.00 6.00 15.00 15
VehAge678007 7.04 5.63 0 0.00 6.00 69.00 69
DrivAge678007 45.50 14.1418 20.00 44.00 99.00 100
BonusMalus678007 59.76 15.6250 50.00 50.00 158.00 158
Density678007 1792.43 3958.66 1 10.00 393.00 27000.00 27000
ClaimTotal678007 88.17 5822.07 0 0.00 0.00 65429.464075401
ClaimNb678007 0.04 0.20 0 0.00 0.00 3.00 5
\n"],"text/markdown":"\nA matrix: 9 × 8 of type dbl\n\n| | count | mean | sd | min | 1%.1% | 50%.50% | 99.99%.99.99% | max |\n|---|---|---|---|---|---|---|---|---|\n| IDpol | 678007 | 2621857.33 | 1641789.39 | 1 | 19934.12 | 2272153.00 | 6114262.20 | 6114330 |\n| Exposure | 678007 | 0.53 | 0.36 | 0 | 0.01 | 0.49 | 1.00 | 1 |\n| VehPower | 678007 | 6.45 | 2.05 | 4 | 4.00 | 6.00 | 15.00 | 15 |\n| VehAge | 678007 | 7.04 | 5.63 | 0 | 0.00 | 6.00 | 69.00 | 69 |\n| DrivAge | 678007 | 45.50 | 14.14 | 18 | 20.00 | 44.00 | 99.00 | 100 |\n| BonusMalus | 678007 | 59.76 | 15.62 | 50 | 50.00 | 50.00 | 158.00 | 158 |\n| Density | 678007 | 1792.43 | 3958.66 | 1 | 10.00 | 393.00 | 27000.00 | 27000 |\n| ClaimTotal | 678007 | 88.17 | 5822.07 | 0 | 0.00 | 0.00 | 65429.46 | 4075401 |\n| ClaimNb | 678007 | 0.04 | 0.20 | 0 | 0.00 | 0.00 | 3.00 | 5 |\n\n","text/latex":"A matrix: 9 × 8 of type dbl\n\\begin{tabular}{r|llllllll}\n & count & mean & sd & min & 1\\%.1\\% & 50\\%.50\\% & 99.99\\%.99.99\\% & max\\\\\n\\hline\n\tIDpol & 678007 & 2621857.33 & 1641789.39 & 1 & 19934.12 & 2272153.00 & 6114262.20 & 6114330\\\\\n\tExposure & 678007 & 0.53 & 0.36 & 0 & 0.01 & 0.49 & 1.00 & 1\\\\\n\tVehPower & 678007 & 6.45 & 2.05 & 4 & 4.00 & 6.00 & 15.00 & 15\\\\\n\tVehAge & 678007 & 7.04 & 5.63 & 0 & 0.00 & 6.00 & 69.00 & 69\\\\\n\tDrivAge & 678007 & 45.50 & 14.14 & 18 & 20.00 & 44.00 & 99.00 & 100\\\\\n\tBonusMalus & 678007 & 59.76 & 15.62 & 50 & 50.00 & 50.00 & 158.00 & 158\\\\\n\tDensity & 678007 & 1792.43 & 3958.66 & 1 & 10.00 & 393.00 & 27000.00 & 27000\\\\\n\tClaimTotal & 678007 & 88.17 & 5822.07 & 0 & 0.00 & 0.00 & 65429.46 & 4075401\\\\\n\tClaimNb & 678007 & 0.04 & 0.20 & 0 & 0.00 & 0.00 & 3.00 & 5\\\\\n\\end{tabular}\n","text/plain":[" count mean sd min 1%.1% 50%.50% 99.99%.99.99%\n","IDpol 678007 2621857.33 1641789.39 1 19934.12 2272153.00 6114262.20 \n","Exposure 678007 0.53 0.36 0 0.01 0.49 1.00 \n","VehPower 678007 6.45 2.05 4 4.00 6.00 15.00 \n","VehAge 678007 7.04 5.63 0 0.00 6.00 69.00 \n","DrivAge 678007 45.50 14.14 18 20.00 44.00 99.00 \n","BonusMalus 678007 59.76 15.62 50 50.00 50.00 158.00 \n","Density 678007 1792.43 3958.66 1 10.00 393.00 27000.00 \n","ClaimTotal 678007 88.17 5822.07 0 0.00 0.00 65429.46 \n","ClaimNb 678007 0.04 0.20 0 0.00 0.00 3.00 \n"," max \n","IDpol 6114330\n","Exposure 1\n","VehPower 15\n","VehAge 69\n","DrivAge 100\n","BonusMalus 158\n","Density 27000\n","ClaimTotal 4075401\n","ClaimNb 5"]},"metadata":{}}]},{"cell_type":"markdown","source":["We now summarize the target variable 'ClaimsNb' in a bit more detail.\n","\n","We can get some quick summaries with the 'table' function in 'R'.\n","\n"],"metadata":{"id":"QR57EyC6HbL8"}},{"cell_type":"code","source":["table(dat_frqMod$ClaimNb)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":70},"id":"q8H03qQcGd7p","executionInfo":{"status":"ok","timestamp":1749317971162,"user_tz":300,"elapsed":72,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"16b48932-262d-428c-99be-a1135f20a02e"},"execution_count":40,"outputs":[{"output_type":"display_data","data":{"text/plain":["\n"," 0 1 2 3 4 5 \n","653069 23571 1298 62 5 2 "]},"metadata":{}}]},{"cell_type":"code","source":["table(dat_frqMod$VehBrand)\n","table(dat_frqMod$VehGas)\n","table(dat_frqMod$Region)\n","table(dat_frqMod$Area)\n","table(dat_frqMod$VehPower)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":353},"id":"NAQ8WmZ2Oguk","executionInfo":{"status":"ok","timestamp":1749317974264,"user_tz":300,"elapsed":423,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"e3496c3f-6386-4601-ff99-e636b010a766"},"execution_count":41,"outputs":[{"output_type":"display_data","data":{"text/plain":["\n"," B1 B10 B11 B12 B13 B14 B2 B3 B4 B5 B6 \n","162730 17707 13585 166024 12178 4047 159861 53395 25179 34753 28548 "]},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["\n"," Diesel Regular \n"," 332136 345871 "]},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["\n"," R11 R21 R22 R23 R24 R25 R26 R31 R41 R42 R43 \n"," 69791 3026 7994 8784 160601 10893 10492 27285 12990 2200 1326 \n"," R52 R53 R54 R72 R73 R74 R82 R83 R91 R93 R94 \n"," 38751 42122 19046 31329 17141 4567 84752 5287 35799 79315 4516 "]},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["\n"," A B C D E F \n","103957 75459 191880 151590 137167 17954 "]},"metadata":{}},{"output_type":"display_data","data":{"text/plain":["\n"," 4 5 6 7 8 9 10 11 12 13 14 \n","115343 124821 148976 145401 46956 30085 31354 18352 8214 3229 2350 \n"," 15 \n"," 2926 "]},"metadata":{}}]},{"cell_type":"code","source":["table(dat_frqMod$VehGas, dat_frqMod$ClaimNb)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":88},"id":"h5qgFGoVPdWe","executionInfo":{"status":"ok","timestamp":1749317978167,"user_tz":300,"elapsed":169,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"439d79b8-0c20-42dd-8556-46e3632ea47e"},"execution_count":42,"outputs":[{"output_type":"display_data","data":{"text/plain":[" \n"," 0 1 2 3 4 5\n"," Diesel 319436 11988 678 31 2 1\n"," Regular 333633 11583 620 31 3 1"]},"metadata":{}}]},{"cell_type":"markdown","source":["However, for more complex summarizes, we utilize a library \"Hmisc\". These needs to be first **installed**, which may take a couple of minutes in Google Colab."],"metadata":{"id":"-Xjl2iHFHmP-"}},{"cell_type":"code","source":["#install.packages(\"Hmisc\")\n","#library(Hmisc)\n","Hmisc::summarize(dat_frqMod$DrivAge, dat_frqMod$ClaimNb, mean)\n","Hmisc::summarize(dat_frqMod$VehAge, dat_frqMod$ClaimNb, mean)"],"metadata":{"id":"zIlhw8pQHROm","colab":{"base_uri":"https://localhost:8080/","height":555},"executionInfo":{"status":"ok","timestamp":1749317983290,"user_tz":300,"elapsed":399,"user":{"displayName":"EDWARD FREES","userId":"04339356503797759391"}},"outputId":"f1b05068-fcbb-4915-cc99-2534b96a8719"},"execution_count":43,"outputs":[{"output_type":"display_data","data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A data.frame: 6 × 2
dat_frqMod$ClaimNbdat_frqMod$DrivAge
<labelled><labelled[1d]>
1045.51285
2145.17216
3244.60169
4343.33871
5440.20000
6559.50000
\n"],"text/markdown":"\nA data.frame: 6 × 2\n\n| | dat_frqMod$ClaimNb <labelled> | dat_frqMod$DrivAge <labelled[1d]> |\n|---|---|---|\n| 1 | 0 | 45.51285 |\n| 2 | 1 | 45.17216 |\n| 3 | 2 | 44.60169 |\n| 4 | 3 | 43.33871 |\n| 5 | 4 | 40.20000 |\n| 6 | 5 | 59.50000 |\n\n","text/latex":"A data.frame: 6 × 2\n\\begin{tabular}{r|ll}\n & dat\\_frqMod\\$ClaimNb & dat\\_frqMod\\$DrivAge\\\\\n & & \\\\\n\\hline\n\t1 & 0 & 45.51285\\\\\n\t2 & 1 & 45.17216\\\\\n\t3 & 2 & 44.60169\\\\\n\t4 & 3 & 43.33871\\\\\n\t5 & 4 & 40.20000\\\\\n\t6 & 5 & 59.50000\\\\\n\\end{tabular}\n","text/plain":[" dat_frqMod$ClaimNb dat_frqMod$DrivAge\n","1 0 45.51285 \n","2 1 45.17216 \n","3 2 44.60169 \n","4 3 43.33871 \n","5 4 40.20000 \n","6 5 59.50000 "]},"metadata":{}},{"output_type":"display_data","data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A data.frame: 6 × 2
dat_frqMod$ClaimNbdat_frqMod$VehAge
<labelled><labelled[1d]>
10 7.028766
21 7.426287
32 6.659476
43 6.306452
54 4.400000
6510.500000
\n"],"text/markdown":"\nA data.frame: 6 × 2\n\n| | dat_frqMod$ClaimNb <labelled> | dat_frqMod$VehAge <labelled[1d]> |\n|---|---|---|\n| 1 | 0 | 7.028766 |\n| 2 | 1 | 7.426287 |\n| 3 | 2 | 6.659476 |\n| 4 | 3 | 6.306452 |\n| 5 | 4 | 4.400000 |\n| 6 | 5 | 10.500000 |\n\n","text/latex":"A data.frame: 6 × 2\n\\begin{tabular}{r|ll}\n & dat\\_frqMod\\$ClaimNb & dat\\_frqMod\\$VehAge\\\\\n & & \\\\\n\\hline\n\t1 & 0 & 7.028766\\\\\n\t2 & 1 & 7.426287\\\\\n\t3 & 2 & 6.659476\\\\\n\t4 & 3 & 6.306452\\\\\n\t5 & 4 & 4.400000\\\\\n\t6 & 5 & 10.500000\\\\\n\\end{tabular}\n","text/plain":[" dat_frqMod$ClaimNb dat_frqMod$VehAge\n","1 0 7.028766 \n","2 1 7.426287 \n","3 2 6.659476 \n","4 3 6.306452 \n","5 4 4.400000 \n","6 5 10.500000 "]},"metadata":{}}]}]}