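---
title: "Session 8 - Categorical Data Analysis. Exercises"
output:
  html_notebook:
    code_folding: show
---

# Exercise 1

- Select the `diabetes.xls` dataset.
- Read the dataset into R and check that the categorical variables you are interested in (`mort`, `tabac`, `ecg`) are converted into factors.
- Confirm the conversion by summarizing the variables.

```{r}
library(readxl)
# read_excel() does not create factors, so convert them explicitly
diabetes <- read_excel("datasets/diabetes.xls")
str(diabetes)
diabetes$mort <- as.factor(diabetes$mort)
diabetes$tabac <- as.factor(diabetes$tabac)
diabetes$ecg <- as.factor(diabetes$ecg)
# Confirm the conversion: factors are summarized as counts per level
summary(diabetes[, c("mort", "tabac", "ecg")])
```

# Exercise 2

- With the diabetes dataset, repeat the cross-tabulation done above using:
  - Two categorical variables
  - Variable `mort` and a new variable `bmi30`, created by properly categorizing variable `bmi`.

```{r}
library(dplyr)
# bmi30 is TRUE when bmi is above 30 (NA when bmi is missing)
diabetes <- diabetes %>% mutate(bmi30 = bmi > 30)
# Compute proportions on the raw table; adding margins first would distort them
mt <- table(diabetes$mort, diabetes$bmi30)
addmargins(mt)
addmargins(prop.table(mt))
```

# Exercise 3

- In the diabetes dataset:
  - Test the hypothesis that the proportion of patients with `bmi30` is higher than 40%
    - In the global population of the study
    - Only in patients with `mort` equal to "Muerto"

```{r}
# Pass the TRUE count and the non-missing total explicitly: prop.test() would read the
# first cell of table(bmi30), i.e. FALSE, as successes. "greater" tests H1: p > 0.4.
prop.test(sum(diabetes$bmi30, na.rm = TRUE), sum(!is.na(diabetes$bmi30)), p = 0.4, alternative = "greater")
diab_mort <- diabetes %>% filter(mort == "Muerto")
prop.test(sum(diab_mort$bmi30, na.rm = TRUE), sum(!is.na(diab_mort$bmi30)), p = 0.4, alternative = "greater")
```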
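
To check what a one-sample proportion test like the one in Exercise 3 is actually testing, it can help to run it on a small vector first. The sketch below uses a made-up logical vector, `bmi30_toy`, which is an assumption for illustration and is not taken from `diabetes.xls`. It compares passing `table()` output to `prop.test()` with passing the success count and the total explicitly.

```{r}
# Hypothetical toy data (NOT from diabetes.xls): TRUE stands for bmi > 30
bmi30_toy <- c(rep(TRUE, 27), rep(FALSE, 23))

# table() on a logical vector lists FALSE first, and prop.test() reads the
# first entry of a one-dimensional table as the number of successes
res_table <- prop.test(table(bmi30_toy), p = 0.4, alternative = "greater")

# Explicit counts keep TRUE as the success category
x <- sum(bmi30_toy, na.rm = TRUE)   # number of TRUE values
n <- sum(!is.na(bmi30_toy))         # number of non-missing values
res_counts <- prop.test(x, n, p = 0.4, alternative = "greater")

# Estimated proportions: the first refers to FALSE, the second to TRUE
res_table$estimate
res_counts$estimate

# One-sided p-value for H0: p = 0.4 against H1: p > 0.4
res_counts$p.value
```

The two estimates differ because they describe complementary categories, so it is worth checking `$estimate` before interpreting the p-value.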