'''
Data: Kaggle, insurance.csv

Exploratory Data Analysis:

    #outputting the data for preliminary analysis
    #viewing the first 10 rows
    print(data.head(n=10))

    #checking the data types
    print(data.dtypes)

    age           int64
    sex          object
    bmi         float64
    children      int64
    smoker       object
    region       object
    charges     float64

Based on this information, the variables 'sex', 'smoker', and 'region' need to be
changed from 'object' (string) values to numerical values.
Region needs to be grouped into numeric codes, e.g. 1 = southeast, 2 = southwest, 3 = ...
The 'charges' variable should be rounded to make it easier to predict.

    #checking for na in the data
    #print(data.isna().sum())
    #print(data.isnull().sum())
'''

#importing the pandas library
import pandas as pd

#reading the csv file (read_csv already returns a DataFrame)
data = pd.read_csv('insurance.csv', sep=',')

#Label Encoding the data (sex, smoker, and region variables)
object_df = data.select_dtypes(include=['object']).copy()
#print(object_df.head())

#changing variables to 'category' type
object_df["sex"] = object_df["sex"].astype('category')
object_df["smoker"] = object_df["smoker"].astype('category')
object_df["region"] = object_df["region"].astype('category')

#assigning encoded variables using 'cat.codes'
#(codes follow the sorted order of the categories found in the data)
object_df["sex_binary"] = object_df["sex"].cat.codes
object_df["smoker_binary"] = object_df["smoker"].cat.codes
object_df["region_encoded"] = object_df["region"].cat.codes

#assigning the encoded columns back to the data object
data["sex"] = object_df["sex_binary"]
data["smoker"] = object_df["smoker_binary"]
data["region"] = object_df["region_encoded"]

#checking the data object
#print(data.head(n=10))

'''
Sex variable
    0 female
    1 male

Smoker variable
    1 yes
    0 no

Region
    3 Southwest
    2 Southeast
    1 Northwest
    0 Northeast
'''
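
#A minimal sketch, not part of the original analysis: the same encoding written
#with explicit dictionaries. 'cat.codes' numbers the sorted categories it finds
#in the data, so a file that is missing one region would shift the codes; fixed
#dictionaries keep the codes listed in the legend stable.
#Assumptions: the CSV stores these values in lowercase, and the names SEX_CODES,
#SMOKER_CODES, REGION_CODES and encode_with_explicit_maps are hypothetical
#helpers introduced here for illustration only.
import pandas as pd

SEX_CODES = {"female": 0, "male": 1}
SMOKER_CODES = {"no": 0, "yes": 1}
REGION_CODES = {"northeast": 0, "northwest": 1, "southeast": 2, "southwest": 3}


def encode_with_explicit_maps(frame):
    #work on a copy so the caller's frame keeps its original string values
    encoded = frame.copy()
    #values that are not in a dictionary become NaN instead of getting a code
    encoded["sex"] = encoded["sex"].map(SEX_CODES)
    encoded["smoker"] = encoded["smoker"].map(SMOKER_CODES)
    encoded["region"] = encoded["region"].map(REGION_CODES)
    return encoded

#usage, assuming 'raw' holds the frame as read from the csv, before any encoding:
#encoded = encode_with_explicit_maps(raw)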