{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "TZ-PuwoyX4qQ" }, "source": [ "# Week 1 Assignment: Data Validation\n", "\n", "[TensorFlow Data Validation (TFDV)](https://cloud.google.com/solutions/machine-learning/analyzing-and-validating-data-at-scale-for-ml-using-tfx) is an open-source library that helps you understand, validate, and monitor production machine learning (ML) data at scale. Common use cases include comparing training, evaluation, and serving datasets, as well as checking for training/serving skew. You have seen the core functionality of this package in the previous ungraded lab, and you will get to practice it in this week's assignment.\n", "\n", "In this lab, you will use TFDV to:\n", "\n", "* Generate and visualize statistics from a dataframe\n", "* Infer a dataset schema\n", "* Calculate, visualize and fix anomalies\n", "\n", "Let's begin!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "\n", "- [1 - Setup and Imports](#1)\n", "- [2 - Load the Dataset](#2)\n", " - [2.1 - Read and Split the Dataset](#2-1)\n", " - [2.1.1 - Data Splits](#2-1-1)\n", " - [2.1.2 - Label Column](#2-1-2)\n", "- [3 - Generate and Visualize Training Data Statistics](#3)\n", " - [3.1 - Removing Irrelevant Features](#3-1)\n", " - [Exercise 1 - Generate Training Statistics](#ex-1)\n", " - [Exercise 2 - Visualize Training Statistics](#ex-2)\n", "- [4 - Infer a Data Schema](#4)\n", " - [Exercise 3: Infer the training set schema](#ex-3)\n", "- [5 - Calculate, Visualize and Fix Evaluation Anomalies](#5)\n", " - [Exercise 4: Compare Training and Evaluation Statistics](#ex-4)\n", " - [Exercise 5: Detecting Anomalies](#ex-5)\n", " - [Exercise 6: Fix evaluation anomalies in the schema](#ex-6)\n", "- [6 - Schema Environments](#6)\n", " - [Exercise 7: Check anomalies in the serving set](#ex-7)\n", " - [Exercise 8: Modifying the domain](#ex-8)\n", " - [Exercise 9: Detecting anomalies with environments](#ex-9)\n", "- [7 - 
Check for Data Drift and Skew](#7)\n", "- [8 - Display Stats for Data Slices](#8)\n", "- [9 - Freeze the Schema](#9)" ] }, { "cell_type": "markdown", "metadata": { "id": "ZEnMK4DRNV1O" }, "source": [ "\n", "## 1 - Setup and Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "zrLPRsQgImel" }, "outputs": [], "source": [ "# Import packages\n", "import os\n", "import pandas as pd\n", "import tensorflow as tf\n", "import tempfile, urllib, zipfile\n", "import tensorflow_data_validation as tfdv\n", "\n", "\n", "from tensorflow.python.lib.io import file_io\n", "from tensorflow_data_validation.utils import slicing_util\n", "from tensorflow_metadata.proto.v0.statistics_pb2 import DatasetFeatureStatisticsList, DatasetFeatureStatistics\n", "\n", "# Set TF's logger to only display errors to avoid internal warnings being shown\n", "tf.get_logger().setLevel('ERROR')" ] }, { "cell_type": "markdown", "metadata": { "id": "5MizoHg1DRlK" }, "source": [ "\n", "## 2 - Load the Dataset\n", "You will be using the [Diabetes 130-US hospitals for years 1999-2008 Data Set](https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008) donated to the University of California, Irvine (UCI) Machine Learning Repository. The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes.\n", "\n", "This dataset has already been included in your Jupyter workspace so you can easily load it." ] }, { "cell_type": "markdown", "metadata": { "id": "S2o2NGqIxc5e" }, "source": [ "\n", "### 2.1 - Read and Split the Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "YyO3RSuLF0Nf" }, "outputs": [ { "data": { "text/html": [ "
| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | NaN | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | NaN | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | NaN | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |
| 3 | 500364 | 82442376 | Caucasian | Male | [30-40) | NaN | 1 | 1 | 7 | 2 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 4 | 16680 | 42519267 | Caucasian | Male | [40-50) | NaN | 1 | 1 | 7 | 1 | ... | No | Steady | No | No | No | No | No | Ch | Yes | NO |

5 rows × 50 columns

| Feature name | Type | Presence | Valency | Domain |
|---|---|---|---|---|
| 'race' | STRING | optional | single | 'race' |
| 'gender' | STRING | required | | 'gender' |
| 'age' | STRING | required | | 'age' |
| 'weight' | STRING | optional | single | 'weight' |
| 'admission_type_id' | INT | required | | - |
| 'discharge_disposition_id' | INT | required | | - |
| 'admission_source_id' | INT | required | | - |
| 'time_in_hospital' | INT | required | | - |
| 'payer_code' | STRING | optional | single | 'payer_code' |
| 'medical_specialty' | STRING | optional | single | 'medical_specialty' |
| 'num_lab_procedures' | INT | required | | - |
| 'num_procedures' | INT | required | | - |
| 'num_medications' | INT | required | | - |
| 'number_outpatient' | INT | required | | - |
| 'number_emergency' | INT | required | | - |
| 'number_inpatient' | INT | required | | - |
| 'diag_1' | BYTES | optional | single | - |
| 'diag_2' | BYTES | optional | single | - |
| 'diag_3' | BYTES | optional | single | - |
| 'number_diagnoses' | INT | required | | - |
| 'max_glu_serum' | STRING | required | | 'max_glu_serum' |
| 'A1Cresult' | STRING | required | | 'A1Cresult' |
| 'metformin' | STRING | required | | 'metformin' |
| 'repaglinide' | STRING | required | | 'repaglinide' |
| 'nateglinide' | STRING | required | | 'nateglinide' |
| 'chlorpropamide' | STRING | required | | 'chlorpropamide' |
| 'glimepiride' | STRING | required | | 'glimepiride' |
| 'acetohexamide' | STRING | required | | 'acetohexamide' |
| 'glipizide' | STRING | required | | 'glipizide' |
| 'glyburide' | STRING | required | | 'glyburide' |
| 'tolbutamide' | STRING | required | | 'tolbutamide' |
| 'pioglitazone' | STRING | required | | 'pioglitazone' |
| 'rosiglitazone' | STRING | required | | 'rosiglitazone' |
| 'acarbose' | STRING | required | | 'acarbose' |
| 'miglitol' | STRING | required | | 'miglitol' |
| 'troglitazone' | STRING | required | | 'troglitazone' |
| 'tolazamide' | STRING | required | | 'tolazamide' |
| 'examide' | STRING | required | | 'examide' |
| 'citoglipton' | STRING | required | | 'citoglipton' |
| 'insulin' | STRING | required | | 'insulin' |
| 'glyburide-metformin' | STRING | required | | 'glyburide-metformin' |
| 'glipizide-metformin' | STRING | required | | 'glipizide-metformin' |
| 'glimepiride-pioglitazone' | STRING | required | | 'glimepiride-pioglitazone' |
| 'metformin-rosiglitazone' | STRING | required | | 'metformin-rosiglitazone' |
| 'metformin-pioglitazone' | STRING | required | | 'metformin-pioglitazone' |
| 'change' | STRING | required | | 'change' |
| 'diabetesMed' | STRING | required | | 'diabetesMed' |
| 'readmitted' | STRING | required | | 'readmitted' |

| Domain | Values |
|---|---|
| 'race' | 'AfricanAmerican', 'Asian', 'Caucasian', 'Hispanic', 'Other' |
| 'gender' | 'Female', 'Male', 'Unknown/Invalid' |
| 'age' | '[0-10)', '[10-20)', '[20-30)', '[30-40)', '[40-50)', '[50-60)', '[60-70)', '[70-80)', '[80-90)', '[90-100)' |
| 'weight' | '>200', '[0-25)', '[100-125)', '[125-150)', '[150-175)', '[175-200)', '[25-50)', '[50-75)', '[75-100)' |
| 'payer_code' | 'BC', 'CH', 'CM', 'CP', 'DM', 'HM', 'MC', 'MD', 'MP', 'OG', 'OT', 'PO', 'SI', 'SP', 'UN', 'WC' |
| 'medical_specialty' | 'AllergyandImmunology', 'Anesthesiology', 'Anesthesiology-Pediatric', 'Cardiology', 'Cardiology-Pediatric', 'Dentistry', 'Dermatology', 'Emergency/Trauma', 'Endocrinology', 'Family/GeneralPractice', 'Gastroenterology', 'Gynecology', 'Hematology', 'Hematology/Oncology', 'Hospitalist', 'InfectiousDiseases', 'InternalMedicine', 'Nephrology', 'Neurology', 'Obsterics&Gynecology-GynecologicOnco', 'Obstetrics', 'ObstetricsandGynecology', 'Oncology', 'Ophthalmology', 'Orthopedics', 'Orthopedics-Reconstructive', 'Osteopath', 'Otolaryngology', 'OutreachServices', 'Pathology', 'Pediatrics', 'Pediatrics-AllergyandImmunology', 'Pediatrics-CriticalCare', 'Pediatrics-EmergencyMedicine', 'Pediatrics-Endocrinology', 'Pediatrics-Hematology-Oncology', 'Pediatrics-InfectiousDiseases', 'Pediatrics-Neurology', 'Pediatrics-Pulmonology', 'Perinatology', 'PhysicalMedicineandRehabilitation', 'PhysicianNotFound', 'Podiatry', 'Proctology', 'Psychiatry', 'Psychiatry-Addictive', 'Psychiatry-Child/Adolescent', 'Psychology', 'Pulmonology', 'Radiologist', 'Radiology', 'Rheumatology', 'Speech', 'SportsMedicine', 'Surgeon', 'Surgery-Cardiovascular', 'Surgery-Cardiovascular/Thoracic', 'Surgery-Colon&Rectal', 'Surgery-General', 'Surgery-Maxillofacial', 'Surgery-Neuro', 'Surgery-Pediatric', 'Surgery-Plastic', 'Surgery-PlasticwithinHeadandNeck', 'Surgery-Thoracic', 'Surgery-Vascular', 'SurgicalSpecialty', 'Urology' |
| 'max_glu_serum' | '>200', '>300', 'None', 'Norm' |
| 'A1Cresult' | '>7', '>8', 'None', 'Norm' |
| 'metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'repaglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'nateglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'chlorpropamide' | 'Down', 'No', 'Steady', 'Up' |
| 'glimepiride' | 'Down', 'No', 'Steady', 'Up' |
| 'acetohexamide' | 'No', 'Steady' |
| 'glipizide' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide' | 'Down', 'No', 'Steady', 'Up' |
| 'tolbutamide' | 'No', 'Steady' |
| 'pioglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'rosiglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'acarbose' | 'Down', 'No', 'Steady', 'Up' |
| 'miglitol' | 'Down', 'No', 'Steady', 'Up' |
| 'troglitazone' | 'No', 'Steady' |
| 'tolazamide' | 'No', 'Steady', 'Up' |
| 'examide' | 'No' |
| 'citoglipton' | 'No' |
| 'insulin' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide-metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'glipizide-metformin' | 'No', 'Steady' |
| 'glimepiride-pioglitazone' | 'No' |
| 'metformin-rosiglitazone' | 'No' |
| 'metformin-pioglitazone' | 'No' |
| 'change' | 'Ch', 'No' |
| 'diabetesMed' | 'No', 'Yes' |
| 'readmitted' | '<30', '>30', 'NO' |

| Feature name | Anomaly short description | Anomaly long description |
|---|---|---|
| 'glimepiride-pioglitazone' | Unexpected string values | Examples contain values missing from the schema: Steady (<1%). |
| 'medical_specialty' | Unexpected string values | Examples contain values missing from the schema: Neurophysiology (<1%). |

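An "Unexpected string values" anomaly, as in the table above, means that a categorical feature contains a value that is absent from the domain recorded in the schema. The sketch below illustrates that idea in plain Python on toy data. It is not how TFDV implements the check; the helper name `find_unexpected_values` and the sample values are hypothetical.

```python
from collections import Counter


def find_unexpected_values(values, domain):
    """Return {value: fraction of examples} for values missing from the domain."""
    total = len(values)
    if total == 0:
        return {}
    allowed = set(domain)
    counts = Counter(values)
    return {v: c / total for v, c in counts.items() if v not in allowed}


# Toy data (assumption): the domain inferred from training contains only 'No',
# while the evaluation split has one example with the value 'Steady'.
schema_domain = ['No']
eval_values = ['No'] * 199 + ['Steady']

unexpected = find_unexpected_values(eval_values, schema_domain)
for value, fraction in unexpected.items():
    print(f"{value}: {fraction:.2%} of examples")

# One way to resolve the anomaly is to accept the new value by extending the domain.
schema_domain.extend(unexpected)
```

In TFDV itself, a schema domain can be retrieved with `tfdv.get_domain(schema, feature_name)` and extended in a similar way, after which the statistics are validated again against the updated schema.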
| Feature name | Anomaly short description | Anomaly long description |
|---|---|---|
| 'metformin-pioglitazone' | Unexpected string values | Examples contain values missing from the schema: Steady (<1%). |
| 'payer_code' | Unexpected string values | Examples contain values missing from the schema: FR (<1%). |
| 'medical_specialty' | Unexpected string values | Examples contain values missing from the schema: DCPTEAM (<1%), Endocrinology-Metabolism (<1%), Resident (<1%). |
| 'metformin-rosiglitazone' | Unexpected string values | Examples contain values missing from the schema: Steady (<1%). |
| 'readmitted' | Column dropped | Column is completely missing |

| Feature name | Anomaly short description | Anomaly long description |
|---|---|---|
| 'metformin-pioglitazone' | Unexpected string values | Examples contain values missing from the schema: Steady (<1%). |
| 'metformin-rosiglitazone' | Unexpected string values | Examples contain values missing from the schema: Steady (<1%). |
| 'readmitted' | Column dropped | Column is completely missing |

| Feature name | Type | Presence | Valency | Domain |
|---|---|---|---|---|
| 'race' | STRING | optional | single | 'race' |
| 'gender' | STRING | required | | 'gender' |
| 'age' | STRING | required | | 'age' |
| 'weight' | STRING | optional | single | 'weight' |
| 'admission_type_id' | INT | required | | - |
| 'discharge_disposition_id' | INT | required | | - |
| 'admission_source_id' | INT | required | | - |
| 'time_in_hospital' | INT | required | | - |
| 'payer_code' | STRING | optional | single | 'payer_code' |
| 'medical_specialty' | STRING | optional | single | 'medical_specialty' |
| 'num_lab_procedures' | INT | required | | - |
| 'num_procedures' | INT | required | | - |
| 'num_medications' | INT | required | | - |
| 'number_outpatient' | INT | required | | - |
| 'number_emergency' | INT | required | | - |
| 'number_inpatient' | INT | required | | - |
| 'diag_1' | BYTES | optional | single | - |
| 'diag_2' | BYTES | optional | single | - |
| 'diag_3' | BYTES | optional | single | - |
| 'number_diagnoses' | INT | required | | - |
| 'max_glu_serum' | STRING | required | | 'max_glu_serum' |
| 'A1Cresult' | STRING | required | | 'A1Cresult' |
| 'metformin' | STRING | required | | 'metformin' |
| 'repaglinide' | STRING | required | | 'repaglinide' |
| 'nateglinide' | STRING | required | | 'nateglinide' |
| 'chlorpropamide' | STRING | required | | 'chlorpropamide' |
| 'glimepiride' | STRING | required | | 'glimepiride' |
| 'acetohexamide' | STRING | required | | 'acetohexamide' |
| 'glipizide' | STRING | required | | 'glipizide' |
| 'glyburide' | STRING | required | | 'glyburide' |
| 'tolbutamide' | STRING | required | | 'tolbutamide' |
| 'pioglitazone' | STRING | required | | 'pioglitazone' |
| 'rosiglitazone' | STRING | required | | 'rosiglitazone' |
| 'acarbose' | STRING | required | | 'acarbose' |
| 'miglitol' | STRING | required | | 'miglitol' |
| 'troglitazone' | STRING | required | | 'troglitazone' |
| 'tolazamide' | STRING | required | | 'tolazamide' |
| 'examide' | STRING | required | | 'examide' |
| 'citoglipton' | STRING | required | | 'citoglipton' |
| 'insulin' | STRING | required | | 'insulin' |
| 'glyburide-metformin' | STRING | required | | 'glyburide-metformin' |
| 'glipizide-metformin' | STRING | required | | 'glipizide-metformin' |
| 'glimepiride-pioglitazone' | STRING | required | | 'glimepiride-pioglitazone' |
| 'metformin-rosiglitazone' | STRING | required | | 'metformin-rosiglitazone' |
| 'metformin-pioglitazone' | STRING | required | | 'metformin-pioglitazone' |
| 'change' | STRING | required | | 'change' |
| 'diabetesMed' | STRING | required | | 'diabetesMed' |
| 'readmitted' | STRING | required | | 'readmitted' |

| Domain | Values |
|---|---|
| 'race' | 'AfricanAmerican', 'Asian', 'Caucasian', 'Hispanic', 'Other' |
| 'gender' | 'Female', 'Male', 'Unknown/Invalid' |
| 'age' | '[0-10)', '[10-20)', '[20-30)', '[30-40)', '[40-50)', '[50-60)', '[60-70)', '[70-80)', '[80-90)', '[90-100)' |
| 'weight' | '>200', '[0-25)', '[100-125)', '[125-150)', '[150-175)', '[175-200)', '[25-50)', '[50-75)', '[75-100)' |
| 'payer_code' | 'BC', 'CH', 'CM', 'CP', 'DM', 'HM', 'MC', 'MD', 'MP', 'OG', 'OT', 'PO', 'SI', 'SP', 'UN', 'WC' |
| 'medical_specialty' | 'AllergyandImmunology', 'Anesthesiology', 'Anesthesiology-Pediatric', 'Cardiology', 'Cardiology-Pediatric', 'Dentistry', 'Dermatology', 'Emergency/Trauma', 'Endocrinology', 'Family/GeneralPractice', 'Gastroenterology', 'Gynecology', 'Hematology', 'Hematology/Oncology', 'Hospitalist', 'InfectiousDiseases', 'InternalMedicine', 'Nephrology', 'Neurology', 'Obsterics&Gynecology-GynecologicOnco', 'Obstetrics', 'ObstetricsandGynecology', 'Oncology', 'Ophthalmology', 'Orthopedics', 'Orthopedics-Reconstructive', 'Osteopath', 'Otolaryngology', 'OutreachServices', 'Pathology', 'Pediatrics', 'Pediatrics-AllergyandImmunology', 'Pediatrics-CriticalCare', 'Pediatrics-EmergencyMedicine', 'Pediatrics-Endocrinology', 'Pediatrics-Hematology-Oncology', 'Pediatrics-InfectiousDiseases', 'Pediatrics-Neurology', 'Pediatrics-Pulmonology', 'Perinatology', 'PhysicalMedicineandRehabilitation', 'PhysicianNotFound', 'Podiatry', 'Proctology', 'Psychiatry', 'Psychiatry-Addictive', 'Psychiatry-Child/Adolescent', 'Psychology', 'Pulmonology', 'Radiologist', 'Radiology', 'Rheumatology', 'Speech', 'SportsMedicine', 'Surgeon', 'Surgery-Cardiovascular', 'Surgery-Cardiovascular/Thoracic', 'Surgery-Colon&Rectal', 'Surgery-General', 'Surgery-Maxillofacial', 'Surgery-Neuro', 'Surgery-Pediatric', 'Surgery-Plastic', 'Surgery-PlasticwithinHeadandNeck', 'Surgery-Thoracic', 'Surgery-Vascular', 'SurgicalSpecialty', 'Urology', 'Neurophysiology' |
| 'max_glu_serum' | '>200', '>300', 'None', 'Norm' |
| 'A1Cresult' | '>7', '>8', 'None', 'Norm' |
| 'metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'repaglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'nateglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'chlorpropamide' | 'Down', 'No', 'Steady', 'Up' |
| 'glimepiride' | 'Down', 'No', 'Steady', 'Up' |
| 'acetohexamide' | 'No', 'Steady' |
| 'glipizide' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide' | 'Down', 'No', 'Steady', 'Up' |
| 'tolbutamide' | 'No', 'Steady' |
| 'pioglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'rosiglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'acarbose' | 'Down', 'No', 'Steady', 'Up' |
| 'miglitol' | 'Down', 'No', 'Steady', 'Up' |
| 'troglitazone' | 'No', 'Steady' |
| 'tolazamide' | 'No', 'Steady', 'Up' |
| 'examide' | 'No' |
| 'citoglipton' | 'No' |
| 'insulin' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide-metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'glipizide-metformin' | 'No', 'Steady' |
| 'glimepiride-pioglitazone' | 'No', 'Steady' |
| 'metformin-rosiglitazone' | 'No' |
| 'metformin-pioglitazone' | 'No' |
| 'change' | 'Ch', 'No' |
| 'diabetesMed' | 'No', 'Yes' |
| 'readmitted' | '<30', '>30', 'NO' |

| Feature name | Type | Presence | Valency | Domain |
|---|---|---|---|---|
| 'race' | STRING | optional | single | 'race' |
| 'gender' | STRING | required | | 'gender' |
| 'age' | STRING | required | | 'age' |
| 'weight' | STRING | optional | single | 'weight' |
| 'admission_type_id' | INT | required | | - |
| 'discharge_disposition_id' | INT | required | | - |
| 'admission_source_id' | INT | required | | - |
| 'time_in_hospital' | INT | required | | - |
| 'payer_code' | STRING | optional | single | 'payer_code' |
| 'medical_specialty' | STRING | optional | single | 'medical_specialty' |
| 'num_lab_procedures' | INT | required | | - |
| 'num_procedures' | INT | required | | - |
| 'num_medications' | INT | required | | - |
| 'number_outpatient' | INT | required | | - |
| 'number_emergency' | INT | required | | - |
| 'number_inpatient' | INT | required | | - |
| 'diag_1' | BYTES | optional | single | - |
| 'diag_2' | BYTES | optional | single | - |
| 'diag_3' | BYTES | optional | single | - |
| 'number_diagnoses' | INT | required | | - |
| 'max_glu_serum' | STRING | required | | 'max_glu_serum' |
| 'A1Cresult' | STRING | required | | 'A1Cresult' |
| 'metformin' | STRING | required | | 'metformin' |
| 'repaglinide' | STRING | required | | 'metformin' |
| 'nateglinide' | STRING | required | | 'metformin' |
| 'chlorpropamide' | STRING | required | | 'metformin' |
| 'glimepiride' | STRING | required | | 'metformin' |
| 'acetohexamide' | STRING | required | | 'metformin' |
| 'glipizide' | STRING | required | | 'metformin' |
| 'glyburide' | STRING | required | | 'metformin' |
| 'tolbutamide' | STRING | required | | 'metformin' |
| 'pioglitazone' | STRING | required | | 'metformin' |
| 'rosiglitazone' | STRING | required | | 'metformin' |
| 'acarbose' | STRING | required | | 'metformin' |
| 'miglitol' | STRING | required | | 'metformin' |
| 'troglitazone' | STRING | required | | 'metformin' |
| 'tolazamide' | STRING | required | | 'metformin' |
| 'examide' | STRING | required | | 'metformin' |
| 'citoglipton' | STRING | required | | 'metformin' |
| 'insulin' | STRING | required | | 'metformin' |
| 'glyburide-metformin' | STRING | required | | 'metformin' |
| 'glipizide-metformin' | STRING | required | | 'metformin' |
| 'glimepiride-pioglitazone' | STRING | required | | 'metformin' |
| 'metformin-rosiglitazone' | STRING | required | | 'metformin' |
| 'metformin-pioglitazone' | STRING | required | | 'metformin' |
| 'change' | STRING | required | | 'change' |
| 'diabetesMed' | STRING | required | | 'diabetesMed' |
| 'readmitted' | STRING | required | | 'readmitted' |

| Domain | Values |
|---|---|
| 'race' | 'AfricanAmerican', 'Asian', 'Caucasian', 'Hispanic', 'Other' |
| 'gender' | 'Female', 'Male', 'Unknown/Invalid' |
| 'age' | '[0-10)', '[10-20)', '[20-30)', '[30-40)', '[40-50)', '[50-60)', '[60-70)', '[70-80)', '[80-90)', '[90-100)' |
| 'weight' | '>200', '[0-25)', '[100-125)', '[125-150)', '[150-175)', '[175-200)', '[25-50)', '[50-75)', '[75-100)' |
| 'payer_code' | 'BC', 'CH', 'CM', 'CP', 'DM', 'HM', 'MC', 'MD', 'MP', 'OG', 'OT', 'PO', 'SI', 'SP', 'UN', 'WC' |
| 'medical_specialty' | 'AllergyandImmunology', 'Anesthesiology', 'Anesthesiology-Pediatric', 'Cardiology', 'Cardiology-Pediatric', 'Dentistry', 'Dermatology', 'Emergency/Trauma', 'Endocrinology', 'Family/GeneralPractice', 'Gastroenterology', 'Gynecology', 'Hematology', 'Hematology/Oncology', 'Hospitalist', 'InfectiousDiseases', 'InternalMedicine', 'Nephrology', 'Neurology', 'Obsterics&Gynecology-GynecologicOnco', 'Obstetrics', 'ObstetricsandGynecology', 'Oncology', 'Ophthalmology', 'Orthopedics', 'Orthopedics-Reconstructive', 'Osteopath', 'Otolaryngology', 'OutreachServices', 'Pathology', 'Pediatrics', 'Pediatrics-AllergyandImmunology', 'Pediatrics-CriticalCare', 'Pediatrics-EmergencyMedicine', 'Pediatrics-Endocrinology', 'Pediatrics-Hematology-Oncology', 'Pediatrics-InfectiousDiseases', 'Pediatrics-Neurology', 'Pediatrics-Pulmonology', 'Perinatology', 'PhysicalMedicineandRehabilitation', 'PhysicianNotFound', 'Podiatry', 'Proctology', 'Psychiatry', 'Psychiatry-Addictive', 'Psychiatry-Child/Adolescent', 'Psychology', 'Pulmonology', 'Radiologist', 'Radiology', 'Rheumatology', 'Speech', 'SportsMedicine', 'Surgeon', 'Surgery-Cardiovascular', 'Surgery-Cardiovascular/Thoracic', 'Surgery-Colon&Rectal', 'Surgery-General', 'Surgery-Maxillofacial', 'Surgery-Neuro', 'Surgery-Pediatric', 'Surgery-Plastic', 'Surgery-PlasticwithinHeadandNeck', 'Surgery-Thoracic', 'Surgery-Vascular', 'SurgicalSpecialty', 'Urology', 'Neurophysiology' |
| 'max_glu_serum' | '>200', '>300', 'None', 'Norm' |
| 'A1Cresult' | '>7', '>8', 'None', 'Norm' |
| 'metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'repaglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'nateglinide' | 'Down', 'No', 'Steady', 'Up' |
| 'chlorpropamide' | 'Down', 'No', 'Steady', 'Up' |
| 'glimepiride' | 'Down', 'No', 'Steady', 'Up' |
| 'acetohexamide' | 'No', 'Steady' |
| 'glipizide' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide' | 'Down', 'No', 'Steady', 'Up' |
| 'tolbutamide' | 'No', 'Steady' |
| 'pioglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'rosiglitazone' | 'Down', 'No', 'Steady', 'Up' |
| 'acarbose' | 'Down', 'No', 'Steady', 'Up' |
| 'miglitol' | 'Down', 'No', 'Steady', 'Up' |
| 'troglitazone' | 'No', 'Steady' |
| 'tolazamide' | 'No', 'Steady', 'Up' |
| 'examide' | 'No' |
| 'citoglipton' | 'No' |
| 'insulin' | 'Down', 'No', 'Steady', 'Up' |
| 'glyburide-metformin' | 'Down', 'No', 'Steady', 'Up' |
| 'glipizide-metformin' | 'No', 'Steady' |
| 'glimepiride-pioglitazone' | 'No', 'Steady' |
| 'metformin-rosiglitazone' | 'No' |
| 'metformin-pioglitazone' | 'No' |
| 'change' | 'Ch', 'No' |
| 'diabetesMed' | 'No', 'Yes' |
| 'readmitted' | '<30', '>30', 'NO' |

| Feature name | Anomaly short description | Anomaly long description |
|---|---|---|
| 'readmitted' | Column dropped | Column is completely missing |

| Feature name | Anomaly short description | Anomaly long description |
|---|---|---|
| 'diabetesMed' | High Linfty distance between training and serving | The Linfty distance between training and serving is 0.0325464 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: No |
| 'payer_code' | High Linfty distance between current and previous | The Linfty distance between current and previous is 0.0342144 (up to six significant digits), above the threshold 0.03. The feature value with maximum difference is: MC |

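Both rows above report an L-infinity distance: for a categorical feature, this is the largest absolute difference in relative frequency of any single value between two datasets. The following is a minimal sketch of that computation in plain Python, not TFDV's implementation; the helper name `linf_distance` and the toy counts are hypothetical, and only the 0.03 threshold comes from the table.

```python
from collections import Counter


def linf_distance(baseline, current):
    """Return (distance, value) where distance is the largest absolute
    difference in relative frequency and value is where it occurs."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    base_freq = {v: c / len(baseline) for v, c in Counter(baseline).items()}
    curr_freq = {v: c / len(current) for v, c in Counter(current).items()}
    diffs = {
        v: abs(base_freq.get(v, 0.0) - curr_freq.get(v, 0.0))
        for v in set(base_freq) | set(curr_freq)
    }
    worst = max(diffs, key=diffs.get)
    return diffs[worst], worst


THRESHOLD = 0.03  # same threshold as reported in the anomaly table

# Toy payer codes (assumption): frequencies shift between two data spans.
previous_span = ['MC'] * 50 + ['HM'] * 30 + ['SP'] * 20
current_span = ['MC'] * 44 + ['HM'] * 32 + ['SP'] * 24

distance, value = linf_distance(previous_span, current_span)
flagged = distance > THRESHOLD
print(f"L-infinity distance: {distance:.4f} (largest difference at {value!r}), flagged: {flagged}")
```

In TFDV, the corresponding thresholds are set on the schema feature through `skew_comparator.infinity_norm.threshold` (training vs. serving) and `drift_comparator.infinity_norm.threshold` (current vs. previous span) before validating the statistics.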