{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Selection on Dirac\n", "#### Device: Dirac-1\n", "\n", "\n", "## Introduction\n", "In machine learning problems, we often have to start with a large number of features. We need a feature selection technique that can discover a relatively small subset of the most relevant features. In what follows we present a tutorial on using QCi's technology to select a set of features by minimizing their inter-correlation. This approach can be used with any unsupervised machine learning approach such as anomaly detection and clustering algorithms. " ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "## Methodology\n", "\n", "Let us have a dataset with $N$ samples and $d$ features, represented by a $N \\times d$ matrix $F$. Moreover, let us each column of $F$ by $\\bf{f}_{j}$ where $j \\in {1,...,d}$, representing all samples for feature $j$. We can choose a subset of features of size $d^\\prime$ ($d^\\prime < d$) such that the inter-correlation of the subset is minimal. We have\n", "\n", "$\\min_{\\bf{x}} \\sum_{i=1}^{d} \\sum_{j=1}^{d} C_{ij} x_{i} x_{j}$\n", "\n", "where\n", "\n", "$C_{ij} = |corr(\\bf{f_{i}}, \\bf{f_{j}})|$\n", "\n", "where $corr$ denotes a correlation function such as the Pearson correlation, and $x_{i} \\in \\{0, 1\\}$ is a binary variable indicating inclusion or exclusion of feature $i$. Obviously, $C_{ii} = 1$. The above minimization problem is subject to a constraint,\n", "\n", "$\\sum_{i=1}^{d} x_{i} = d^\\prime$\n", "\n", "We can exclude the diagonal elements of $C$ as they always add up to\n", "$d^\\prime$. In the matrix form we have,\n", "\n", "$\\min_{\\bf{x}} \\bf{{x}^{T}} (C - I) \\bf{{x}}$\n", "\n", "subject to the above constraint. Note that $I$ is a $d \\times d$ identity matrix. Note too that we have assumed that the reduced dimension $d^\\prime$ is assumed to be given in the above approach. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Medicare Prescription Data\n", "\n", "We implemented this approach using a publically available set of data on prescription of opioids in the United States. The dataset can be found at https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-provider-utilization-payment-data/part-d-prescriber" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean data\n", "\n", "We start by cleaning the dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Input \n", "INP_FILE = \"Medicare_Provider_Utilization_and_Payment_Data__Part_D_Prescriber_Summary_Table_CY2014__50001-NNN__ANON.csv\"\n", "OUT_FILE = \"cleaned_medicare_data.csv\"\n", "\n", "CON_VARS = [\n", " \"total_claim_count\",\n", " \"total_30_day_fill_count\",\n", " \"total_drug_cost\",\n", " \"total_day_supply\",\n", " \"bene_count\",\n", " \"total_claim_count_ge65\",\n", " \"total_30_day_fill_count_ge65\",\n", " \"total_drug_cost_ge65\",\n", " \"total_day_supply_ge65\",\n", " \"bene_count_ge65\",\n", " \"brand_claim_count\",\n", " \"brand_drug_cost\",\n", " \"generic_claim_count\",\n", " \"generic_drug_cost\",\n", " \"other_claim_count\",\n", " \"other_drug_cost\",\n", " \"mapd_claim_count\",\n", " \"mapd_drug_cost\",\n", " \"pdp_claim_count\",\n", " \"pdp_drug_cost\",\n", " \"lis_claim_count\",\n", " \"lis_drug_cost\",\n", " \"nonlis_claim_count\",\n", " \"nonlis_drug_cost\",\n", " \"opioid_claim_count\",\n", " \"opioid_drug_cost\",\n", " \"opioid_day_supply\",\n", " \"opioid_bene_count\",\n", " \"opioid_prescriber_rate\",\n", " \"antibiotic_claim_count\",\n", " \"antibiotic_drug_cost\",\n", " \"antibiotic_bene_count\",\n", " \"hrm_claim_count_ge65\",\n", " \"hrm_drug_cost_ge65\",\n", " \"hrm_bene_count_ge65\",\n", " \"antipsych_claim_count_ge65\",\n", " \"antipsych_drug_cost_ge65\",\n", " \"antipsych_bene_count_ge65\",\n", " \"average_age_of_beneficiaries\",\n", " \"beneficiary_age_less_65_count\",\n", " \"beneficiary_age_65_74_count\",\n", " \"beneficiary_age_75_84_count\",\n", " \"beneficiary_age_greater_84_count\",\n", " \"beneficiary_female_count\",\n", " \"beneficiary_male_count\",\n", " \"beneficiary_race_white_count\",\n", " \"beneficiary_race_black_count\",\n", " \"beneficiary_race_asian_pi_count\",\n", " \"beneficiary_race_hispanic_count\",\n", " \"beneficiary_race_nat_ind_count\",\n", " \"beneficiary_race_other_count\",\n", " \"beneficiary_nondual_count\",\n", " \"beneficiary_dual_count\",\n", " \"beneficiary_average_risk_score\",\n", "]\n", "\n", "VALID_PROVIDER_MI = [\n", " \"A\",\n", " \"M\",\n", " \"J\",\n", " \"L\",\n", " \"R\",\n", " \"S\",\n", " \"E\",\n", " \"D\",\n", " \"C\",\n", " \"B\",\n", " \"K\",\n", " \"P\",\n", " \"W\",\n", " \"H\",\n", " \"T\",\n", " \"G\",\n", " \"F\",\n", " \"N\",\n", " \"V\",\n", " \"I\",\n", " \"O\",\n", " \"Y\",\n", " \"Z\",\n", " \"U\", \n", " \"Q\",\n", " \"X\",\n", "]\n", "\n", "VALID_GEN = [\"F\", \"M\", \"Other\", \"Unknown\"]\n", "\n", "VALID_ENTITIES = [\"I\", \"O\"]\n", "\n", "VALID_DESC_FLAGS = [\"S\", \"T\"]\n", "\n", "VALID_ENROLLS = [\"E\", \"N\", \"O\"]\n", "\n", "# Some utilities \n", "def convert_to_float(x):\n", " try:\n", " return float(x)\n", " except:\n", " return None\n", " \n", "def convert_to_int(x):\n", " try:\n", " return int(float(x))\n", " except:\n", " return None\n", "\n", "# Read data \n", "df = pd.read_csv(INP_FILE, on_bad_lines = \"skip\", low_memory=False)\n", " \n", "# Clean categorical variables \n", "df[\"nppes_provider_mi\"] = df[\"nppes_provider_mi\"].fillna(\"Unknown\")\n", "df[\"nppes_provider_mi\"] = df[\"nppes_provider_mi\"].apply(\n", " lambda x: x if x in VALID_PROVIDER_MI else \"Unknown\"\n", ")\n", "\n", "df[\"nppes_credentials\"] = df[\"nppes_credentials\"].fillna(\"Unknown\")\n", "df[\"nppes_credentials\"] = df[\"nppes_credentials\"].apply(\n", " lambda x: str(x).replace(\".\", \"\")\n", ")\n", "\n", "cred_hash = {\n", " \"MEDICAL DOCTOR\": \"MD\",\n", " \"NURSE PRACTITIONER\": \"NP\",\n", "}\n", "df[\"nppes_credentials\"] = df[\"nppes_credentials\"].apply(\n", " lambda x: cred_hash[x] if x in cred_hash else x,\n", ")\n", "\n", "df[\"nppes_provider_gender\"] = df[\"nppes_provider_gender\"].fillna(\"Unknown\")\n", "df[\"nppes_provider_gender\"] = df[\"nppes_provider_gender\"].apply(\n", " lambda x: x if x in VALID_GEN else \"Other\",\n", ")\n", "\n", "df[\"nppes_entity_code\"] = df[\"nppes_entity_code\"].apply(\n", " lambda x: x if x in VALID_ENTITIES else \"Unknown\",\n", ")\n", "\n", "df[\"nppes_provider_zip5\"] = df[\"nppes_provider_zip5\"].fillna(\"Unknown\")\n", "\n", "df[\"nppes_provider_country\"] = df[\"nppes_provider_country\"].apply(\n", " lambda x: \"US\" if x == \"US\" else \"Other\",\n", ")\n", "\n", "df[\"description_flag\"] = df[\"description_flag\"].apply(\n", " lambda x: x if x in VALID_DESC_FLAGS else \"Unknown\",\n", ")\n", "\n", "df[\"medicare_prvdr_enroll_status\"] = df[\"medicare_prvdr_enroll_status\"].apply(\n", " lambda x: x if x in VALID_ENROLLS else \"Unknown\",\n", ")\n", "\n", "\n", "# Treat missing beneficiary count as it cannot be zero \n", "df[\"bene_count\"] = df[\"bene_count\"].apply(\n", " convert_to_int\n", ").fillna(-1)\n", "\n", "tmp_df = df.groupby(\n", " \"specialty_description\", as_index=False,\n", ")[\"bene_count\"].mean()\n", "\n", "bene_count_hash = dict(\n", " zip(\n", " tmp_df[\"specialty_description\"],\n", " tmp_df[\"bene_count\"],\n", " )\n", ")\n", "df[\"bene_count\"] = df.apply(\n", " lambda x: x[\"bene_count\"] if x[\n", " \"bene_count\"\n", " ] > 0 else bene_count_hash[\n", " x[\"specialty_description\"]\n", " ],\n", " axis=1,\n", ")\n", "\n", "# Treat continuous variables \n", "for item in CON_VARS:\n", " df[item] = df[item].apply(\n", " convert_to_float\n", " ).fillna(0.0)\n", "\n", "# Filter out invalid states \n", "df = df[\n", " ~df[\"nppes_provider_state\"].isin(\n", " [\"XX\", \"E\", \"N\", \"S\"]\n", " )\n", "]\n", "\n", "# Output \n", "df.to_csv(OUT_FILE, index=False)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate features\n", "\n", "We then generate features. The categorical features are encoded using the average value of a few important variables in each category." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Input \n", "INP_FILE = \"cleaned_medicare_data.csv\"\n", "OUT_FILE = \"medicare_features.csv\"\n", "\n", "# Set some parameters \n", "CAT_VARS = [\n", " \"nppes_provider_mi\",\n", " #\"nppes_credentials\", # This is rather messy, so ignoring it. \n", " \"nppes_provider_gender\",\n", " \"nppes_entity_code\",\n", " \"nppes_provider_city\",\n", " \"nppes_provider_zip5\",\n", " #\"nppes_provider_country\", # Almost all cases are US \n", " \"specialty_description\",\n", " \"medicare_prvdr_enroll_status\",\n", " \"nppes_provider_state\",\n", "]\n", "\n", "CON_VARS = [\n", " \"total_claim_count\",\n", " \"total_30_day_fill_count\",\n", " \"total_drug_cost\",\n", " \"total_day_supply\",\n", " \"bene_count\",\n", " \"total_claim_count_ge65\",\n", " \"total_30_day_fill_count_ge65\",\n", " \"total_drug_cost_ge65\",\n", " \"total_day_supply_ge65\",\n", " \"bene_count_ge65\",\n", " \"brand_claim_count\",\n", " \"brand_drug_cost\",\n", " \"generic_claim_count\",\n", " \"generic_drug_cost\",\n", " \"other_claim_count\",\n", " \"other_drug_cost\",\n", " \"mapd_claim_count\",\n", " \"mapd_drug_cost\",\n", " \"pdp_claim_count\",\n", " \"pdp_drug_cost\",\n", " \"lis_claim_count\",\n", " \"lis_drug_cost\",\n", " \"nonlis_claim_count\",\n", " \"nonlis_drug_cost\",\n", " \"opioid_claim_count\",\n", " \"opioid_drug_cost\",\n", " \"opioid_day_supply\",\n", " \"opioid_bene_count\",\n", " \"antibiotic_claim_count\",\n", " \"antibiotic_drug_cost\",\n", " \"antibiotic_bene_count\",\n", " \"hrm_claim_count_ge65\",\n", " \"hrm_drug_cost_ge65\",\n", " \"hrm_bene_count_ge65\",\n", " \"antipsych_claim_count_ge65\",\n", " \"antipsych_drug_cost_ge65\",\n", " \"antipsych_bene_count_ge65\",\n", " \"average_age_of_beneficiaries\",\n", " \"beneficiary_age_less_65_count\",\n", " \"beneficiary_age_65_74_count\",\n", " \"beneficiary_age_75_84_count\",\n", " \"beneficiary_age_greater_84_count\",\n", " \"beneficiary_female_count\",\n", " \"beneficiary_male_count\",\n", " \"beneficiary_race_white_count\",\n", " \"beneficiary_race_black_count\",\n", " \"beneficiary_race_asian_pi_count\",\n", " \"beneficiary_race_hispanic_count\",\n", " \"beneficiary_race_nat_ind_count\",\n", " \"beneficiary_race_other_count\",\n", " \"beneficiary_nondual_count\",\n", " \"beneficiary_dual_count\",\n", " \"beneficiary_average_risk_score\",\n", "]\n", "\n", "# Read and clean data \n", "df = pd.read_csv(INP_FILE, low_memory=False)\n", "\n", "# Embed categorical features \n", "embedded_cat_features = []\n", "for item in CAT_VARS:\n", " tmp_df = df.groupby(item, as_index=False).agg(\n", " {\n", " \"opioid_claim_count\": \"mean\",\n", " \"opioid_drug_cost\": \"mean\",\n", " \"opioid_day_supply\": \"mean\",\n", " \"opioid_bene_count\": \"mean\",\n", " \"opioid_prescriber_rate\": \"mean\",\n", " }\n", " ).rename(\n", " columns={\n", " \"opioid_claim_count\": \"%s_opioid_claim_count\" % item,\n", " \"opioid_drug_cost\": \"%s_opioid_drug_cost\" % item,\n", " \"opioid_day_supply\": \"%s_opioid_day_supply\" % item,\n", " \"opioid_bene_count\": \"%s_opioid_bene_count\" % item,\n", " \"opioid_prescriber_rate\": \"%s_opioid_prescriber_rate\" % item,\n", " }\n", " )\n", "\n", " df = df.merge(tmp_df, how=\"left\", on=item)\n", "\n", " embedded_cat_features += [\n", " \"%s_opioid_claim_count\" % item,\n", " \"%s_opioid_drug_cost\" % item,\n", " \"%s_opioid_day_supply\" % item,\n", " \"%s_opioid_bene_count\" % item,\n", " \"%s_opioid_prescriber_rate\" % item,\n", " ]\n", "\n", "# Drop unembedded categorical variables and some others \n", "df = df[[\"npi\"] + CON_VARS + embedded_cat_features]\n", "\n", "# Write features file \n", "df.to_csv(OUT_FILE, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature Selection\n", "\n", "Once the features are generated, we can implement the above-mentioned feature selection algorithm. We start by importing some libraries, setting some parameters, and loading the features into a Pandas dataframe." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Import libs\n", "import sys\n", "import os\n", "import time\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from qci_client import QciClient\n", "\n", "# Define some parameters\n", "FEATURES_FILE = \"medicare_features.csv\"\n", "REDUCED_DIM = 10\n", "\n", "# Read features\n", "df = pd.read_csv(FEATURES_FILE, low_memory=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now print the feature names and get the total count of features in the dataset," ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original dimension is 93; reduced dimension will be 10\n" ] } ], "source": [ "feature_names = list(set(df.columns) - {\"npi\"})\n", "\n", "orig_dim = len(feature_names)\n", "\n", "print(\n", " \"Original dimension is %d; reduced dimension will be %d\" % (\n", " orig_dim,\n", " REDUCED_DIM,\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should now create the objective matrix $C$," ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Generate the objective matrix\n", "X = np.array(df[feature_names])\n", "\n", "C = abs(np.corrcoef(X, rowvar=False))\n", "\n", "# Make correlation symmetric to machine precision\n", "C = 0.5 * (C + C.transpose())\n", "\n", "objective = C - np.eye(orig_dim)\n", "objective = np.array(objective, dtype=np.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And create the constraint matrix," ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(94,)\n" ] } ], "source": [ "# Generate the constraint\n", "cons_lhs = np.ones(shape=(orig_dim), dtype=np.float32)\n", "cons_lhs = cons_lhs\n", "cons_rhs = np.array([-REDUCED_DIM])\n", "\n", "constraints = np.hstack([cons_lhs, cons_rhs])\n", "\n", "print(constraints.shape)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now solve the above quadratic binary problem using QCi's Dirac-1," ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "token = \"your_token\"\n", "api_url = \"https://api.qci-prod.com\"\n", "qci = QciClient(api_token=token, url=api_url)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2024-05-08 10:28:12 - Dirac allocation balance = 0 s (unmetered)\n", "2024-05-08 10:28:12 - Job submitted: job_id='663bb62cd448b017e54f94bd'\n", "2024-05-08 10:28:12 - QUEUED\n", "2024-05-08 10:28:15 - RUNNING\n", "2024-05-08 10:48:23 - COMPLETED\n", "2024-05-08 10:48:26 - Dirac allocation balance = 0 s (unmetered)\n", "{'job_info': {'job_id': '663bb62cd448b017e54f94bd', 'job_submission': {'job_name': 'tutorial_eqc1', 'job_tags': ['tutorial_eqc1'], 'problem_config': {'quadratic_linearly_constrained_binary_optimization': {'constraints_file_id': '663bb62c98263204a3657526', 'objective_file_id': '663bb62b98263204a3657524', 'alpha': 5, 'atol': 1e-10}}, 'device_config': {'dirac-1': {'num_samples': 20}}}, 'job_status': {'submitted_at_rfc3339nano': '2024-05-08T17:28:12.687Z', 'queued_at_rfc3339nano': '2024-05-08T17:28:12.689Z', 'running_at_rfc3339nano': '2024-05-08T17:28:13.046Z', 'completed_at_rfc3339nano': '2024-05-08T17:48:21.532Z'}, 'job_result': {'file_id': '663bbae598263204a3657528', 'device_usage_s': 1128}}, 'status': 'COMPLETED', 'results': {'counts': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'energies': [-497.8916331699111, -497.8896800449111, -497.8896800449111, -497.8887034824111, -497.8847972324111, -497.8847972324111, -497.8847972324111, -497.8838206699111, -497.8789378574111, -497.8623362949111, -497.8584300449111, -497.8515941074111, -497.8291331699111, -497.8125316074111, -497.7988597324111, -497.7949534824111, -497.7773753574111, -497.7773753574111, -497.7724925449111, -497.7724925449111], 'feasibilities': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], 'objective_values': [2.108758625341579, 2.109875851077959, 2.109875851077959, 2.1111518519464876, 2.1151239282917222, 2.1151239282917227, 2.115212611388415, 2.1162998459767546, 2.121476054191589, 2.13726658251835, 2.1418038606643672, 2.148616509046405, 2.1707787389168516, 2.1876401392510156, 2.2009708418045193, 2.2050841469317675, 2.2229102615965526, 2.2229102615965526, 2.227802827401319, 2.227861546212807], 'solutions': [[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]}}\n", "Energies: [-497.8916331699111, -497.8896800449111, -497.8896800449111, -497.8887034824111, -497.8847972324111, -497.8847972324111, -497.8847972324111, -497.8838206699111, -497.8789378574111, -497.8623362949111, -497.8584300449111, -497.8515941074111, -497.8291331699111, -497.8125316074111, -497.7988597324111, -497.7949534824111, -497.7773753574111, -497.7773753574111, -497.7724925449111, -497.7724925449111]\n" ] } ], "source": [ "# Create json objects\n", "objective_json = {\n", " \"file_name\": \"objective_tutorial.json\",\n", " \"file_config\": {\n", " \"objective\": {\"data\": objective, \"num_variables\": orig_dim},\n", " } \n", "}\n", " \n", "constraint_json = {\n", " \"file_name\": \"constraints_tutorial.json\",\n", " \"file_config\": {\n", " \"constraints\": {\n", " \"data\": constraints, \n", " #\"num_variables\": orig_dim,\n", " #\"num_constraints\": 1,\n", " }\n", " }\n", "}\n", "\n", "# Solve the optimizzation problem\n", "#qci = QciClient()\n", "\n", "objective_file_id = qci.upload_file(file=objective_json)[\"file_id\"]\n", "constraint_file_id = qci.upload_file(file=constraint_json)[\"file_id\"]\n", "\n", "# Setup job json\n", "job_params = {\n", " \"device_type\": \"dirac-1\", \n", " \"alpha\": 5.0, \n", " \"num_samples\": 20,\n", "}\n", "body = qci.build_job_body(\n", " job_type=\"sample-constraint\", \n", " job_params=job_params,\n", " constraints_file_id=constraint_file_id, \n", " objective_file_id=objective_file_id,\n", " job_name=f\"tutorial_eqc1\",\n", " job_tags=[\"tutorial_eqc1\"],\n", ")\n", "\n", "# Run the job\n", "job_response_json = qci.process_job(job_body=body)\n", "\n", "print(job_response_json)\n", "\n", "results = job_response_json[\"results\"]\n", "energies = results[\"energies\"]\n", "samples = results[\"solutions\"]\n", "is_feasibles = results[\"feasibilities\"]\n", "\n", "if True:\n", " print(\"Energies:\", energies) \n", "\n", "# Pick a feasible solution with lowest energy \n", "# The sample solutions are sorted by energy \n", "sol = None\n", "for i, item in enumerate(samples):\n", " sol = item\n", " is_feasible = is_feasibles[i]\n", "\n", " if is_feasible:\n", " break\n", "\n", "if not is_feasible:\n", " print(\"Solution is not feasible!\")\n", "\n", "assert sol is not None, \"No feasible solution found!\"\n", "\n", "assert len(sol) == orig_dim, \"Inconsistent solution size!\"\n", "\n", "assert sum(sol) == REDUCED_DIM, \"Solution is not feasible!\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can print the list of selected variables," ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['nppes_provider_city_opioid_drug_cost', 'beneficiary_race_black_count', 'nppes_provider_gender_opioid_drug_cost', 'specialty_description_opioid_drug_cost', 'nppes_provider_mi_opioid_drug_cost', 'nppes_entity_code_opioid_prescriber_rate', 'beneficiary_race_nat_ind_count', 'antipsych_drug_cost_ge65', 'beneficiary_average_risk_score', 'beneficiary_race_asian_pi_count']\n" ] } ], "source": [ "selected_vars = []\n", "for i in range(orig_dim):\n", " if sol[i] > 0:\n", " selected_vars.append(feature_names[i])\n", "\n", "print(selected_vars) " ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.7" } }, "nbformat": 4, "nbformat_minor": 4 }