{ "cells": [ { "cell_type": "markdown", "id": "da5ac69f-5063-40aa-805d-09c65c88d9b5", "metadata": {}, "source": [ "# Geospatial analysis of clinical trial sites in the USA: \n", "## Exploring Demographics and Vulnerabilities\n", "\n", "#### Author - Priyadarshini Satish\n", "#### Date – 03/10/2023\n", "#### Course - GGIS 407 - Cyber GIS and Geospatial Data Science\n", "#### Professor - Dr. Anand Padmanabhan\n" ] }, { "cell_type": "markdown", "id": "dc1f9522-bac4-4706-b655-9ab1118f0bc9", "metadata": {}, "source": [ "### Introduction\n", "In the medical and pharmaceutical industry, selecting sites for clinical trials is a critical task that involves analyzing proprietary information about demographics, geography, economy, and market dynamics. A comprehensive list of clinical trial sites published by the US National Library of Medicine is available online. This data set includes information about clinical trials worldwide that have been registered on the website since 2000. While it is not exhaustive, it lists all studies that require registration with the FDAA and that examine behaviors or the effect of a drug or procedure on human volunteers. \n", "\n", "In the present study, this data set is filtered for sites in the United States, and the city of each site is plotted on a map. This geospatial data is visualized as a heat map and compared with county-level socio-economic indicators from the US Census Bureau. The data retrieved from the census includes median age, health insurance coverage, median household income, and poverty rate. The expected result is insight into where pharmaceutical companies tend to locate sites for such studies. 
Further, we can explore whether certain vulnerabilities, such as lower income, are exploited when locating such sites to attract more participants.\n" ] }, { "cell_type": "markdown", "id": "70e02c6a-d29d-43eb-bf70-b3532c713933", "metadata": {}, "source": [ "## Data Visualization" ] }, { "cell_type": "code", "execution_count": null, "id": "b1d2bf46-1312-4945-81ba-cb66a8573dcd", "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "import pandas as pd\n", "import geopandas as gpd\n", "import math\n", "import folium\n", "import zipfile\n", "from folium import Choropleth, Circle, Marker\n", "from folium.plugins import HeatMap, MarkerCluster" ] }, { "cell_type": "code", "execution_count": null, "id": "8a9e003c-5061-43aa-9416-920ff154c907", "metadata": {}, "outputs": [], "source": [ "with zipfile.ZipFile('Geodata.zip','r') as zip_ref:\n", "    zip_ref.extractall()" ] }, { "cell_type": "code", "execution_count": null, "id": "d453a781-2843-4303-b047-53f7cf881f81", "metadata": {}, "outputs": [], "source": [ "censusdata = pd.read_excel(\"censusdata.xlsx\")\n", "# Excel data from the previous section was converted to GeoJSON using ArcGIS Pro. That file is loaded into the notebook here. 
\n", "geodata = gpd.read_file('CT_geo.geojson')" ] }, { "cell_type": "code", "execution_count": null, "id": "4e998042-df0b-4402-86e2-3be85ba1d304", "metadata": {}, "outputs": [], "source": [ "#Creating groups of unique cities to get a count of facilities by city \n", "bubbledata = geodata.groupby(['city','state','country']).agg(\n", " count_of_facilities = ('facility_name','count'),\n", " lat = ('lat','first'),\n", " lon = ('lng','first'),\n", " geometry = ('geometry','first')).reset_index()" ] }, { "cell_type": "markdown", "id": "f53ba7e3-df8f-4452-89a8-b287c405e30b", "metadata": {}, "source": [ "Creating a table grouping by cities to get a count of how many facilities are in each city " ] }, { "cell_type": "code", "execution_count": null, "id": "eddff921-07ac-404d-9016-097566356817", "metadata": {}, "outputs": [], "source": [ "# Create a map indicating the location of all the cities and the count of facilities in each. \n", "m_4 = folium.Map(location=[48,-102], tiles='cartodbpositron', zoom_start=3)\n", "\n", "# Add points to the map\n", "mc = MarkerCluster()\n", "for idx, row in bubbledata.iterrows():\n", " if not math.isnan(row['lon']) and not math.isnan(row['lat']):\n", " mc.add_child(Marker([row['lat'], row['lon']]))\n", "m_4.add_child(mc)\n", "# add marker one by one on the map\n", "for i in range(0,len(bubbledata)):\n", " folium.Marker(\n", " location=[bubbledata.iloc[i]['lat'], bubbledata.iloc[i]['lon']],\n", " popup=bubbledata.iloc[i][['city','count_of_facilities']],\n", " # icon=folium.DivIcon(html=f\"\"\"