{ "cells": [ { "cell_type": "markdown", "id": "58d224c1-bb01-46e9-b8ff-418847b145b6", "metadata": {}, "source": [ "# Creating test spreadsheet\n", "## Import pandas and load files" ] }, { "cell_type": "code", "execution_count": 1, "id": "bf77f51c-2b0f-41eb-b694-4feb03bc149e", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "id": "6d10441b-6ebf-4790-8af4-efdcc99ecb0f", "metadata": {}, "outputs": [], "source": [ "data_dir='/Users/davecash/Data/IDEAS/sample'\n", "xls_demo=os.path.join(data_dir,'GENFI_DEMOGRAPHICS_DF3_FINAL_BLINDED.xlsx')\n", "df_demo=pd.read_excel(xls_demo)\n", "xls_img=os.path.join(data_dir,'GENFI_IMAGING_DF3_FINAL_BLINDED.xlsx')\n", "df_img=pd.read_excel(xls_img)" ] }, { "cell_type": "markdown", "id": "51b295e3-eb1a-49ae-9736-96e031047ec8", "metadata": {}, "source": [ "## Combining data\n", "Join the data, get rid of unneeded variables and keep first visit that has both demographics and imaging" ] }, { "cell_type": "code", "execution_count": 3, "id": "a49538cc-a34f-434a-b8e0-d7e437910182", "metadata": {}, "outputs": [], "source": [ "df_combined=pd.merge(df_img,df_demo,on=['Blinded Code','Visit'])" ] }, { "cell_type": "markdown", "id": "105dd655-0b97-4728-a77b-29f0f414dfea", "metadata": {}, "source": [ "Categorize key variables and get rid of any rare mutations" ] }, { "cell_type": "code", "execution_count": 4, "id": "83bada98-bc9a-4af2-a427-ecd7c853623c", "metadata": {}, "outputs": [], "source": [ "df_combined['MutationType']=pd.Categorical(df_combined['Genetic Group'],categories=['C9orf72','GRN','MAPT'])\n", "df_combined = df_combined.dropna(subset=['MutationType','Age at visit','DRC_QC','Scanner'])" ] }, { "cell_type": "markdown", "id": "4b538a3c-0fcd-4922-8a2c-8879866da49e", "metadata": {}, "source": [ "Now it's time to get rid of some missing values. Start by assuming TIV constant over time and just keeping first value" ] }, { "cell_type": "code", "execution_count": 5, "id": "afbc4b4e-7b2a-40a0-a4b0-20dd99c5c9e2", "metadata": {}, "outputs": [], "source": [ "df_combined['TIV'] = df_combined.groupby(['Blinded Code'])[\"TIV mm3\"].fillna(method=\"ffill\")" ] }, { "cell_type": "markdown", "id": "70868633-538d-4d7a-8b82-e94fa74efd47", "metadata": {}, "source": [ "Remove a bunch of columns that we don't need." ] }, { "cell_type": "code", "execution_count": 6, "id": "2a10e9b2-5f51-4dde-8351-315680df3bee", "metadata": {}, "outputs": [], "source": [ "df_combined = df_combined.drop(columns=['Date of scan','Blinded Site_x','TIV mm3','1 Orbitofrontal LEFT', \n", " '2. DLPFC LEFT', '3. VMPFC LEFT', '4. Motor LEFT', '5. Opercular LEFT', \n", " '6. FRP LEFT', '7. Medial Temp LEFT', '8. Lateral Temp LEFT', \n", " '9. Temporal Pole LEFT', '10. Supra Temp LEFT', '11. Medial Parietal LEFT', \n", " '12. Lateral Parietal LEFT', '13. Sensory LEFT', '14. Medial occ LEFT', \n", " '15. Lateral Occ LEFT', '16. Anterior Cing LEFT', '17. Middle Cing LEFT', \n", " '18. Posterior Cing LEFT', '19. Ant Insula LEFT', '20. Post Insula LEFT', \n", " '1 Orbitofrontal RIGHT', '2. DLPFC RIGHT', '3. VMPFC RIGHT', '4. Motor RIGHT',\n", " '5. Opercular RIGHT', '6. FRP RIGHT', '7. Medial Temp RIGHT', \n", " '8. Lateral Temp RIGHT', '9. Temporal Pole RIGHT', '10. Supra Temp RIGHT', \n", " '11. Medial Parietal RIGHT', '12. Lateral Parietal RIGHT', '13. Sensory RIGHT', \n", " '14. Medial occ RIGHT', '15. Lateral Occ RIGHT', '16. Anterior Cing RIGHT', \n", " '17. Middle Cing RIGHT', '18. Posterior Cing RIGHT', '19. Ant Insula RIGHT', \n", " '20. Post Insula RIGHT','Genetic Group', 'Blinded Family', 'Date of assessment','Handedness', 'Employment', 'Ethnicity'])" ] }, { "cell_type": "markdown", "id": "06059f91-86f7-4292-96c7-5ebed626e3da", "metadata": {}, "source": [ "Keep only images that passed QC for GIF" ] }, { "cell_type": "code", "execution_count": 7, "id": "2d3bfd09-8c5f-456f-9bf9-d19f7ede59cf", "metadata": {}, "outputs": [], "source": [ "df_combined=df_combined[df_combined['QC_include in GIF']==1]" ] }, { "cell_type": "markdown", "id": "7bf3a991-7ad9-4736-b10a-88d68926c965", "metadata": {}, "source": [ "Keep only one timepoint" ] }, { "cell_type": "code", "execution_count": 8, "id": "695e7b92-6472-403f-9c18-51b2b97dcee1", "metadata": {}, "outputs": [], "source": [ "df_xsec=df_combined.drop_duplicates(subset='Blinded Code')" ] }, { "cell_type": "markdown", "id": "bcbff51f-e175-4c82-8fad-5666e4fbb54e", "metadata": {}, "source": [ "Find all of the scanners, sort and assign a new identifying number to use instead of the official GENFI one." ] }, { "cell_type": "code", "execution_count": 9, "id": "d41bbb4c-3a4d-4c66-ab82-07caff317fc8", "metadata": {}, "outputs": [], "source": [ "site_list=df_xsec['Blinded Site_y'].drop_duplicates()\n", "site_list=site_list.sort_values(ignore_index=True)\n", "site_list = site_list.reset_index(level=0)\n", "site_list.set_index('Blinded Site_y',inplace=True)" ] }, { "cell_type": "code", "execution_count": 10, "id": "891a28e5-9bfe-4206-9fd3-b09e3b55256d", "metadata": {}, "outputs": [], "source": [ "df_xsec = pd.merge(df_xsec,site_list,how='left',left_on='Blinded Site_y',right_index=True)" ] }, { "cell_type": "markdown", "id": "01decab5-6c04-4a0c-affc-01231560ba35", "metadata": {}, "source": [ "Rename some markdown columns for easier use later." ] }, { "cell_type": "code", "execution_count": 11, "id": "49b1aaee-89d4-4e5b-acbc-2d1805442ac8", "metadata": {}, "outputs": [], "source": [ "df_xsec.rename(columns={'index':'Site',\n", " 'Gender':'Sex',\n", " 'Genetic status 2': 'Group',\n", " 'Age at visit': 'Age'},inplace=True)\n", "df_xsec.columns = df_xsec.columns.str.replace(' ', '_')" ] }, { "cell_type": "markdown", "id": "8baffd36-cc0c-48e7-bcec-d0b813ae6e5c", "metadata": {}, "source": [ "Finally blind the blinded code and re-sort on the randomcode" ] }, { "cell_type": "code", "execution_count": 12, "id": "2f2dc411-0260-485c-b92a-1a4326f0d950", "metadata": {}, "outputs": [], "source": [ "new_id=1+np.arange(len(df_xsec.index))\n", "np.random.shuffle(new_id)" ] }, { "cell_type": "code", "execution_count": 13, "id": "05a87893-a139-4601-88aa-a812b79cea35", "metadata": {}, "outputs": [], "source": [ "df_xsec['New Code']=new_id" ] }, { "cell_type": "code", "execution_count": 14, "id": "60ae1f7e-33ba-45c5-b9fd-803fb2a1e907", "metadata": {}, "outputs": [], "source": [ "df_xsec['ID']=df_xsec['New Code'].map('GENFI{:03d}'.format)" ] }, { "cell_type": "code", "execution_count": 16, "id": "8835c02e-a712-4f1f-8b76-dd1546a59ef3", "metadata": {}, "outputs": [], "source": [ "df_xsec=df_xsec[['ID','Site','Scanner', 'Age', 'Sex',\n", " 'Education', 'EYO', 'Group', 'MutationType', 'TIV',\n", " 'Right_Accumbens_Area','Left_Accumbens_Area', 'Right_Amygdala', \n", " 'Left_Amygdala', 'Pons','Brain_Stem', 'Right_Caudate', 'Left_Caudate',\n", " 'Right_Hippocampus','Left_Hippocampus', 'Right_Pallidum', 'Left_Pallidum',\n", " 'Right_Putamen','Left_Putamen', 'Right_Thalamus_Proper', 'Left_Thalamus_Proper',\n", " 'Total_Brain', 'Frontal_lobe_volume', 'Temporal_lobe_volume',\n", " 'Parietal_lobe_volume', 'Occipital_lobe_volume', 'Cingulate_volume',\n", " 'Insula_volume', 'Left_Frontal_lobe_volume', 'Right_Frontal_lobe_volume', \n", " 'Left_Temporal_lobe_volume','Right_Temporal_lobe_volume', 'Left_Parietal_lobe_volume',\n", " 'Right_Parietal_lobe_volume', 'Left_Occipital_lobe_volume',\n", " 'Right_Occipital_lobe_volume', 'Left_Cingulate_volume', 'Right_Cingulate_volume', \n", " 'Left_Insula_volume', 'Right_Insula_volume','Total_Cerebellum']]\n", "df_xsec=df_xsec.set_index('ID',drop=True)" ] }, { "cell_type": "code", "execution_count": 17, "id": "a0ff6165-5f70-49cd-9e0d-6020a6b8a865", "metadata": {}, "outputs": [], "source": [ "df_xsec=df_xsec.sort_index()" ] }, { "cell_type": "code", "execution_count": 18, "id": "82c7e6f0-328a-4b7f-942d-feea594445d2", "metadata": {}, "outputs": [], "source": [ "df_xsec['TIV']=df_xsec['TIV']/1000\n", "df_xsec=df_xsec.round(1)\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "ce4ba824-9276-4ec1-844d-eebd4818c1a5", "metadata": {}, "outputs": [], "source": [ "df_xsec.to_excel(os.path.join(data_dir,'GENFI_DEMON_SPREADSHEET.xlsx'))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }