{ "cells": [ { "cell_type": "markdown", "id": "4575537f", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ThomasAlbin/Astroniz-YT-Tutorials/blob/main/[ML1]-Asteroid-Spectra/3_data_enrichment.ipynb)" ] }, { "cell_type": "markdown", "id": "72b515e8", "metadata": {}, "source": [ "# Step 3: Data Enrichment\n", "\n", "This section is not about feature creation (for an ML algorithm); it enriches the asteroid dataframe with additional information." ] }, { "cell_type": "code", "execution_count": 1, "id": "d4987fa4", "metadata": {}, "outputs": [], "source": [ "# Import standard libraries\n", "import os\n", "import pathlib\n", "\n", "# Import installed libraries\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "8751fcfc", "metadata": {}, "outputs": [], "source": [ "# Mount Google Drive, where we store files and models (if running in Colab; otherwise work\n", "# locally)\n", "try:\n", "    from google.colab import drive\n", "    drive.mount('/gdrive')\n", "    core_path = \"/gdrive/MyDrive/Colab/asteroid_taxonomy/\"\n", "except ModuleNotFoundError:\n", "    core_path = \"\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "2b6e61df", "metadata": {}, "outputs": [], "source": [ "# Read the level 1 dataframe\n", "asteroids_df = pd.read_pickle(os.path.join(core_path, \"data/lvl1/\", \"asteroids_merged.pkl\"))" ] }, { "cell_type": "markdown", "id": "3d7eee95", "metadata": {}, "source": [ "## Bus classification to Main group\n", "\n", "A great summary of asteroid classification schemas, the science behind them, and some historical context can be found [here](https://vissiniti.com/asteroid-classification/). A flow chart on that page shows the links between the various classification schemas; on its right side, the chart merges into a general \"main group\". 
These groups are:\n", "\n", "- C: Carbonaceous asteroids\n", "- S: Siliceous (stony) asteroids\n", "- X: Metallic asteroids\n", "- Other: Miscellaneous types of rare origin or composition, or of unknown composition, such as T-type asteroids" ] }, { "cell_type": "code", "execution_count": 4, "id": "278bfa01", "metadata": {}, "outputs": [], "source": [ "# Create a dictionary that maps each Bus class to its main group\n", "bus_to_main_dict = {\n", "    'A': 'Other',\n", "    'B': 'C',\n", "    'C': 'C',\n", "    'Cb': 'C',\n", "    'Cg': 'C',\n", "    'Cgh': 'C',\n", "    'Ch': 'C',\n", "    'D': 'Other',\n", "    'K': 'Other',\n", "    'L': 'Other',\n", "    'Ld': 'Other',\n", "    'O': 'Other',\n", "    'R': 'Other',\n", "    'S': 'S',\n", "    'Sa': 'S',\n", "    'Sk': 'S',\n", "    'Sl': 'S',\n", "    'Sq': 'S',\n", "    'Sr': 'S',\n", "    'T': 'Other',\n", "    'V': 'Other',\n", "    'X': 'X',\n", "    'Xc': 'X',\n", "    'Xe': 'X',\n", "    'Xk': 'X'\n", "}" ] }, { "cell_type": "code", "execution_count": 5, "id": "92d373b3", "metadata": {}, "outputs": [], "source": [ "# Add a \"Main_Group\" column; Bus classes missing from the dictionary get the string \"None\"\n", "asteroids_df.loc[:, \"Main_Group\"] = asteroids_df[\"Bus_Class\"].apply(lambda x:\n", "                                                                   bus_to_main_dict.get(x, \"None\"))" ] }, { "cell_type": "code", "execution_count": 6, "id": "805c350f", "metadata": {}, "outputs": [], "source": [ "# Remove the file path and Designation Number\n", "asteroids_df.drop(columns=[\"DesNr\", \"FilePath\"], inplace=True)" ] }, { "cell_type": "code", "execution_count": 7, "id": "effe38d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Bus_Class SpectrumDF \\\n", "0 1 Ceres C Wavelength_in_microm Reflectance_norm550n... \n", "1 2 Pallas B Wavelength_in_microm Reflectance_norm550n... \n", "2 3 Juno Sk Wavelength_in_microm Reflectance_norm550n... \n", "3 4 Vesta V Wavelength_in_microm Reflectance_norm550n... \n", "4 5 Astraea S Wavelength_in_microm Reflectance_norm550n... \n", "... ... ... ... \n", "1334 1996 UK Sq Wavelength_in_microm Reflectance_norm550n... \n", "1335 1996 VC S Wavelength_in_microm Reflectance_norm550n... \n", "1336 1997 CZ5 S Wavelength_in_microm Reflectance_norm550n... \n", "1337 1997 RD1 Sq Wavelength_in_microm Reflectance_norm550n... \n", "1338 1998 WS Sr Wavelength_in_microm Reflectance_norm550n... \n", "\n", " Main_Group \n", "0 C \n", "1 C \n", "2 S \n", "3 Other \n", "4 S \n", "... ... \n", "1334 S \n", "1335 S \n", "1336 S \n", "1337 S \n", "1338 S \n", "\n", "[1339 rows x 4 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show the final data set for anyone who is interested ...\n", "asteroids_df" ] }, { "cell_type": "code", "execution_count": 8, "id": "ad7ba1dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Wavelength_in_microm Reflectance_norm550nm\n", "0 0.44 0.9281\n", "1 0.45 0.9388\n", "2 0.46 0.9488\n", "3 0.47 0.9572\n", "4 0.48 0.9643\n", "5 0.49 0.9716\n", "6 0.50 0.9788\n", "7 0.51 0.9859\n", "8 0.52 0.9923\n", "9 0.53 0.9955\n", "10 0.54 0.9969\n", "11 0.55 1.0000\n", "12 0.56 1.0040\n", "13 0.57 1.0056\n", "14 0.58 1.0037\n", "15 0.59 1.0036\n", "16 0.60 1.0044\n", "17 0.61 1.0071\n", "18 0.62 1.0107\n", "19 0.63 1.0113\n", "20 0.64 1.0117\n", "21 0.65 1.0127\n", "22 0.66 1.0128\n", "23 0.67 1.0124\n", "24 0.68 1.0151\n", "25 0.69 1.0160\n", "26 0.70 1.0146\n", "27 0.71 1.0178\n", "28 0.72 1.0222\n", "29 0.73 1.0216\n", "30 0.74 1.0191\n", "31 0.75 1.0179\n", "32 0.76 1.0167\n", "33 0.77 1.0149\n", "34 0.78 1.0161\n", "35 0.79 1.0176\n", "36 0.80 1.0178\n", "37 0.81 1.0196\n", "38 0.82 1.0200\n", "39 0.83 1.0164\n", "40 0.84 1.0135\n", "41 0.85 1.0140\n", "42 0.86 1.0147\n", "43 0.87 1.0151\n", "44 0.88 1.0142\n", "45 0.89 1.0146\n", "46 0.90 1.0165\n", "47 0.91 1.0181\n", "48 0.92 1.0200" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ... 
and also the spectrum of Ceres\n", "asteroids_df.loc[asteroids_df[\"Name\"] == \"1 Ceres\", \"SpectrumDF\"].iloc[0]" ] }, { "cell_type": "code", "execution_count": 9, "id": "e181ee97", "metadata": {}, "outputs": [], "source": [ "# Create the level 2 directory (if it does not exist yet)\n", "pathlib.Path(os.path.join(core_path, \"data/lvl2\")).mkdir(parents=True, exist_ok=True)\n", "\n", "# Save the dataframe as a pickle file\n", "asteroids_df.to_pickle(os.path.join(core_path, \"data/lvl2/\", \"asteroids.pkl\"), protocol=4)" ] } ], "metadata": { "interpreter": { "hash": "4cd7ab41f5fca4b9b44701077e38c5ffd31fe66a6cab21e0214b68d958d0e462" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }