[Open in Colab](https://colab.research.google.com/github/ThomasAlbin/Astroniz-YT-Tutorials/blob/main/[ML1]-Asteroid-Spectra/3_data_enrichment.ipynb)

# Step 3: Data Enrichment

This step is not about feature engineering for an ML algorithm. Instead, it enriches the asteroid dataframe with additional information.

In [1]:
# Import standard libraries
import os
import pathlib

# Import installed libraries
import pandas as pd

In [2]:
# Mount Google Drive, where files and models are stored, when running in Colab;
# otherwise work locally
try:
    from google.colab import drive
    drive.mount('/gdrive')
    core_path = "/gdrive/MyDrive/Colab/asteroid_taxonomy/"
except ModuleNotFoundError:
    core_path = ""

In [3]:
# Read the level 1 dataframe
asteroids_df = pd.read_pickle(os.path.join(core_path, "data/lvl1/", "asteroids_merged.pkl"))

## Bus classification to Main group

A great summary of asteroid classification schemes, the science behind them, and some historical context can be found [here](https://vissiniti.com/asteroid-classification/). One flow chart there shows how the various classification schemes relate to each other. On its right side, the flow chart merges into a general "main group". These groups are:

- C: Carbonaceous asteroids
- S: Siliceous (stony) asteroids
- X: Metallic asteroids
- Other: Miscellaneous types of rare origin or composition, or of unknown composition, such as T-type asteroids

[<img src="https://i2.wp.com/vissiniti.com/wp-content/uploads/2019/07/Asteroid-Classification-Chapman-Tholen-to-Bus-to-BusDeMeo-v4-1.jpg?ssl=1">](https://vissiniti.com/asteroid-classification/)


In [4]:
# Create a dictionary that maps each Bus class to its main group
bus_to_main_dict = {
    'A': 'Other',
    'B': 'C',
    'C': 'C',
    'Cb': 'C',
    'Cg': 'C',
    'Cgh': 'C',
    'Ch': 'C',
    'D': 'Other',
    'K': 'Other',
    'L': 'Other',
    'Ld': 'Other',
    'O': 'Other',
    'R': 'Other',
    'S': 'S',
    'Sa': 'S',
    'Sk': 'S',
    'Sl': 'S',
    'Sq': 'S',
    'Sr': 'S',
    'T': 'Other',
    'V': 'Other',
    'X': 'X',
    'Xc': 'X',
    'Xe': 'X',
    'Xk': 'X',
}

In [5]:
# Create a new "Main_Group" column; Bus classes that are missing from the dictionary
# get the string "None"
asteroids_df.loc[:, "Main_Group"] = asteroids_df["Bus_Class"].apply(
    lambda x: bus_to_main_dict.get(x, "None")
)
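
After a mapping like this, it can be worth checking how many asteroids ended up in each main group and whether any Bus class fell through to the `"None"` fallback. The following is a minimal sketch on a hypothetical miniature dataframe: `toy_df`, `toy_map`, and the class `"Zz"` are made up for illustration and are not part of the dataset. It uses `Series.map` with `fillna` as an alternative to `apply`. Both give the same result here.

```python
import pandas as pd

# Hypothetical miniature stand-in for asteroids_df; "Zz" is a made-up class
# that is deliberately missing from the mapping
toy_df = pd.DataFrame({"Name": ["a", "b", "c", "d"],
                       "Bus_Class": ["C", "Sq", "V", "Zz"]})

# Abbreviated version of the Bus-to-main-group dictionary
toy_map = {"C": "C", "Sq": "S", "V": "Other"}

# Series.map returns NaN for keys that are not in the dictionary;
# fillna replaces those with the string "None"
toy_df["Main_Group"] = toy_df["Bus_Class"].map(toy_map).fillna("None")

# Number of asteroids per main group
group_counts = toy_df["Main_Group"].value_counts()
print(group_counts)

# Bus classes that were not covered by the dictionary
unmapped = sorted(toy_df.loc[toy_df["Main_Group"] == "None", "Bus_Class"].unique())
print(unmapped)
```

An empty `unmapped` list on the real dataframe would indicate that the dictionary covers every Bus class present in the data.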

In [6]:
# Remove the file path and Designation Number
asteroids_df.drop(columns=["DesNr", "FilePath"], inplace=True)
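
Note that running this cell a second time raises a `KeyError`, because the columns are already gone. One possible way around this, sketched here on a hypothetical miniature dataframe (`toy_df` and its file paths are made up), is the `errors="ignore"` option of `DataFrame.drop`:

```python
import pandas as pd

# Hypothetical miniature frame with the same column names as in the notebook
toy_df = pd.DataFrame({"DesNr": [1, 2],
                       "Name": ["1 Ceres", "2 Pallas"],
                       "FilePath": ["a.txt", "b.txt"]})

cols_to_drop = ["DesNr", "FilePath"]

# errors="ignore" silently skips columns that do not exist (anymore)
toy_df = toy_df.drop(columns=cols_to_drop, errors="ignore")

# A second call is now a no-op instead of raising a KeyError
toy_df = toy_df.drop(columns=cols_to_drop, errors="ignore")

print(list(toy_df.columns))
```

The trade-off is that a typo in a column name would also be ignored silently, so the strict default can be the better choice in a one-off script.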

In [7]:
# Show the final data set for anyone who is interested ...
asteroids_df

| | Name | Bus_Class | SpectrumDF | Main_Group |
|---|---|---|---|---|
| 0 | 1 Ceres | C | Wavelength_in_microm Reflectance_norm550n... | C |
| 1 | 2 Pallas | B | Wavelength_in_microm Reflectance_norm550n... | C |
| 2 | 3 Juno | Sk | Wavelength_in_microm Reflectance_norm550n... | S |
| 3 | 4 Vesta | V | Wavelength_in_microm Reflectance_norm550n... | Other |
| 4 | 5 Astraea | S | Wavelength_in_microm Reflectance_norm550n... | S |
| ... | ... | ... | ... | ... |
| 1334 | 1996 UK | Sq | Wavelength_in_microm Reflectance_norm550n... | S |
| 1335 | 1996 VC | S | Wavelength_in_microm Reflectance_norm550n... | S |
| 1336 | 1997 CZ5 | S | Wavelength_in_microm Reflectance_norm550n... | S |
| 1337 | 1997 RD1 | Sq | Wavelength_in_microm Reflectance_norm550n... | S |


In [8]:
# ... and also the spectrum of Ceres
asteroids_df.loc[asteroids_df["Name"] == "1 Ceres", "SpectrumDF"].iloc[0]

| | Wavelength_in_microm | Reflectance_norm550nm |
|---|---|---|
| 0 | 0.44 | 0.9281 |
| 1 | 0.45 | 0.9388 |
| 2 | 0.46 | 0.9488 |
| 3 | 0.47 | 0.9572 |
| 4 | 0.48 | 0.9643 |
| 5 | 0.49 | 0.9716 |
| 6 | 0.5 | 0.9788 |
| 7 | 0.51 | 0.9859 |
| 8 | 0.52 | 0.9923 |
| 9 | 0.53 | 0.9955 |


In [9]:
# Create Level 2 directory and save the dataframe
pathlib.Path(os.path.join(core_path, "data/lvl2")).mkdir(parents=True, exist_ok=True)

# Save the dataframe as a pickle file
asteroids_df.to_pickle(os.path.join(core_path, "data/lvl2/", "asteroids.pkl"), protocol=4)
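
To confirm that a saved pickle can be read back unchanged, a round trip is a quick check. The sketch below uses a temporary directory and a small stand-in dataframe (`toy_df` is hypothetical and holds only three of the columns), so it does not touch the real `data/lvl2` folder:

```python
import pathlib
import tempfile

import pandas as pd

# Small stand-in for the enriched dataframe (hypothetical, without spectra)
toy_df = pd.DataFrame({"Name": ["1 Ceres", "4 Vesta"],
                       "Bus_Class": ["C", "V"],
                       "Main_Group": ["C", "Other"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Mirror the data/lvl2 layout inside the temporary directory
    lvl2_dir = pathlib.Path(tmp_dir) / "data" / "lvl2"
    lvl2_dir.mkdir(parents=True, exist_ok=True)

    # Write with the same pickle protocol as above and read the file back
    pkl_path = lvl2_dir / "asteroids.pkl"
    toy_df.to_pickle(pkl_path, protocol=4)
    restored_df = pd.read_pickle(pkl_path)

# Raises an AssertionError if the two dataframes differ
pd.testing.assert_frame_equal(toy_df, restored_df)
```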