What I want to do is gather as much info as possible to get a better view of the Olympics data through the years. I have 6 datasets that I'll import and clean; then I'll pick out the unique values and combine some data from each to create one useful dataset that I can work on comfortably.

-t suffix refers to the total games
-s suffix refers to the summer games
-w suffix refers to the winter games

First Dataset: athlete_events.csv


Second Dataset: olympics.csv


Third Dataset: Summer-Olympic-medals-1976-to-2008.csv

I will not be using this dataset, as it seems to cover less data than the athletes_events one, and over a shorter period.


Overview of the main datasets


Fourth Dataset: Tokyo Datasets


There are two main types of datasets: a Medals one that contains the count of medals won by each country, and an Athletes one that has the data on the athletes that competed in these events.

Fifth Dataset: Nations and their Codes


I need to update the olympics_s DF's medals with the medals from the 'tokyo_medals' DF. To do that, I first have to add a column with NOCs to the 'tokyo_medals' DF by using the noc DF; then I can use some updating function that matches the NOCs from both DFs.
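A minimal sketch of the first step, assuming the noc DF has a country-name column and a code column. The column names `Country` and `NOC` and all values below are illustrative assumptions, not the real data:

```python
import pandas as pd

# Toy stand-ins for the real DataFrames (column names and values are assumed).
noc = pd.DataFrame({
    "Country": ["Japan", "France", "Kenya"],
    "NOC": ["JPN", "FRA", "KEN"],
})
tokyo_medals = pd.DataFrame({
    "Country": ["Japan", "France", "Atlantis"],
    "Gold": [3, 1, 1],
    "Silver": [2, 1, 0],
    "Bronze": [1, 1, 0],
})

# Build a name -> code lookup and map it onto the medal table.
name_to_noc = noc.set_index("Country")["NOC"]
tokyo_medals["NOC"] = tokyo_medals["Country"].map(name_to_noc)

# Names that have no match in the lookup come back as NaN,
# so they can be listed and fixed by hand.
unmatched = tokyo_medals.loc[tokyo_medals["NOC"].isna(), "Country"].tolist()
print(unmatched)
```

Listing the unmatched names first is useful because the same country is often spelled differently across datasets.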

The 1st part is done, as I added a column with the country codes to 'tokyo_medals'.

Not only do I have to add each type of medal and the total to its counterpart from the other DF, but I also have to update the number of apps and the average medals won. For "Apps" there are two options. The simple one is to add 1 to every country's "Apps", supposing that almost every country in the dataset participated this year and that a NaN Total of medals just means it participated without winning. The more complicated one is to look in 'tokyo_athletes': if a country is in that dataset it participated, so we add 1; if not, we leave it be.

That's it for the "Apps" column. Now comes "Avg_Medal": here we wait until all the medals are summed up, then divide "Total" by "Apps", which gives the average medals won per appearance.
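The two column updates can be sketched roughly like this, assuming olympics_s has `NOC`, `Apps` and `Total` columns and that the participating codes are available as a list. The names and numbers here are made up for illustration, and `Total` is assumed to already include the Tokyo medals:

```python
import pandas as pd

# Toy stand-in for olympics_s (values are illustrative only).
olympics_s = pd.DataFrame({
    "NOC": ["JPN", "FRA", "KEN"],
    "Apps": [22, 28, 14],
    "Total": [10, 20, 7],
})

# Hypothetical list of codes found in tokyo_athletes.
tokyo_nocs = ["JPN", "FRA"]

# Add one appearance only for countries that show up among the Tokyo athletes.
participated = olympics_s["NOC"].isin(tokyo_nocs)
olympics_s.loc[participated, "Apps"] += 1

# Average medals per appearance, computed after the totals are final.
olympics_s["Avg_Medal"] = olympics_s["Total"] / olympics_s["Apps"]
print(olympics_s)
```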

All I did above was take every country from the 'tokyo_athletes' DF, which lets me know who participated in the 2021 Olympics. Then I tried to convert every country name into a country code, because this way I avoid countries that are the same in both DFs but appear under different name combinations.

Now each country's Apps is updated according to whether it appears in the tokyo_nocs I created.

And finally the medals are updated as well.
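One way the medal update could look, as a sketch under assumptions: both DFs share an `NOC` column and the same medal column names (`Gold`, `Silver`, `Bronze`, `Total`), and the numbers are toy values:

```python
import pandas as pd

medal_cols = ["Gold", "Silver", "Bronze", "Total"]

# Toy stand-ins for the two DataFrames (values are illustrative only).
olympics_s = pd.DataFrame({
    "NOC": ["JPN", "FRA", "KEN"],
    "Gold": [5, 4, 1], "Silver": [3, 2, 1], "Bronze": [2, 4, 0], "Total": [10, 10, 2],
})
tokyo_medals = pd.DataFrame({
    "NOC": ["JPN", "FRA"],
    "Gold": [3, 1], "Silver": [2, 1], "Bronze": [1, 1], "Total": [6, 3],
})

# Index both tables by NOC so the rows line up by country code.
updated = olympics_s.set_index("NOC")
tokyo_aligned = (
    tokyo_medals.set_index("NOC")[medal_cols]
    .reindex(updated.index, fill_value=0)  # countries without Tokyo medals get 0
)

# Add the Tokyo counts to the historical counts, column by column.
updated[medal_cols] = updated[medal_cols] + tokyo_aligned
olympics_s = updated.reset_index()
print(olympics_s)
```

If the historical medal columns contain NaN for countries that never won, they would need a `fillna(0)` first, otherwise the sum stays NaN.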

From now on I will work with either the 'athletes_events' dataset or 'olympics_s', but not 'tokyo_medals', as I already merged its medals and related columns into 'olympics_s'.


**- COUNTRIES**

**1. Medals Won**

1.1 Medals won by countries with at least one medal in total, or by the top 30

1.2 Medals won by Arab countries

2. Most popular Discipline

3. Best Average Among:

**- GENDER VISUALIZATIONS:**

1. Medals Won over the Years

2. Events / Countries and Gender Participation

2.1 Countries & Gender

I tried using pandas for this kind of DF build, but it's too slow, so from now on I'll just build them using NumPy.
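A minimal sketch of one way to do such a build with NumPy, here counting entries per country and gender. It assumes the athletes DF has `NOC` and `Sex` columns; the rows are toy data and this is not necessarily the exact build used in the notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the athletes DataFrame (rows are illustrative only).
athletes_events = pd.DataFrame({
    "NOC": ["JPN", "JPN", "FRA", "FRA", "FRA"],
    "Sex": ["F", "M", "F", "F", "M"],
})

# Encode each column as integer positions into its sorted unique values.
nocs, noc_idx = np.unique(athletes_events["NOC"].to_numpy(dtype=str), return_inverse=True)
sexes, sex_idx = np.unique(athletes_events["Sex"].to_numpy(dtype=str), return_inverse=True)

# Accumulate one count per row into a (country x gender) table.
counts = np.zeros((len(nocs), len(sexes)), dtype=int)
np.add.at(counts, (noc_idx, sex_idx), 1)

# Wrap the result back into a DataFrame only at the end, for plotting.
gender_by_country = pd.DataFrame(counts, index=nocs, columns=sexes)
print(gender_by_country)
```

The idea is to do the counting on plain arrays and only go back to a DataFrame once the table is finished.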

2.2 Events & Gender

**- ATHLETES**

1- Most Successful Athletes

2- Most Apps

3- Age Distributions

**- Evolution:**

1- Physique (Height for a few Disciplines and Gender)

2- Physique (Weight for a few Disciplines and Gender)