## 2.3 Exploratory data analysis

[Slides](https://www.slideshare.net/AlexeyGrigorev/ml-zoomcamp-2-slides)

## Notes

**Pandas attributes and methods:**

* `df[col].unique()` -> returns an array of the unique values in the series
* `df[col].nunique()` -> returns the number of unique values in the series
* `df.isnull().sum()` -> returns the number of null values in each column of the dataframe

**Matplotlib and seaborn methods:**

* `%matplotlib inline` -> ensures that plots are displayed in the Jupyter notebook's cells
* `sns.histplot()` -> shows the histogram of a series

**NumPy methods:**

* `np.log1p()` -> applies a log transformation to a variable after adding one to each input value

Long-tail distributions usually confuse ML models, so the recommendation is to transform the target variable so that its distribution is closer to normal whenever possible.

The entire code of this project is available in [this Jupyter notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-02-car-price/02-carprice.ipynb).

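
The pandas and NumPy calls from the notes can be tried on a tiny dataframe. This is only a minimal sketch: the dataframe, its column names (`make`, `msrp`) and its values are made up for illustration and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (assumption): one categorical column with a
# missing value and one long-tailed numeric column.
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", None],
    "msrp": [1_000, 20_000, 35_000, 2_000_000],
})

# Unique values as a NumPy array; the missing value is included
unique_makes = df["make"].unique()

# Number of unique values; missing values are excluded by default
n_unique_makes = df["make"].nunique()

# Number of missing values in each column, returned as a Series
missing_per_column = df.isnull().sum()

# log(1 + x) compresses the long tail and is defined for x = 0
log_msrp = np.log1p(df["msrp"])

# np.expm1 is the inverse of np.log1p and restores the original scale
restored_msrp = np.expm1(log_msrp)
```

In a notebook, `sns.histplot(log_msrp)` could then be compared with `sns.histplot(df["msrp"])` to see how the transformation changes the shape of the distribution.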
⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.

* [Notes from Peter Ernicke](https://knowmledge.com/2023/09/19/ml-zoomcamp-2023-machine-learning-for-regression-part-2/)

## Navigation

* [Machine Learning Zoomcamp course](../)
* [Session 2: Machine Learning for Regression](./)
* Previous: [Data preparation](02-data-preparation.md)
* Next: [Setting up the validation framework](04-validation-framework.md)