{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F10_random_state.ipynb)\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/10_random_state.ipynb)\n",
"\n",
"# 🤖⚡ scikit-learn tip #10 ([video](https://www.youtube.com/watch?v=WAdrXVnOTIM&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=10))\n",
"\n",
"Q: Why set a value for \"random_state\"?\n",
"\n",
"A: It ensures that a \"random\" process outputs the same results every time you run it, which makes your code reproducible (by you and others!)\n",
"\n",
"See example 👇"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('http://bit.ly/kaggletrain', nrows=6)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"cols = ['Fare', 'Embarked', 'Sex']\n",
"X = df[cols]\n",
"y = df['Survived']"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Fare Embarked Sex\n",
"0 7.2500 S male\n",
"1 71.2833 C female\n",
"2 7.9250 S female\n",
"3 53.1000 S female\n",
"4 8.0500 S male\n",
"5 8.4583 Q male"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Fare Embarked Sex\n",
"0 7.2500 S male\n",
"3 53.1000 S female\n",
"5 8.4583 Q male"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# any non-negative integer (0 included) can be used for the random_state value\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)\n",
"X_train"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Fare Embarked Sex\n",
"0 7.2500 S male\n",
"3 53.1000 S female\n",
"5 8.4583 Q male"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# using the SAME random_state value results in the SAME random split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)\n",
"X_train"
]
},
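{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SAME-value claim above can also be checked in code instead of by eye. The cell below is a minimal sketch, assuming a hand-typed copy of the six rows shown earlier; `demo_X`, `train_a`, `train_b`, `test_a` and `test_b` are hypothetical names that are not part of the original tip."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# hypothetical hand-typed copy of the six rows displayed earlier\n",
"demo_X = pd.DataFrame({\n",
"    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],\n",
"    'Embarked': ['S', 'C', 'S', 'S', 'S', 'Q'],\n",
"    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],\n",
"})\n",
"\n",
"# split twice with the SAME random_state value\n",
"train_a, test_a = train_test_split(demo_X, test_size=0.5, random_state=1)\n",
"train_b, test_b = train_test_split(demo_X, test_size=0.5, random_state=1)\n",
"\n",
"# True when both splits contain the same rows in the same order\n",
"train_a.equals(train_b) and test_a.equals(test_b)"
]
},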
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Fare Embarked Sex\n",
"2 7.9250 S female\n",
"5 8.4583 Q male\n",
"0 7.2500 S male"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# using a DIFFERENT random_state value results in a DIFFERENT random split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)\n",
"X_train"
]
},
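{
"cell_type": "markdown",
"metadata": {},
"source": [
"For contrast, here is a rough sketch of what can happen when `random_state` is left at its default of `None`: each call then draws on NumPy's global random state, so repeated calls will usually produce different splits. `demo_rows`, `unseeded` and `seeded` are hypothetical names introduced only for this illustration.\n",
"\n",
"The second number in the result should be 1, while the first will typically be larger."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# hypothetical stand-in for the row labels 0-5 of the six-row dataset\n",
"demo_rows = np.arange(6)\n",
"\n",
"# random_state is omitted, so each call draws a fresh shuffle\n",
"unseeded = {tuple(train_test_split(demo_rows, test_size=0.5)[0]) for _ in range(20)}\n",
"\n",
"# random_state is fixed, so every call repeats the same shuffle\n",
"seeded = {tuple(train_test_split(demo_rows, test_size=0.5, random_state=1)[0]) for _ in range(20)}\n",
"\n",
"# number of distinct training sets seen in each case\n",
"len(unseeded), len(seeded)"
]
},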
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌\n",
"\n",
"© 2020 [Data School](https://www.dataschool.io). All rights reserved."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}